Memory usage of queries can't be properly accounted, which can be an issue when
large queries are cached since the actual memory usage will be much higher than
what the cache thinks. This problem is very hard if not impossible to fix so as
a workaround I would like to decrease the maximum number of cached queries so
that this problem is less likely to cause trouble in practice.
For the record, this problem is more likely to occur in environments that have
small shards or don't give much memory to the JVM.
Closes #26938
Today we internally accumulate elapsed scroll time in nanoseconds. The
problem here is that this can reasonably overflow. For example, on a
system with scrolls that are open for ten minutes on average, after
sixteen million scrolls the largest value that can be represented by a
long will be exceeded. To address this, we switch to internally
representing elapsed scroll time in microseconds. With the same number of
scrolls, this accommodates scrolls that are open for seven days on average;
with the same average elapsed time, it accommodates sixteen billion scrolls,
which will never happen (executing one scroll a second until sixteen billion
have executed would take more than five hundred years).
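The arithmetic above can be checked with a short sketch. The class and method names here are hypothetical and only illustrate how many scrolls of a given average duration fit into a `long` counter; they are not the actual statistics code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: how many scrolls of a given average duration can be
// accumulated in a long counter before it exceeds Long.MAX_VALUE.
class ScrollTimeOverflowSketch {

    static long scrollsUntilOverflow(long averageElapsed, TimeUnit unit, TimeUnit counterUnit) {
        // Duration of one average scroll, expressed in the unit of the counter.
        long perScroll = counterUnit.convert(averageElapsed, unit);
        return Long.MAX_VALUE / perScroll;
    }

    public static void main(String[] args) {
        // Ten-minute scrolls counted in nanoseconds: about 15.4 million scrolls.
        System.out.println(scrollsUntilOverflow(10, TimeUnit.MINUTES, TimeUnit.NANOSECONDS));
        // Ten-minute scrolls counted in microseconds: about 15.4 billion scrolls.
        System.out.println(scrollsUntilOverflow(10, TimeUnit.MINUTES, TimeUnit.MICROSECONDS));
        // Seven-day scrolls counted in microseconds: a little over 15 million scrolls.
        System.out.println(scrollsUntilOverflow(7, TimeUnit.DAYS, TimeUnit.MICROSECONDS));
    }
}
```

The "sixteen million" and "sixteen billion" figures in the text are round numbers for these results.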
Relates #27068
Upgrade to Jackson 2.9.2 and also use a boolean `closed` flag to
indicate that a FastStringReader instance is closed, so that length
is still correctly reported after the reader is closed.
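As a rough illustration of the `closed` flag idea, here is a minimal sketch of a string-backed reader. The class `ClosedFlagStringReader` is hypothetical and is not the actual FastStringReader; it only shows that tracking closure in a separate boolean leaves the length intact after `close()`.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical reader: closure is tracked by a boolean flag rather than by
// overwriting the state that length() depends on.
class ClosedFlagStringReader extends Reader {
    private final String str;
    private int next = 0;
    private boolean closed = false;

    ClosedFlagStringReader(String str) {
        this.str = str;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
        if (len == 0) {
            return 0;
        }
        if (next >= str.length()) {
            return -1; // end of input
        }
        int n = Math.min(str.length() - next, len);
        str.getChars(next, next + n, cbuf, off);
        next += n;
        return n;
    }

    // Still reports the real length after close().
    int length() {
        return str.length();
    }

    @Override
    public void close() {
        closed = true;
    }
}
```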
Due to a change made in #26102 to make the nested source consistent
with or without source filtering, the _source of a nested inner hit was
always wrapped in the parent path. This turned out to be not ideal for
users relying on the nested source, as it would require additional parsing
on the client side. This change fixes this: the _source of nested inner hits
is no longer wrapped by parent JSON objects, regardless of whether
the _source is included as is or source filtering is used.
Internally, source filtering and highlighting rely on the fact that the
_source of a nested inner hit is accessible by its full field path, so
in order to not break this, the conversion of the _source into its binary
form is performed in FetchSourceSubPhase, after any potential source filtering
is performed, to make sure the structure of the _source of the nested inner hit
is consistent regardless of whether source filtering is performed.
PR for #26944
Closes #26944
* Balance shards for an index more evenly across multiple data paths
When a node has multiple data paths configured, and is assigned all of the
shards for a particular index, it's currently possible that all shards will be
assigned to the same path (see #16763).
This change keeps the same behavior around determining the "best" path for a
shard based on space; however, it enforces limits for the number of shards on a
path for an index from the single-node perspective. For example:
Assume you had a node with 4 data paths, where `/path1` has a tremendously high
amount of disk space available compared to the other paths. If you create an
index with 5 primary shards, the previous behavior would be to assign all 5
shards to `/path1`.
This change would enforce a limit of 2 shards to each data path for that
particular node, so you would end up with the following distribution:
- `/path1` - 2 shards (because it has the most usable space)
- `/path2` - 1 shard
- `/path3` - 1 shard
- `/path4` - 1 shard
Note, however, that this limit is only enforced at the local node level for
simplicity of implementation. If you had multiple nodes, the "limit" for the
node would still be 2, so if you had enough nodes that only 2 shards of this
index were assigned to this node, they would still both be assigned to
`/path1`.
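The example can be reproduced with a small sketch. It rests on two assumptions that the description does not spell out: that the per-path limit is the index's shard count divided by the number of data paths, rounded up, and that each assigned shard reduces the path's usable space by an estimated shard size. The class `DataPathBalanceSketch` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of per-path shard limits on a single node.
class DataPathBalanceSketch {

    // Assumed rule: ceil(indexShardCount / pathCount); 5 shards on 4 paths gives 2.
    static int maxShardsPerPath(int indexShardCount, int pathCount) {
        return (indexShardCount + pathCount - 1) / pathCount;
    }

    // Places shardsOnNode shards one by one on the path with the most remaining
    // space, skipping paths that already hold the maximum number of shards.
    static Map<String, Integer> assign(Map<String, Long> usableBytes, int indexShardCount,
                                       int shardsOnNode, long estimatedShardBytes) {
        int limit = maxShardsPerPath(indexShardCount, usableBytes.size());
        Map<String, Long> remaining = new LinkedHashMap<>(usableBytes);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String path : usableBytes.keySet()) {
            counts.put(path, 0);
        }
        for (int shard = 0; shard < shardsOnNode; shard++) {
            String best = null;
            for (Map.Entry<String, Long> e : remaining.entrySet()) {
                if (counts.get(e.getKey()) >= limit) {
                    continue; // this path is full for this index
                }
                if (best == null || e.getValue() > remaining.get(best)) {
                    best = e.getKey();
                }
            }
            counts.merge(best, 1, Integer::sum);
            remaining.merge(best, -estimatedShardBytes, Long::sum);
        }
        return counts;
    }
}
```

With `/path1` far larger than the other three paths, assigning all 5 shards to this node yields 2, 1, 1, 1. Assigning only 2 of the 5 shards to this node still puts both on `/path1`, which matches the multi-node caveat above.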
* Switch from ObjectLongHashMap to regular HashMap
* Remove unneeded Files.isDirectory check
* Skip iterating directories when not necessary
* Add message to assert
* Implement different (better) ranking for node paths
This is the method we discussed
* Remove unused pathHasEnoughSpace method
* Use findFirst instead of .get(0);
* Update for master merge to fix compilation
Settings.putArray -> Settings.putList
Today when getting ready to enter seccomp, we do some probes to ensure
that we are really talking to seccomp, etc. One of these probes is pure
paranoia. The paranoia was driven by a kernel bug
(https://lkml.org/lkml/2014/7/20/222) that only impacted 32-bit x86
kernels wherein invoking a non-existent syscall was not returning ENOSYS
(as it should). This probe causes problems though, for example in
containers with syscall filters, invoking a non-existent syscall will
lead to the process being sent SIGSYS and terminated. We do not need
this paranoia, we do not support 32-bit, and our other probes give us
enough of a defense to ensure that we are talking to seccomp (and we
hardcode the seccomp syscall number for platforms that we
support). Given that this probe offers us little value, but does cause
problems in valid use-cases, this commit removes this paranoia.
Relates #27016
The internal engine constructor declares a checked engine exception yet
this constructor does not actually throw this exception. This commit
removes this declaration from the internal engine constructor.
Relates #27022
Today all these API calls have a side effect of making documents visible
to search requests. While this is sometimes desired, it's an unnecessary side effect,
and now that we have an internal (engine-private) index reader (#26972) we artificially
add a refresh call for bwc. This change removes this side effect in 7.0.
Right now we are attempting to set SO_LINGER to 0 on server channels
when we are stopping the tcp transport. This is not a supported socket
option and throws an exception. This also prevents the channels from
being closed.
This commit 1. doesn't set SO_LINGER for server channels, 2. checks
that it is a supported option in nio, and 3. changes the log message
to warn for server channel close exceptions.
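The "supported option" check can be sketched with plain JDK NIO. The class `LingerSketch` and its methods are hypothetical and are not the transport code itself; the sketch only shows querying `supportedOptions()` before calling `setOption`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardSocketOptions;
import java.nio.channels.NetworkChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper: only set SO_LINGER when the channel reports support for it.
class LingerSketch {

    // Returns true if SO_LINGER was set to 0, false if the option is unsupported.
    static boolean setLingerZeroIfSupported(NetworkChannel channel) throws IOException {
        if (channel.supportedOptions().contains(StandardSocketOptions.SO_LINGER)) {
            channel.setOption(StandardSocketOptions.SO_LINGER, 0);
            return true;
        }
        return false;
    }

    // Opens an unbound, unconnected channel just to probe its option support.
    static boolean demo(boolean serverChannel) {
        try (NetworkChannel ch = serverChannel ? ServerSocketChannel.open() : SocketChannel.open()) {
            return setLingerZeroIfSupported(ch);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `setOption` with an unsupported option throws `UnsupportedOperationException`, so checking first avoids an exception on the close path.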
This commit adds a note to the docs on the full_id parameter in the cat
nodes API. This is a useful parameter but was not previously documented
anywhere.
Relates #27009
This commit reformats a paragraph in the template docs to fit in 80
columns like the rest of the doc, as is a standard that we loosely
adhere to.
This commit clarifies the interaction between settings specified in a
create index request, and those that would come from any templates that
apply to the create index request.
Relates #26994
Today we only allow decoding byte arrays where the data has a 0 offset
and the same length as the array. Allowing decoding from a slice will
make decoding IDs cheaper if the ID is, for instance, coming from a term dictionary
or BytesRef.
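A minimal sketch of the API shape, assuming for illustration that an ID is plain UTF-8 (the real ID encoding differs). The class `SliceDecodeSketch` is hypothetical; the point is that an `(offset, length)` overload lets callers decode in place instead of first copying the slice into a new array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical decoder: the slice overload avoids copying bytes out of a
// larger buffer before decoding.
class SliceDecodeSketch {

    // Whole-array form: data starts at offset 0 and spans the full array.
    static String decode(byte[] bytes) {
        return decode(bytes, 0, bytes.length);
    }

    // Slice form: decode length bytes starting at offset.
    // UTF-8 is an assumption made for this example only.
    static String decode(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```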
Relates to #26931
Today, when ES detects it's using too much heap vs the configured indexing
buffer (default 10% of JVM heap) it opens a new searcher to force Lucene to move
the bytes to disk, clear version map, etc.
But this has the unexpected side effect of making newly indexed/deleted
documents visible to future searches, which is not nice for users who are trying
to prevent that, e.g. #3593.
This is also an indirect spinoff from #26802 where we potentially pay a big
price on rebuilding caches etc. when updates / realtime-get is used. We are
refreshing the internal reader for realtime gets, which causes for instance
global ords to be rebuilt. I think we can gain quite a bit if we'd use a reader
that is only used for GETs and not for searches etc. That way we can also solve
problems of searchers being refreshed unexpectedly aside from replica recovery /
relocation.
Closes #15768
Closes #26912
Prior to this change, the weights for the filter and filters aggregations were created in the `Filter(s)AggregatorFactory`, which meant that they were created regardless of whether the aggregator actually collected any documents. As a result, for filters that are expensive to initialise, requests were slow even when the query of the request was (or effectively was) a `match_none` query.
This change maintains a single Weight instance for each filter across parent buckets but passes a weight supplier to the aggregator instances which will create the weight on first call and then return that instance for subsequent calls.
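The "create on first call, then reuse" behavior can be sketched with a memoizing supplier. The names below are hypothetical and the sketch uses a generic value rather than a Lucene `Weight`; it is also not thread-safe, which is an assumption about how it would be used.

```java
import java.util.function.Supplier;

// Hypothetical memoizing supplier: the expensive value is built on the first
// get() and the same instance is returned afterwards.
class LazySupplierSketch {

    static final class Memoized<T> implements Supplier<T> {
        private final Supplier<T> factory;
        private T value;
        private boolean created;

        Memoized(Supplier<T> factory) {
            this.factory = factory;
        }

        @Override
        public T get() {
            if (!created) {
                value = factory.get(); // expensive work happens here, at most once
                created = true;
            }
            return value;
        }
    }
}
```

If `get()` is never called, for example because no documents are collected, the factory never runs.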
Our convention is to use lower case when naming things "Tcp". For
example, `TcpTransport`. This commit renames the outlier
(TcpTransportTests) to use lower case.
When a node which contains the primary shard is unavailable, the primary
stats (and the total stats) of an `IndexStats` will be empty for a short
moment (while the primary shard is being relocated). However, we assume
that these stats are always non-empty when handling `_cat/indices` in
RestIndicesAction. This commit checks the content of these stats before
accessing.
Closes #26942
While opening a connection to a node, a channel can subsequently
close. If this happens, a future callback whose purpose is to close all
other channels and disconnect from the node will fire. However, this
future will not be ready to close all the channels because the
connection will not be exposed to the future callback yet. Since this
callback is run once, we will never try to disconnect from this node
again and we will be left with a closed channel. This commit adds a
check that all channels are open before exposing the channel and throws
a general connection exception. In this case, the usual connection retry
logic will take over.
Relates #26932
We had a TODO about adding tests around cached boxing. In #24077
I tracked down the uncached boxing tests and saw the TODO. Cached
boxing testing is a fairly small extension to that work.
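For background, and assuming "cached boxing" here refers to the JDK's small-value box cache, this sketch shows the behavior such tests exercise. `Integer.valueOf` is guaranteed to return a cached instance for values from -128 to 127, so reference equality holds in that range. The class name is hypothetical.

```java
// Hypothetical illustration of the JDK Integer cache.
class BoxingCacheSketch {

    // True when boxing the same value twice yields the same object.
    static boolean sameBoxedInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // reference comparison, intentionally not equals()
    }

    public static void main(String[] args) {
        // Inside the guaranteed cache range: always the same instance.
        System.out.println(sameBoxedInstance(127));
        // Outside the range the result depends on JVM settings, so it is not asserted here.
        System.out.println(sameBoxedInstance(1000));
    }
}
```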
This commit fixes an issue with the handling of paths containing
parentheses on Windows. When such a path is used as a component of
Elasticsearch home, then a later echo statement that is guarded by an if
will fail because the parentheses in the path will be confused with the
parentheses defining the if block. This commit fixes the issue by
protecting this echo statement by wrapping the possibly offending path
in quotes.
Relates #26916
With this commit we simplify our network layer by only allowing a fixed
receive predictor size to be defined instead of a minimum and maximum value. This also
means that the following (previously undocumented) settings are removed:
* http.netty.receive_predictor_min
* http.netty.receive_predictor_max
Using an adaptive sizing policy in the receive predictor is a very low-level
optimization. The implications on allocation behavior are extremely hard to grasp
(see our previous work in #23185) and adaptive sizing does not provide a lot of
benefits (see benchmarks in #26165 for more details).