This change adds cgroup memory usage/limit to the OS stats section of
the node stats on Linux. This information is useful because in Docker
containers the standard node stats report the host memory limit, not
taking account of extra restrictions that may have been applied to the
container.
The original idea was to store these values as Long, truncating any values
outside the range of long. However, this meant that in the relatively common
case of no limit being applied, users would not see the same value in the OS
stats as they see by querying Linux directly. So instead the values are stored
as String. This change places a burden on consumers of the strings to
convert the strings to numbers and decide what to do about extremely large
values, but there will be very few consumers and they would need to have a
policy for dealing with "no limit" in any case.
This commit fixes a silly doc bug where the field that represents the
total CPU time consumed by all tasks in the same cgroup was mistakenly
reported as "usage" instead of "usage_nanos".
Relates #21029
Today when parsing a stats request, Elasticsearch silently ignores
incorrect metrics. This commit removes lenient parsing of stats requests
for the nodes stats and indices stats APIs.
Relates #21417
On some systems, cgroups will be available but not configured. And in
some cases, cgroups will be configured, but not for the subsystems that
we are expecting (e.g., cpu and cpuacct). This commit strengthens the
handling of cgroup stats on such systems.
Relates #21094
This commit adds a variety of real disk metrics for the block devices
that back Elasticsearch data paths. A collection of statistics are read
from /proc/diskstats and are used to report the raw metrics for
operations and read/write bytes.
Relates #15915
The ingest stats include the following statistics:
* `ingest.total.count`- The total number of document ingested during the lifetime of this node
* `ingest.total.time_in_millis` - The total time spent on ingest preprocessing documents during the lifetime of this node
* `ingest.total.current` - The total number of documents currently being ingested.
* `ingest.total.failed` - The total number ingest preprocessing operations failed during the lifetime of this node
Also these stats are returned on a per pipeline basis.
This commit modifies the load_average in the node stats API response
to be an object containing the one-minute, five-minute and
fifteen-minute load averages as fields (if those values are
available). Additionally, this commit modifies the cat nodes API
response to format the one-minute, five-minute and fifteen-minute load
averages as null if any of the respective values are not available.
First, this breaks backwards compatibility!
* Removed /_cluster/nodes/stats endpoint
* Excpect the stats types not as parameters, but as part of the URL
* Returning all indices stats by default, returning all nodes stats by default
* Supporting groups & types in nodes stats now as well
* Updated documentation & tests accordingly
* Allow level parameter for "shards" and "indices" (cluster does not make sense here)
Closes#4057
This adds the field data circuit breaker, which is used to estimate
the amount of memory required to load field data before loading it. It
then raises a CircuitBreakingException if the limit is exceeded.
It is configured with two parameters:
`indices.fielddata.cache.breaker.limit` - the maximum number of bytes
of field data to be loaded before circuit breaking. Defaults to
`indices.fielddata.cache.size` if set, unbounded otherwise.
`indices.fielddata.cache.breaker.overhead` - a contast for all field
data estimations to be multiplied with before aggregation. Defaults to
1.03.
Both settings can be configured dynamically using the cluster update
settings API.