OpenSearch/docs/reference/cat/health.asciidoc

[[cat-health]]
== cat health

`health` is a terse, one-line representation of the same information
from `/_cluster/health`. It has one option `ts` to disable the
timestamping.

[source,shell]
--------------------------------------------------
% curl 192.168.56.10:9200/_cat/health
1384308967 18:16:07 foo green 3 3 3 3 0 0 0
% curl '192.168.56.10:9200/_cat/health?v&ts=0'
cluster status nodeTotal nodeData shards pri relo init unassign
foo     green          3        3      3   3    0    0        0
--------------------------------------------------

A common use of this command is to verify the health is consistent
across nodes:

[source,shell]
--------------------------------------------------
% pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/health
[1] 20:20:52 [SUCCESS] es3.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0
[2] 20:20:52 [SUCCESS] es1.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0
[3] 20:20:52 [SUCCESS] es2.vm
1384309218 18:20:18 foo green 3 3 3 3 0 0 0
--------------------------------------------------

A less obvious use is to track recovery of a large cluster over
time. With enough shards, starting a cluster, or even recovering after
losing a node, can take time (depending on your network & disk). A way
to track its progress is by using this command in a delayed loop:

[source,shell]
--------------------------------------------------
% while true; do curl 192.168.56.10:9200/_cat/health; sleep 120; done
1384309446 18:24:06 foo red 3 3 20 20 0 0 1812
1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870
1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492
1384309806 18:30:06 foo green 3 3 1832 916 4 0 0
^C
--------------------------------------------------

In this scenario, we can tell that recovery took roughly four minutes.
If this were going on for hours, we would be able to watch the
`UNASSIGNED` shards drop precipitously.  If that number remained
static, we would have an idea that there is a problem.

[float]
[[timestamp]]
=== Why the timestamp?

You typically are using the `health` command when a cluster is
malfunctioning.  During this period, it's extremely important to
correlate activities across log files, alerting systems, etc.

There are two outputs.  The `HH:MM:SS` output is simply for quick
human consumption.  The epoch time retains more information, including
date, and is machine sortable if your recovery spans days.
First pass at cat docs. 2013-11-14 20:14:39 -05:00			`[[cat-health]]`
[DOCS] Renamed the "cat" chapters to be more searchable 2014-05-16 15:43:35 -04:00			`== cat health`
First pass at cat docs. 2013-11-14 20:14:39 -05:00
			`health` is a terse, one-line representation of the same information
			from `/_cluster/health`. It has one option `ts` to disable the
			`timestamping.`

			`[source,shell]`
			`--------------------------------------------------`
			`% curl 192.168.56.10:9200/_cat/health`
			`1384308967 18:16:07 foo green 3 3 3 3 0 0 0`
[DOCS] Changed the _cat docs to use ?v instead of ?v=true 2013-12-02 09:27:41 -05:00			`% curl '192.168.56.10:9200/_cat/health?v&ts=0'`
First pass at cat docs. 2013-11-14 20:14:39 -05:00			`cluster status nodeTotal nodeData shards pri relo init unassign`
			`foo green 3 3 3 3 0 0 0`
			`--------------------------------------------------`

			`A common use of this command is to verify the health is consistent`
			`across nodes:`

			`[source,shell]`
			`--------------------------------------------------`
			`% pssh -i -h list.of.cluster.hosts curl -s localhost:9200/_cat/health`
			`[1] 20:20:52 [SUCCESS] es3.vm`
			`1384309218 18:20:18 foo green 3 3 3 3 0 0 0`
			`[2] 20:20:52 [SUCCESS] es1.vm`
			`1384309218 18:20:18 foo green 3 3 3 3 0 0 0`
			`[3] 20:20:52 [SUCCESS] es2.vm`
			`1384309218 18:20:18 foo green 3 3 3 3 0 0 0`
			`--------------------------------------------------`

			`A less obvious use is to track recovery of a large cluster over`
			`time. With enough shards, starting a cluster, or even recovering after`
			`losing a node, can take time (depending on your network & disk). A way`
			`to track its progress is by using this command in a delayed loop:`

			`[source,shell]`
			`--------------------------------------------------`
			`% while true; do curl 192.168.56.10:9200/_cat/health; sleep 120; done`
			`1384309446 18:24:06 foo red 3 3 20 20 0 0 1812`
			`1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870`
			`1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492`
			`1384309806 18:30:06 foo green 3 3 1832 916 4 0 0`
			`^C`
			`--------------------------------------------------`

			`In this scenario, we can tell that recovery took roughly four minutes.`
			`If this were going on for hours, we would be able to watch the`
			`UNASSIGNED` shards drop precipitously. If that number remained
			`static, we would have an idea that there is a problem.`

			`[float]`
			`[[timestamp]]`
			`=== Why the timestamp?`

			You typically are using the `health` command when a cluster is
			`malfunctioning. During this period, it's extremely important to`
			`correlate activities across log files, alerting systems, etc.`

			There are two outputs. The `HH:MM:SS` output is simply for quick
			`human consumption. The epoch time retains more information, including`
			`date, and is machine sortable if your recovery spans days.`