Commit Graph

109 Commits

Author SHA1 Message Date
Nik Everett 777ea124c7 Fix health docs test
It failed inconsistently when there were pending tasks.
2016-07-16 07:18:11 -04:00
Nik Everett 9f78f8cc91 Convert snippets in health docs to CONSOLE
This should make them easier to read and adds them to the test suite
I changed the example from a two node cluster to a single node cluster
because that is what we have running in the integration tests. It is also
what a user just starting out is likely to see so I think that is ok.
2016-07-15 16:31:37 -04:00
Lee Hinman 58db63b610 Expose the ClusterInfo object in the allocation explain output
This adds an optional parameter to the cluster allocation explain API
that will return the cluster info object, `include_disk_info`, the
output looks like:

GET /_cluster/allocation/explain?include_disk_info -d'
{"index": "i", "shard": 0, "primary": false}'

{
  ... other info ...

  "cluster_info" : {
    "nodes" : {
      "7Uws-vL7R6WVm3ZwQA1n5A" : {
        "node_name" : "Kraven the Hunter",
        "least_available" : {
          "path" : "/path/to/data1",
          "total_bytes" : 165999570944,
          "used_bytes" : 118180614144,
          "free_bytes" : 47818956800,
          "free_disk_percent" : 28.80667493781158,
          "used_disk_percent" : 71.19332506218842
        },
        "most_available" : {
          "path" : "/path/to/data2",
          "total_bytes" : 165999570944,
          "used_bytes" : 118180614144,
          "free_bytes" : 47818956800,
          "free_disk_percent" : 28.80667493781158,
          "used_disk_percent" : 71.19332506218842
        }
      }
    },
    "shard_sizes" : {
      "[i][2][p]_bytes" : 0,
      "[i][4][p]_bytes" : 130,
      "[i][1][p]_bytes" : 0,
      "[i][3][p]_bytes" : 0,
      "[i][0][p]_bytes" : 130
    },
    "shard_paths" : {
      "[i][3], node[7Uws-vL7R6WVm3ZwQA1n5A], [P], s[STARTED], a[id=LegZLDniTVaw0Y1urv7s3g]" : "/path/to/data1/nodes/0",
      "[i][1], node[7Uws-vL7R6WVm3ZwQA1n5A], [P], s[STARTED], a[id=lAU_4vf_SKmoRdtg0ACnjQ]" : "/path/to/data1/nodes/0",
      "[i][2], node[7Uws-vL7R6WVm3ZwQA1n5A], [P], s[STARTED], a[id=Aurpeuj7SeGeyPDDpCtRgg]" : "/path/to/data1/nodes/0",
      "[i][0], node[7Uws-vL7R6WVm3ZwQA1n5A], [P], s[STARTED], a[id=Vgg8GlQTQ82C2j6HYBq8DQ]" : "/path/to/data1/nodes/0",
      "[i][4], node[7Uws-vL7R6WVm3ZwQA1n5A], [P], s[STARTED], a[id=t8hQlVSxQe-58fSeaXcAqg]" : "/path/to/data1/nodes/0"
    }
  }
}

Resolves #14405
2016-07-12 15:52:20 -06:00
Mike McCandless eecf094ac1 add indices nodes info flag to docs 2016-06-20 14:23:32 -04:00
Mike McCandless 3f221bf7cb Add total_indexing_buffer/_in_bytes to nodes info API 2016-06-16 04:39:34 -04:00
Nik Everett e392e0b1df Create get task API that falls back to the .tasks index
This adds a get task API that supports GET /_tasks/${taskId} and
removes that responsibility from the list tasks API. The get task
API supports wait_for_complation just as the list tasks API does
but doesn't support any of the list task API's filters. In exchange,
it supports falling back to the .results index when the task isn't
running any more. Like any good GET API it 404s when it doesn't
find the task.

Then we change reindex, update-by-query, and delete-by-query to
persist the task result when wait_for_completion=false. The leads
to the neat behavior that, once you start a reindex with
wait_for_completion=false, you can fetch the result of the task by
using the get task API and see the result when it has finished.

Also rename the .results index to .tasks.
2016-06-14 13:37:34 -04:00
Mike McCandless 5c525e6606 Remove index_writer_max_memory stat from segment stats 2016-05-31 06:29:29 -04:00
Lee Hinman bfce901edf Merge remote-tracking branch 'dakrone/explain-add-fetch-in-progress' 2016-05-23 09:43:16 -06:00
Lee Hinman 8040ed0c16 Add whether the shard state fetch is pending to the allocation explain API
If the shard state fetch is still pending, this will now return a
message like:

```json
{
  "shard" : {
    "index" : "i",
    "index_uuid" : "de1W1374T4qgvUP4a9Ieaw",
    "id" : 0,
    "primary" : false
  },
  "assigned" : false,
  "shard_state_fetch_pending": true,
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2016-04-26T16:34:53.227Z"
  },
  "allocation_delay_ms" : 0,
  "remaining_delay_ms" : 0,
  "nodes" : {
    "z-CbkiELT-SoWT91HIszLA" : {
      "node_name" : "Brain Cell",
      "node_attributes" : {
        "testattr" : "test"
      },
      "store" : {
        "shard_copy" : "NONE"
      },
      "final_decision" : "NO",
      "final_explanation" : "the shard state fetch is pending",
      "weight" : 5.0,
      "decisions" : [ ]
    }
  }
}
```

Adds the `shard_state_fetch_pending` field and uses the state to
influence the final decision and final explanation.

Relates to #17372
2016-05-23 09:42:57 -06:00
Simon Willnauer 35e705877b Limit retries of failed allocations per index (#18467)
Today if a shard fails during initialization phase due to misconfiguration, broken disks,
missing analyzers, not installed plugins etc. elasticsaerch keeps on trying to initialize
or rather allocate that shard. Yet, in the worst case scenario this ends in an endless
allocation loop. To prevent this loop and all it's sideeffects like spamming log files over
and over again this commit adds an allocation decider that stops allocating a shard that
failed more than N times in a row to allocate. The number or retries can be configured via
`index.allocation.max_retry` and it's default is set to `5`. Once the setting is updated
shards with less failures than the number set per index will be allowed to allocate again.

Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shards
has been started.

Relates to #18417
2016-05-20 20:37:45 +02:00
Jason Tedor ecce53f0df Add I/O statistics on Linux
This commit adds a variety of real disk metrics for the block devices
that back Elasticsearch data paths. A collection of statistics are read
from /proc/diskstats and are used to report the raw metrics for
operations and read/write bytes.

Relates #15915
2016-05-17 16:16:39 -04:00
Clinton Gormley 3f594089c2 Renamed all AUTOSENSE snippets to CONSOLE (#18210) 2016-05-09 15:42:23 +02:00
Nik Everett 4b1c116461 Generate and run tests from the docs
Adds infrastructure so `gradle :docs:check` will extract tests from
snippets in the documentation and execute the tests. This is included
in `gradle check` so it should happen on CI and during a normal build.

By default each `// AUTOSENSE` snippet creates a unique REST test. These
tests are executed in a random order and the cluster is wiped between
each one. If multiple snippets chain together into a test you can annotate
all snippets after the first with `// TEST[continued]` to have the
generated tests for both snippets joined.

Snippets marked as `// TESTRESPONSE` are checked against the response
of the last action.

See docs/README.asciidoc for lots more.

Closes #12583. That issue is about catching bugs in the docs during build.
This catches *some* bugs in the docs during build which is a good start.
2016-05-05 13:58:03 -04:00
Lee Hinman 5648253d45 Add documentation for shard store output in allocation explain API
Relates to #17689
2016-05-03 09:51:15 -06:00
Igor Motov 81c59cae18 Add _cat/tasks
Adds new _cat endpoint that lists all tasks
2016-04-07 09:28:21 -06:00
Igor Motov f599ac5d5a Expose whether a task is cancellable in the _tasks list API
Closes #17369
2016-04-05 19:16:08 -06:00
Daniel Mitterdorfer 930ce1bfec Add up-to-date example of cluster stats API output 2016-03-31 14:41:37 +02:00
Igor Motov e073b0c75d Add ability to group tasks by common parent
By default, tasks are grouped by node. However, task execution in elasticsearch can be quite complex and an individual task that runs on a coordinating node can have many subtasks running on other nodes in the cluster. This commit makes it possible to list task grouped by common parents instead of by node. When this option is enabled all subtask are grouped under the coordinating node task that started all subtasks in the group. To group tasks by common parents, use the following syntax:

 GET /tasks?group_by=parents
2016-03-30 17:50:27 -04:00
javanna 061f09d9a4 Merge branch 'master' into enhancement/remove_node_client_setting 2016-03-29 20:19:33 +02:00
Igor Motov c356b30cff Update task management docs to reflect the latest changes in the interface
Brings docs in line with new list task syntax and adds task cancellation API docs.
2016-03-29 12:26:37 -04:00
javanna 8fc9dbbb99 Merge branch 'master' into enhancement/remove_node_client_setting 2016-03-29 14:27:04 +02:00
Clinton Gormley 978b24327e Docs: Included Nodes Task API and tidied reindex/update-by-query 2016-03-29 13:51:11 +02:00
javanna de5cbda8e7 Merge branch 'master' into enhancement/remove_node_client_setting 2016-03-29 10:48:47 +02:00
Lee Hinman 80ab366de4 Add API to explain why a shard is or isn't assigned
This adds a new `/_cluster/allocation/explain` API that explains why a
shard can or cannot be allocated to nodes in the cluster. Additionally,
it will show where the master *desires* to put the shard, according to
the `ShardsAllocator`.

It looks like this:

```
GET /_cluster/allocation/explain?pretty
{
  "index": "only-foo",
  "shard": 0,
  "primary": false
}
```

Though, you can optionally send an empty body, which means "explain the
allocation for the first unassigned shard you find".

The output when a shard is unassigned looks like this:

```
{
  "shard" : {
    "index" : "only-foo",
    "index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
    "id" : 0,
    "primary" : false
  },
  "assigned" : false,
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2016-03-22T20:04:23.620Z"
  },
  "nodes" : {
    "V-Spi0AyRZ6ZvKbaI3691w" : {
      "node_name" : "Susan Storm",
      "node_attributes" : {
        "bar" : "baz"
      },
      "final_decision" : "NO",
      "weight" : 0.06666675,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    },
    "Qc6VL8c5RWaw1qXZ0Rg57g" : {
      "node_name" : "Slipstream",
      "node_attributes" : {
        "bar" : "baz",
        "foo" : "bar"
      },
      "final_decision" : "NO",
      "weight" : -1.3833332,
      "decisions" : [ {
        "decider" : "same_shard",
        "decision" : "NO",
        "explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
      } ]
    },
    "PzdyMZGXQdGhqTJHF_hGgA" : {
      "node_name" : "The Symbiote",
      "node_attributes" : { },
      "final_decision" : "NO",
      "weight" : 2.3166666,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    }
  }
}
```

And when the shard *is* assigned, the output looks like:

```
{
  "shard" : {
    "index" : "only-foo",
    "index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
    "id" : 0,
    "primary" : true
  },
  "assigned" : true,
  "assigned_node_id" : "Qc6VL8c5RWaw1qXZ0Rg57g",
  "nodes" : {
    "V-Spi0AyRZ6ZvKbaI3691w" : {
      "node_name" : "Susan Storm",
      "node_attributes" : {
        "bar" : "baz"
      },
      "final_decision" : "NO",
      "weight" : 1.4499999,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    },
    "Qc6VL8c5RWaw1qXZ0Rg57g" : {
      "node_name" : "Slipstream",
      "node_attributes" : {
        "bar" : "baz",
        "foo" : "bar"
      },
      "final_decision" : "CURRENTLY_ASSIGNED",
      "weight" : 0.0,
      "decisions" : [ {
        "decider" : "same_shard",
        "decision" : "NO",
        "explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
      } ]
    },
    "PzdyMZGXQdGhqTJHF_hGgA" : {
      "node_name" : "The Symbiote",
      "node_attributes" : { },
      "final_decision" : "NO",
      "weight" : 3.6999998,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    }
  }
}
```

Only "NO" decisions are returned by default, but all decisions can be
shown by specifying the `?include_yes_decisions=true` parameter in the
request.

Resolves #14593
2016-03-28 15:21:02 -06:00
javanna bf390a935e Merge branch 'master' into enhancement/remove_node_client_setting 2016-03-21 17:18:23 +01:00
Robin Clarke 046212035c Clarification about precedence of settings
Closes #14559
2016-03-10 14:29:51 +01:00
Martijn van Groningen 2fa33d5c47 Added ingest statistics to node stats API
The ingest stats include the following statistics:
* `ingest.total.count`- The total number of document ingested during the lifetime of this node
* `ingest.total.time_in_millis` - The total time spent on ingest preprocessing documents during the lifetime of this node
* `ingest.total.current` - The total number of documents currently being ingested.
* `ingest.total.failed` - The total number ingest preprocessing operations failed during the lifetime of this node

Also these stats are returned on a per pipeline basis.
2016-03-10 13:21:43 +01:00
Martijn van Groningen 82d01e4315 Added ingest info to node info API, which contains a list of available processors.
Internally the put pipeline API uses this information in node info API to validate if all specified processors in a pipeline exist on all nodes in the cluster.
2016-03-07 14:44:50 +01:00
javanna 9c4a5bbe7e adapt cluster stats api to node.client setting removal
The cluster stats api now returns counts for each node role. The `master_data`, `master_only`, `data_only` and `client` fields have been removed from the response in favour of `master`, `data`, `ingest` and `coordinating_only`. The same node can have multiple roles, hence contribute to multiple roles counts. Every node is implicitly a coordinating node, so whenever a node has no explicit roles, it will be counted as coordinating only.
2016-03-05 10:55:19 +01:00
Clinton Gormley 4e5316591a Update stats.asciidoc
Renamed filter_cache->query_cache and removed id_cache

Closes #16626
2016-01-26 13:48:46 +01:00
Yannick Welsch d5b691b68e Extend reroute with an option to force assign stale primary shard copies
Closes #15708
2016-01-19 12:07:01 +01:00
Jason Tedor df598e8129 Modify load average formats
This commit modifies the load_average in the node stats API response
to be an object containing the one-minute, five-minute and
fifteen-minute load averages as fields (if those values are
available). Additionally, this commit modifies the cat nodes API
response to format the one-minute, five-minute and fifteen-minute load
averages as null if any of the respective values are not available.
2016-01-18 11:41:34 -05:00
Jason Tedor 1de2081ed3 Reintroduce five-minute and fifteen-minute load averages on Linux
This commit reintroduces the five-minute and fifteen-minute load stats
on Linux, and changes the format of the load_average field back to an
array.
2016-01-11 23:42:47 -05:00
Simon Willnauer 6ea266a89c Merge branch 'master' into settings_prototype 2015-12-15 16:33:01 +01:00
Felipe Forbeck 708abcc59a Added desc for parameter <local> 2015-12-11 22:26:33 -02:00
Simon Willnauer ce417540c5 apply review from @clintongormley 2015-12-09 12:24:40 +01:00
Simon Willnauer 2e27ee393f add rest API to reset settings 2015-12-08 14:39:16 +01:00
Jason Tedor 6872d545ac Add system CPU percent to OS stats
This commit adds the system CPU percent reflecting the recent CPU usage
for the whole system.
2015-11-17 13:48:46 -05:00
xuzha fb1d8bb149 Add os.allocated_processors
Current processors setting is not reflected in nodes info API
("os.available_processors"). Add os.allocated_processors to shows
actual number of processors that we are using.
2015-11-03 09:50:17 -08:00
xuzha 97ecd7bf5a Expose pending cluster state queue size in node stats
Add 3 stats about the queue: total queue size, number of committed cluster
states, and number of pending cluster states.
2015-10-28 10:59:15 -07:00
Tanguy Leroux db7aecab4d update list of available os stats
os cpu information is no longer exposed through the nodes stats api
2015-08-31 17:03:45 +02:00
Tanguy Leroux 8e052f0da2 Make platform specific assumptions in OS & Process probes tests 2015-08-17 14:47:23 +02:00
Andrey Fadeyev 081fb1a899 Fixes #11571 - update "Cluster Stats" documentation with valid example 2015-08-13 12:09:31 +02:00
Tanguy Leroux 03c327ff12 Expose ClassloadingMXBean in Node Stats
Closes #12738
2015-08-12 14:29:13 +02:00
Clinton Gormley db541d6fbe Docs: Add warning about allow_primary to the cluster reroute docs
Closes #12503
2015-08-07 12:03:19 +02:00
Tanguy Leroux cf6acbd7c2 Remove obsolete plugins.info_refresh_interval setting
This setting has been removed in  #12367
2015-08-04 21:46:31 +02:00
Tanguy Leroux 19e348a82c Update OS stats 2015-07-08 17:48:10 +02:00
Tanguy Leroux 1c5d8efd47 Process Stats: remove sigar specific stats from APIs and add JMX implementation 2015-07-08 15:12:45 +02:00
Tanguy Leroux 26fd4ba95b Docs: fix wrong title level 2015-07-08 09:29:21 +02:00
Tanguy Leroux fbcf4dbbf7 FS Stats: remove sigar specific stats from APIs:
- fs.*.disk_reads
- fs.*.disk_writes
- fs.*.disk_io_op
- fs.*.disk_read_size_in_bytes
- fs.*.disk_write_size_in_bytes
- fs.*.disk_io_size_in_bytes
- fs.*.disk_queue
- fs.*.disk_service_time
2015-07-07 22:16:39 +02:00