Mirror of https://github.com/honeymoose/OpenSearch.git (synced 2025-03-25 01:19:02 +00:00)
Docs: Included Nodes Task API and tidied reindex/update-by-query
parent b87beeb05f
commit 978b24327e
@@ -45,6 +45,8 @@ include::cluster/nodes-stats.asciidoc[]
 
 include::cluster/nodes-info.asciidoc[]
 
+include::cluster/nodes-task.asciidoc[]
+
 include::cluster/nodes-hot-threads.asciidoc[]
 
 include::cluster/allocation-explain.asciidoc[]
49
docs/reference/cluster/nodes-task.asciidoc
Normal file
@@ -0,0 +1,49 @@
+[[nodes-task]]
+== Nodes Task API
+
+The nodes task management API retrieves information about the tasks currently
+executing on one or more nodes in the cluster.
+
+[source,js]
+--------------------------------------------------
+GET /_tasks <1>
+GET /_tasks/nodeId1,nodeId2 <2>
+GET /_tasks/nodeId1,nodeId2/cluster:* <3>
+--------------------------------------------------
+// AUTOSENSE
+
+<1> Retrieves all tasks currently running on all nodes in the cluster.
+<2> Retrieves all tasks running on nodes `nodeId1` and `nodeId2`. See <<cluster-nodes>> for more info about how to select individual nodes.
+<3> Retrieves all cluster-related tasks running on nodes `nodeId1` and `nodeId2`.
+
+The result will look similar to the following:
+
+[source,js]
+--------------------------------------------------
+{
+  "nodes": {
+    "fDlEl7PrQi6F-awHZ3aaDw": {
+      "name": "Gazer",
+      "transport_address": "127.0.0.1:9300",
+      "host": "127.0.0.1",
+      "ip": "127.0.0.1:9300",
+      "tasks": [
+        {
+          "node": "fDlEl7PrQi6F-awHZ3aaDw",
+          "id": 105,
+          "type": "transport",
+          "action": "cluster:monitor/nodes/tasks"
+        },
+        {
+          "node": "fDlEl7PrQi6F-awHZ3aaDw",
+          "id": 106,
+          "type": "direct",
+          "action": "cluster:monitor/nodes/tasks[n]",
+          "parent_node": "fDlEl7PrQi6F-awHZ3aaDw",
+          "parent_id": 105
+        }
+      ]
+    }
+  }
+}
+--------------------------------------------------
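A reader-side sketch, not taken from the commit: the sample response in nodes-task.asciidoc links a child task to its parent through `parent_node` and `parent_id`. Assuming a response with exactly the shape shown in that sample, the Python below groups tasks under their parent. The function name `group_tasks_by_parent` and the tuple layout are hypothetical, chosen only for illustration.

```python
# Sample response copied from the nodes-task.asciidoc example.
SAMPLE_RESPONSE = {
    "nodes": {
        "fDlEl7PrQi6F-awHZ3aaDw": {
            "name": "Gazer",
            "transport_address": "127.0.0.1:9300",
            "host": "127.0.0.1",
            "ip": "127.0.0.1:9300",
            "tasks": [
                {
                    "node": "fDlEl7PrQi6F-awHZ3aaDw",
                    "id": 105,
                    "type": "transport",
                    "action": "cluster:monitor/nodes/tasks",
                },
                {
                    "node": "fDlEl7PrQi6F-awHZ3aaDw",
                    "id": 106,
                    "type": "direct",
                    "action": "cluster:monitor/nodes/tasks[n]",
                    "parent_node": "fDlEl7PrQi6F-awHZ3aaDw",
                    "parent_id": 105,
                },
            ],
        }
    }
}


def group_tasks_by_parent(response):
    """Map (parent_node, parent_id) -> list of (node, id, action).

    Tasks without a parent are collected under the key None.
    Hypothetical helper; assumes the response shape shown above.
    """
    grouped = {}
    for node in response["nodes"].values():
        for task in node["tasks"]:
            parent = None
            if "parent_id" in task:
                parent = (task["parent_node"], task["parent_id"])
            entry = (task["node"], task["id"], task["action"])
            grouped.setdefault(parent, []).append(entry)
    return grouped


grouped = group_tasks_by_parent(SAMPLE_RESPONSE)
for parent, children in grouped.items():
    print(parent, children)
```

In the sample, task 105 is the top-level request and task 106 is its per-node child, so the child appears under the key for node `fDlEl7PrQi6F-awHZ3aaDw`, id 105.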
@@ -1,8 +1,8 @@
 [[docs-reindex]]
 == Reindex API
 
-`_reindex`'s most basic form just copies documents from one index to another.
-This will copy documents from `twitter` into `new_twitter`:
+The most basic form of `_reindex` just copies documents from one index to another.
+This will copy documents from the `twitter` index into the `new_twitter` index:
 
 [source,js]
 --------------------------------------------------
@@ -32,12 +32,13 @@ That will return something like this:
 }
 --------------------------------------------------
 
-Just like `_update_by_query`, `_reindex` gets a snapshot of the source index
-but its target must be a **different** index so version conflicts are unlikely.
-The `dest` element can be configured like the index API to control optimistic
-concurrency control. Just leaving out `version_type` (as above) or setting it
-to `internal` will cause Elasticsearch to blindly dump documents into the
-target, overwriting any that happen to have the same type and id:
+Just like <<docs-update-by-query,`_update_by_query`>>, `_reindex` gets a
+snapshot of the source index but its target must be a **different** index so
+version conflicts are unlikely. The `dest` element can be configured like the
+index API to control optimistic concurrency control. Just leaving out
+`version_type` (as above) or setting it to `internal` will cause Elasticsearch
+to blindly dump documents into the target, overwriting any that happen to have
+the same type and id:
 
 [source,js]
 --------------------------------------------------
@@ -113,7 +114,7 @@ POST /_reindex
 // AUTOSENSE
 
 You can limit the documents by adding a type to the `source` or by adding a
-query. This will only copy `tweet`s made by `kimchy` into `new_twitter`:
+query. This will only copy ++tweet++'s made by `kimchy` into `new_twitter`:
 
 [source,js]
 --------------------------------------------------
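The request bodies themselves are elided from these hunks, so the following is only a sketch of how the `source` and `dest` elements described above might be assembled. It assumes the `index`, `type`, and `query` keys under `source` and the `index` and `version_type` keys under `dest`, as in the Reindex API of that era. The helper name `build_reindex_body` and the `user` field in the term query are hypothetical.

```python
import json


def build_reindex_body(source_index, dest_index, doc_type=None,
                       query=None, version_type=None):
    """Build a JSON-serializable body for POST /_reindex (illustrative only)."""
    # The text requires the target to be a different index from the source.
    if source_index == dest_index:
        raise ValueError("dest index must differ from source index")

    source = {"index": source_index}
    if doc_type is not None:
        source["type"] = doc_type      # limit the copy to one document type
    if query is not None:
        source["query"] = query        # limit the copy with a query

    dest = {"index": dest_index}
    if version_type is not None:
        # Omitting version_type behaves like "internal": documents with the
        # same type and id in the target are overwritten.
        dest["version_type"] = version_type

    return {"source": source, "dest": dest}


# Copy only tweets made by kimchy; the "user" field name is an assumption.
body = build_reindex_body(
    "twitter",
    "new_twitter",
    doc_type="tweet",
    query={"term": {"user": "kimchy"}},
)
print(json.dumps(body, indent=2))
```

Leaving `version_type` unset mirrors the "blindly dump documents into the target" behaviour the hunk describes.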
@@ -140,9 +141,9 @@ lots of sources in one request. This will copy documents from the `tweet` and
 `post` types in the `twitter` and `blog` index. It'd include the `post` type in
 the `twitter` index and the `tweet` type in the `blog` index. If you want to be
 more specific you'll need to use the `query`. It also makes no effort to handle
-id collisions. The target index will remain valid but it's not easy to predict
+ID collisions. The target index will remain valid but it's not easy to predict
 which document will survive because the iteration order isn't well defined.
-Just avoid that situation, ok?
+
 [source,js]
 --------------------------------------------------
 POST /_reindex
@@ -222,14 +223,15 @@ POST /_reindex
 
 Think of the possibilities! Just be careful! With great power.... You can
 change:
-* "_id"
-* "_type"
-* "_index"
-* "_version"
-* "_routing"
-* "_parent"
-* "_timestamp"
-* "_ttl"
+
+* `_id`
+* `_type`
+* `_index`
+* `_version`
+* `_routing`
+* `_parent`
+* `_timestamp`
+* `_ttl`
 
 Setting `_version` to `null` or clearing it from the `ctx` map is just like not
 sending the version in an indexing request. It will cause that document to be
@@ -257,6 +259,7 @@ the `=`.
 For example, you can use the following request to copy all documents from
 the `source` index with the company name `cat` into the `dest` index with
 routing set to `cat`.
+
 [source,js]
 --------------------------------------------------
 POST /_reindex
@@ -316,7 +319,7 @@ Elasticsearch log file. This will be fixed soon.
 `consistency` controls how many copies of a shard must respond to each write
 request. `timeout` controls how long each write request waits for unavailable
 shards to become available. Both work exactly how they work in the
-{ref}/docs-bulk.html[Bulk API].
+<<docs-bulk,Bulk API>>.
 
 `requests_per_second` can be set to any decimal number (1.4, 6, 1000, etc) and
 throttle the number of requests per second that the reindex issues. The
@@ -385,7 +388,7 @@ from aborting the operation.
 === Works with the Task API
 
 While Reindex is running you can fetch their status using the
-{ref}/task/list.html[Task List APIs]:
+<<nodes-task,Nodes Task API>>:
 
 [source,js]
 --------------------------------------------------
@@ -56,7 +56,7 @@ POST /twitter/tweet/_update_by_query?conflicts=proceed
 // AUTOSENSE
 
 You can also limit `_update_by_query` using the
-{ref}/query-dsl.html[Query DSL]. This will update all documents from the
+<<query-dsl,Query DSL>>. This will update all documents from the
 `twitter` index for the user `kimchy`:
 
 [source,js]
@@ -73,7 +73,7 @@ POST /twitter/_update_by_query?conflicts=proceed
 // AUTOSENSE
 
 <1> The query must be passed as a value to the `query` key, in the same
-way as the {ref}/search-search.html[Search API]. You can also use the `q`
+way as the <<search-search,Search API>>. You can also use the `q`
 parameter in the same way as the search api.
 
 So far we've only been updating documents without changing their source. That
@@ -81,6 +81,7 @@ is genuinely useful for things like
 <<picking-up-a-new-property,picking up new properties>> but it's only half the
 fun. `_update_by_query` supports a `script` object to update the document. This
 will increment the `likes` field on all of kimchy's tweets:
+
 [source,js]
 --------------------------------------------------
 POST /twitter/_update_by_query
@@ -97,7 +98,7 @@ POST /twitter/_update_by_query
 --------------------------------------------------
 // AUTOSENSE
 
-Just as in {ref}/docs-update.html[Update API] you can set `ctx.op = "noop"` if
+Just as in <<docs-update,Update API>> you can set `ctx.op = "noop"` if
 your script decides that it doesn't have to make any changes. That will cause
 `_update_by_query` to omit that document from its updates. Setting `ctx.op` to
 anything else is an error. If you want to delete by a query you can use the
@@ -167,7 +168,7 @@ the Elasticsearch log file. This will be fixed soon.
 `consistency` controls how many copies of a shard must respond to each write
 request. `timeout` controls how long each write request waits for unavailable
 shards to become available. Both work exactly how they work in the
-{ref}/docs-bulk.html[Bulk API].
+<<docs-bulk,Bulk API>>.
 
 `requests_per_second` can be set to any decimal number (1.4, 6, 1000, etc) and
 throttle the number of requests per second that the update by query issues. The
@@ -232,7 +233,7 @@ from aborting the operation.
 === Works with the Task API
 
 While Update By Query is running you can fetch their status using the
-{ref}/task/list.html[Task List APIs]:
+<<nodes-task,Nodes Task API>>:
 
 [source,js]
 --------------------------------------------------
@@ -285,6 +286,7 @@ progress by adding the `updated`, `created`, and `deleted` fields. The request
 will finish when their sum is equal to the `total` field.
 
+
 [float]
 [[picking-up-a-new-property]]
 === Pick up a new property
 
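The progress rule quoted in the hunk header above (add `updated`, `created`, and `deleted`, and compare the sum with `total`) can be sketched in a few lines of Python. This assumes a status dictionary carrying exactly those four fields; the function name `progress` and the sample numbers are hypothetical, not output from a real cluster.

```python
def progress(status):
    """Return (done, fraction, finished) for an update-by-query status dict.

    `status` is assumed to hold the integer fields `total`, `updated`,
    `created`, and `deleted`, as named in the documentation text.
    """
    done = status["updated"] + status["created"] + status["deleted"]
    total = status["total"]
    # Treat an empty request (total == 0) as already complete.
    fraction = done / total if total else 1.0
    return done, fraction, done == total


# Hypothetical status values, for illustration only.
sample_status = {"total": 100, "updated": 40, "created": 0, "deleted": 10}
done, fraction, finished = progress(sample_status)
print(done, fraction, finished)
```

A caller polling the task status could stop once `finished` is true.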
@@ -379,4 +381,4 @@ POST test/_search?filter_path=hits.total
 }
 --------------------------------------------------
 
-Hurray! You can do the exact same thing when adding a field to a multifield.
+You can do the exact same thing when adding a field to a multifield.
@@ -1,46 +0,0 @@
-[[tasks-list]]
-== Tasks List
-
-The task management API allows to retrieve information about currently running tasks.
-
-[source,js]
---------------------------------------------------
-curl -XGET 'http://localhost:9200/_tasks'
-curl -XGET 'http://localhost:9200/_tasks/nodeId1,nodeId2'
-curl -XGET 'http://localhost:9200/_tasks/nodeId1,nodeId2/cluster:*'
---------------------------------------------------
-
-The first command retrieves all tasks currently running on all nodes.
-The second command selectively retrieves tasks from nodes
-`nodeId1` and `nodeId2`. All the nodes selective options are explained
-<<cluster-nodes,here>>.
-The third command retrieves all cluster-related tasks running on nodes `nodeId1` and `nodeId2`.
-
-The result will look similar to:
-
-[source,js]
---------------------------------------------------
-{
-  "nodes" : {
-    "fDlEl7PrQi6F-awHZ3aaDw" : {
-      "name" : "Gazer",
-      "transport_address" : "127.0.0.1:9300",
-      "host" : "127.0.0.1",
-      "ip" : "127.0.0.1:9300",
-      "tasks" : [ {
-        "node" : "fDlEl7PrQi6F-awHZ3aaDw",
-        "id" : 105,
-        "type" : "transport",
-        "action" : "cluster:monitor/nodes/tasks"
-      }, {
-        "node" : "fDlEl7PrQi6F-awHZ3aaDw",
-        "id" : 106,
-        "type" : "direct",
-        "action" : "cluster:monitor/nodes/tasks[n]",
-        "parent_node" : "fDlEl7PrQi6F-awHZ3aaDw",
-        "parent_id" : 105
-      } ]
-    }
-  }
-}
---------------------------------------------------