OpenSearch/docs/river/index.asciidoc

[[river]]
= Rivers

== Intro 

A river is a pluggable service running within elasticsearch cluster
pulling data (or being pushed with data) that is then indexed into the
cluster.

A river is composed of a unique name and a type. The type is the type of
the river (out of the box, there is the `dummy` river that simply logs
that it is running). The name uniquely identifies the river within the
cluster. For example, one can run a river called `my_river` with type
`dummy`, and another river called `my_other_river` with type `dummy`.


== How it Works

A river instance (and its name) is a type within the `_river` index. All
different rivers implementations accept a document called `_meta` that
at the very least has the type of the river (twitter / couchdb / ...)
associated with it. Creating a river is a simple curl request to index
that `_meta` document (there is actually a `dummy` river used for
testing):

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/_river/my_river/_meta' -d '{
    "type" : "dummy"
}'
--------------------------------------------------

A river can also have more data associated with it in the form of more
documents indexed under the given index type (the river name). For
example, storing the last indexed state can be stored in a document that
holds it.

Deleting a river is a call to delete the type (and all documents
associated with it):

[source,js]
--------------------------------------------------
curl -XDELETE 'localhost:9200/_river/my_river/'
--------------------------------------------------


== Cluster Allocation

Rivers are singletons within the cluster. They get allocated
automatically to one of the nodes and run. If that node fails, an river
will be automatically allocated to another node.

River allocation on nodes can be controlled on each node. The
`node.river` can be set to `_none_` disabling any river allocation to
it. The `node.river` can also include a comma separated list of either
river names or types controlling the rivers allowed to run on it. For
example: `my_river1,my_river2`, or `dummy,twitter`.


== Status

Each river (regardless of the implementation) exposes a high level
`_status` doc which includes the node the river is running on. Getting
the status is a simple curl GET request to
`/_river/{river name}/_status`.

include::couchdb.asciidoc[]

include::rabbitmq.asciidoc[]

include::twitter.asciidoc[]

include::wikipedia.asciidoc[]
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`[[river]]`
			`= Rivers`

			`== Intro`

			`A river is a pluggable service running within elasticsearch cluster`
			`pulling data (or being pushed with data) that is then indexed into the`
			`cluster.`

			`A river is composed of a unique name and a type. The type is the type of`
			the river (out of the box, there is the `dummy` river that simply logs
			`that it is running). The name uniquely identifies the river within the`
			cluster. For example, one can run a river called `my_river` with type
			`dummy`, and another river called `my_other_river` with type `dummy`.

[DOCS] Chunk depth now configurable, so [float] not required 2013-09-03 10:15:07 -04:00
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`== How it Works`

			A river instance (and its name) is a type within the `_river` index. All
			different rivers implementations accept a document called `_meta` that
			`at the very least has the type of the river (twitter / couchdb / ...)`
			`associated with it. Creating a river is a simple curl request to index`
			that `_meta` document (there is actually a `dummy` river used for
			`testing):`

			`[source,js]`
			`--------------------------------------------------`
			`curl -XPUT 'localhost:9200/_river/my_river/_meta' -d '{`
			`"type" : "dummy"`
			`}'`
			`--------------------------------------------------`

			`A river can also have more data associated with it in the form of more`
			`documents indexed under the given index type (the river name). For`
			`example, storing the last indexed state can be stored in a document that`
			`holds it.`

			`Deleting a river is a call to delete the type (and all documents`
			`associated with it):`

			`[source,js]`
			`--------------------------------------------------`
			`curl -XDELETE 'localhost:9200/_river/my_river/'`
			`--------------------------------------------------`

[DOCS] Chunk depth now configurable, so [float] not required 2013-09-03 10:15:07 -04:00
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`== Cluster Allocation`

			`Rivers are singletons within the cluster. They get allocated`
			`automatically to one of the nodes and run. If that node fails, an river`
			`will be automatically allocated to another node.`

			`River allocation on nodes can be controlled on each node. The`
			`node.river` can be set to `_none_` disabling any river allocation to
			it. The `node.river` can also include a comma separated list of either
			`river names or types controlling the rivers allowed to run on it. For`
			example: `my_river1,my_river2`, or `dummy,twitter`.

[DOCS] Chunk depth now configurable, so [float] not required 2013-09-03 10:15:07 -04:00
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`== Status`

			`Each river (regardless of the implementation) exposes a high level`
			`_status` doc which includes the node the river is running on. Getting
			`the status is a simple curl GET request to`
			`/_river/{river name}/_status`.

			`include::couchdb.asciidoc[]`

			`include::rabbitmq.asciidoc[]`

			`include::twitter.asciidoc[]`

			`include::wikipedia.asciidoc[]`