[[getting-started]]
== Getting Started

This getting started guide walks you through installing Watcher and creating your first watches,
and introduces the building blocks you'll use to create custom watches. You can install Watcher
on nodes running Elasticsearch {version}.

To install and run Watcher:

. Run `bin/plugin install` from `ES_HOME` to install the License plugin:
+
[source,shell]
----------------------------------------------------------
bin/plugin install license
----------------------------------------------------------
+
NOTE: You need to install the License and Watcher plugins on each node in your cluster.

. Run `bin/plugin install` from `ES_HOME` to install the Watcher plugin:
+
[source,shell]
----------------------------------------------------------
bin/plugin install watcher
----------------------------------------------------------
+
NOTE: If you are using a <<package-installation, DEB/RPM distribution>> of Elasticsearch,
run the installation with superuser permissions. To perform an offline installation,
<<offline-installation, download the Watcher binaries>>.

. Start Elasticsearch.
+
[source,shell]
----------------------------------------------------------
bin/elasticsearch
----------------------------------------------------------

. To verify that Watcher is set up, call the Watcher `stats` API:
+
[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_watcher/stats?pretty'
--------------------------------------------------
+
You haven't set up any watches yet, so the `watch_count` is zero and the `execution_thread_pool` queue
is empty:
+
[source,js]
--------------------------------------------------
{
  "watcher_state": "started",
  "watch_count": 0,
  "execution_thread_pool": {
    "queue_size": 0,
    "max_size": 0
  },
  "manually_stopped" : false
}
--------------------------------------------------
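
If you script the installation, you may prefer to check the `watcher_state` field rather than read the response by eye. The following is a minimal sketch, assuming a POSIX shell with `sed` available. The `watcher_state` helper name is hypothetical, and the sample response is a shortened copy of the one shown in this guide rather than live output.

```shell
# Hypothetical helper: read a stats response on stdin and print the value of
# "watcher_state". It uses sed rather than a JSON parser, so treat it as a sketch.
watcher_state() {
  sed -n 's/.*"watcher_state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample response based on this guide. Against a live node you would pipe the
# output of the stats call into watcher_state instead.
sample_stats='{
  "watcher_state": "started",
  "watch_count": 0
}'

state=$(printf '%s\n' "$sample_stats" | watcher_state)
if [ "$state" = "started" ]; then
  echo "Watcher is running"
else
  echo "Watcher state: ${state:-unknown}"
fi
```

For anything beyond a quick check, a real JSON parser is a safer choice than pattern matching.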

Ready to start building watches? Choose one of the following scenarios:

* <<watch-log-data, Watch Log Data for Errors>>
* <<watch-cluster-status, Watch Your Cluster Health>>

[[watch-log-data]]
=== Watch Log Data for Errors

You can easily configure a watch that periodically checks your log data for error conditions:

* <<log-add-input, Schedule the watch and define an input>> to search your log data for error events.
* <<log-add-condition, Add a condition>> that checks to see if any errors were found.
* <<log-take-action, Take action>> if there are any errors.

[float]
[[log-add-input]]
==== Schedule the Watch and Add an Input

A watch <<trigger-schedule, schedule>> controls how often a watch is triggered. The watch
<<input, input>> gets the data that you want to evaluate.

To periodically search your log data and load the results into the watch, you use an
<<schedule-interval, interval>> schedule and a <<input-search, search>> input. For example, the
following watch searches the `logs` index for errors every 10 seconds:

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_watcher/watch/log_error_watch' -d '{
  "trigger" : {
    "schedule" : { "interval" : "10s" } <1>
  },
  "input" : {
    "search" : {
      "request" : {
        "indices" : [ "logs" ],
        "body" : {
          "query" : {
            "match" : { "message": "error" }
          }
        }
      }
    }
  }
}'
--------------------------------------------------

<1> Schedules are typically configured to run less frequently. This example sets the interval to
10 seconds so you can easily see the watches being triggered. Since this watch runs so frequently,
don't forget to <<log-delete, delete the watch>> when you're done experimenting.
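
To get a feel for how interval values translate into time, here is a small sketch that converts an interval string into seconds. It assumes the unit suffixes `s`, `m`, `h`, `d`, and `w`, and treats a bare number as seconds. `interval_to_seconds` is a hypothetical helper for illustration and is not part of Watcher.

```shell
# Hypothetical helper: convert an interval such as "10s" or "5m" to seconds.
interval_to_seconds() {
  value=${1%[smhdw]}      # numeric part, e.g. "10" from "10s"
  unit=${1#"$value"}      # unit suffix, empty when none was given
  case $unit in
    ''|s) echo "$value" ;;
    m) echo $((value * 60)) ;;
    h) echo $((value * 3600)) ;;
    d) echo $((value * 86400)) ;;
    w) echo $((value * 604800)) ;;
  esac
}

interval_to_seconds 10s   # prints 10
interval_to_seconds 5m    # prints 300
```

A production watch would more likely use an interval in the minutes-to-hours range than the 10 seconds used in this guide.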

If you check the watch history you'll see that the watch is being triggered every 10 seconds.
However, the search isn't returning any results so nothing is loaded into the watch payload.

For example, the following snippet gets the last ten watch executions (also known as watch records)
from the watch history:

[source,js]
--------------------------------------------------------------------------------
curl -XGET 'http://localhost:9200/.watch_history*/_search?pretty' -d '{
  "sort" : [
    { "result.execution_time" : "desc" }
  ]
}'
--------------------------------------------------------------------------------

[float]
[[log-add-condition]]
==== Add a Condition

A <<condition, condition>> evaluates the data you've loaded into the watch and determines if any
action is required. Since you've defined an input that loads log errors into the watch, you can
define a condition that checks to see if any errors were found.

For example, you could add a condition that simply checks to see if the search input returned
any hits.

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_watcher/watch/log_error_watch' -d '{
  "trigger" : { "schedule" : { "interval" : "10s" } },
  "input" : {
    "search" : {
      "request" : {
        "indices" : [ "logs" ],
        "body" : {
          "query" : {
            "match" : { "message": "error" }
          }
        }
      }
    }
  },
  "condition" : {
    "compare" : { "ctx.payload.hits.total" : { "gt" : 0 }} <1>
  }
}'
--------------------------------------------------

<1> The <<condition-compare, compare>> condition lets you easily compare against values in the
execution context without enabling dynamic scripting.
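
To make the comparison concrete, the sketch below applies the same kind of test locally to a value you pass in. It assumes the operator names `eq`, `not_eq`, `gt`, `gte`, `lt`, and `lte`. The `compare` function is a hypothetical shell helper that only imitates the idea and says nothing about how Watcher evaluates conditions internally.

```shell
# Hypothetical helper: compare <value> <operator> <expected>.
# Returns success (0) when the comparison holds, like a condition being met.
compare() {
  case $2 in
    eq)     [ "$1" = "$3" ] ;;
    not_eq) [ "$1" != "$3" ] ;;
    gt)     [ "$1" -gt "$3" ] ;;
    gte)    [ "$1" -ge "$3" ] ;;
    lt)     [ "$1" -lt "$3" ] ;;
    lte)    [ "$1" -le "$3" ] ;;
    *)      echo "unknown operator: $2" >&2; return 2 ;;
  esac
}

# Mirrors { "ctx.payload.hits.total" : { "gt" : 0 } } for two hit counts.
for total in 0 3; do
  if compare "$total" gt 0; then
    echo "hits.total=$total: condition met"
  else
    echo "hits.total=$total: condition not met"
  fi
done
```

With zero hits the condition is not met, which is the state the `logs` index is in until you add an error event.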

The condition result is recorded as part of the `watch_record` each time the watch executes. Since
there are currently no log events in the `logs` index, the watch condition will not be met. If you
search the history for watch executions where the condition was met during the last 10 seconds,
there are no hits:

[source,js]
--------------------------------------------------------------------------------
curl -XGET 'http://localhost:9200/.watch_history*/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "must" : [
        { "match" : { "result.condition.met" : true }},
        { "range" : { "result.execution_time" : { "from" : "now-10s"}}}
      ]
    }
  }
}'
--------------------------------------------------------------------------------

For the condition in the example above to evaluate to `true`, you need to add an event to the
`logs` index that contains an error.

For example, the following snippet adds a 404 error to the `logs` index:

[source,js]
--------------------------------------------------
curl -XPOST 'http://localhost:9200/logs/event' -d '{
  "timestamp" : "2015-05-17T18:12:07.613Z",
  "request" : "GET index.html",
  "status_code" : 404,
  "message" : "Error: File not found"
}'
--------------------------------------------------

Once you add this event, the next time the watch executes its condition will evaluate to `true`.
You can verify this by searching the watch history:

[source,js]
--------------------------------------------------------------------------------
curl -XGET 'http://localhost:9200/.watch_history*/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "must" : [
        { "match" : { "result.condition.met" : true }},
        { "range" : { "result.execution_time" : { "from" : "now-10s"}}}
      ]
    }
  }
}'
--------------------------------------------------------------------------------

[float]
[[log-take-action]]
==== Take Action

Recording `watch_records` in the watch history is nice, but the real power of Watcher is being able
to do something when the watch condition is met. The watch's <<actions, actions>> define what to
do when the watch condition evaluates to `true`--you can send emails, call third-party webhooks,
write documents to an Elasticsearch index, or log messages to the standard Elasticsearch log files.

For example, you could add an action to write a message to the Elasticsearch log when an error is
detected.

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_watcher/watch/log_error_watch' -d '{
  "trigger" : { "schedule" : { "interval" : "10s" } },
  "input" : {
    "search" : {
      "request" : {
        "indices" : [ "logs" ],
        "body" : {
          "query" : {
            "match" : { "message": "error" }
          }
        }
      }
    }
  },
  "condition" : {
    "compare" : { "ctx.payload.hits.total" : { "gt" : 0 }}
  },
  "actions" : {
    "log_error" : {
      "logging" : {
        "text" : "Found {{ctx.payload.hits.total}} errors in the logs"
      }
    }
  }
}'
--------------------------------------------------
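
The `text` field is a template: the `{{ctx.payload.hits.total}}` placeholder is filled in from the execution context before the message is written. The sketch below imitates that single substitution with `sed` so you can see the resulting message. It is not how Watcher renders templates, and `render_log_text` is a hypothetical helper.

```shell
# Hypothetical helper: substitute a hit count into the logging text used above.
render_log_text() {
  template='Found {{ctx.payload.hits.total}} errors in the logs'
  printf '%s\n' "$template" | sed "s/{{ctx\.payload\.hits\.total}}/$1/"
}

render_log_text 3   # prints: Found 3 errors in the logs
```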

[float]
[[log-delete]]
==== Delete the Watch

Since the `log_error_watch` is configured to run every 10 seconds, make sure you delete it when
you're done experimenting. Otherwise, the noise from this sample watch will make it hard to see
what else is going on in your watch history and log file.

To remove the watch, use the <<api-rest-delete-watch, DELETE watch>> API:

[source,js]
--------------------------------------------------
curl -XDELETE 'http://localhost:9200/_watcher/watch/log_error_watch'
--------------------------------------------------

[[watch-cluster-status]]
=== Watch Your Cluster Health

You can easily configure a basic watch to monitor the health of your Elasticsearch cluster:

* <<health-add-input, Schedule the watch and define an input>> that gets the cluster health status.
* <<health-add-condition, Add a condition>> that evaluates the health status to determine if action
is required.
* <<health-take-action, Take action>> if the cluster is RED.

[float]
[[health-add-input]]
==== Schedule the Watch and Add an Input

A watch <<trigger-schedule, schedule>> controls how often a watch is triggered. The watch
<<input, input>> gets the data that you want to evaluate.

The simplest way to define a schedule is to specify an interval. For example, the following
schedule runs every 10 seconds:

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_watcher/watch/cluster_health_watch' -d '{
  "trigger" : {
    "schedule" : { "interval" : "10s" } <1>
  }
}'
--------------------------------------------------

<1> Schedules are typically configured to run less frequently. This example sets the interval to
10 seconds so you can easily see the watches being triggered. Since this watch runs so frequently,
don't forget to <<health-delete, delete the watch>> when you're done experimenting.

To get the status of your cluster, you can call the Elasticsearch
{ref}/cluster-health.html[cluster health] API:

[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
--------------------------------------------------

To load the health status into your watch, you simply add an <<input-http, HTTP input>> that calls
the cluster health API:

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_watcher/watch/cluster_health_watch' -d '{
  "trigger" : {
    "schedule" : { "interval" : "10s" }
  },
  "input" : {
    "http" : {
      "request" : {
        "host" : "localhost",
        "port" : 9200,
        "path" : "/_cluster/health"
      }
    }
  }
}'
--------------------------------------------------
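
Because each step in this guide resends the whole watch definition, it can be easier to keep the body in a file and pass it to `curl` with `-d @<file>`. The following is a sketch of that workflow, assuming `mktemp` is available. The `watch_file` variable is a name chosen for illustration, and the script prints the `curl` command instead of running it so that it works without a cluster.

```shell
# Write the watch definition to a temporary file. The quoted 'EOF' keeps the
# shell from expanding anything inside the JSON.
watch_file=$(mktemp)
cat > "$watch_file" <<'EOF'
{
  "trigger" : { "schedule" : { "interval" : "10s" } },
  "input" : {
    "http" : {
      "request" : { "host" : "localhost", "port" : 9200, "path" : "/_cluster/health" }
    }
  }
}
EOF

# Print the command rather than running it; drop the echo to send the request.
echo "curl -XPUT 'http://localhost:9200/_watcher/watch/cluster_health_watch' -d @$watch_file"
```

Editing the file and rerunning the same command then replaces the earlier version of the watch.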

If you check the watch history, you'll see that the cluster status is recorded as part of the
`watch_record` each time the watch executes.

For example, the following snippet gets the last ten watch records from the watch history:

[source,js]
--------------------------------------------------------------------------------
curl -XGET 'http://localhost:9200/.watch_history*/_search' -d '{
  "sort" : [
    { "result.execution_time" : "desc" }
  ]
}'
--------------------------------------------------------------------------------

[float]
[[health-add-condition]]
==== Add a Condition

A <<condition, condition>> evaluates the data you've loaded into the watch and determines if any
action is required. Since you've defined an input that loads the cluster status into the watch,
you can define a condition that checks that status.

For example, you could add a condition to check to see if the status is RED.

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_watcher/watch/cluster_health_watch' -d '{
  "trigger" : {
    "schedule" : { "interval" : "10s" } <1>
  },
  "input" : {
    "http" : {
      "request" : {
        "host" : "localhost",
        "port" : 9200,
        "path" : "/_cluster/health"
      }
    }
  },
  "condition" : {
    "compare" : {
      "ctx.payload.status" : { "eq" : "red" }
    }
  }
}'
--------------------------------------------------

<1> Schedules are typically configured to run less frequently. This example sets the interval to
10 seconds so you can easily see the watches being triggered.
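
Note that an `eq` test against `red` is not met when the cluster is `yellow`. The sketch below shows this with a made-up health response: it extracts the `status` field with `sed` and applies the same equality check. The `health_status` helper and the sample values are assumptions for illustration only.

```shell
# Hypothetical helper: print the "status" value from a cluster health
# response read on stdin (sed-based, so only suitable for a quick check).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Made-up response; a live cluster returns more fields than shown here.
sample_health='{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "number_of_nodes" : 1
}'

status=$(printf '%s\n' "$sample_health" | health_status)
if [ "$status" = "red" ]; then
  echo "condition met: cluster is $status"
else
  echo "condition not met: cluster is $status"
fi
```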

If you check the watch history, you'll see that the condition result is recorded as part of the
`watch_record` each time the watch executes.

To check to see if the condition was met, you can run the following query.

[source,js]
--------------------------------------------------------------------------------
curl -XGET 'http://localhost:9200/.watch_history*/_search?pretty' -d '{
  "query" : {
    "match" : { "result.condition.met" : true }
  }
}'
--------------------------------------------------------------------------------

[float]
[[health-take-action]]
==== Take Action

Recording `watch_records` in the watch history is nice, but the real power of Watcher is being able
to do something in response to an alert. A watch's <<actions, actions>> define what to do when the
watch condition is true--you can send emails, call third-party webhooks, or write documents to an
Elasticsearch index or log when the watch condition is met.

For example, you could add an action to send an email notification when the status is RED.

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_watcher/watch/cluster_health_watch' -d '{
  "trigger" : {
    "schedule" : { "interval" : "10s" }
  },
  "input" : {
    "http" : {
      "request" : {
        "host" : "localhost",
        "port" : 9200,
        "path" : "/_cluster/health"
      }
    }
  },
  "condition" : {
    "compare" : {
      "ctx.payload.status" : { "eq" : "red" }
    }
  },
  "actions" : {
    "send_email" : {
      "email" : {
        "to" : "<username>@<domainname>",
        "subject" : "Cluster Status Warning",
        "body" : "Cluster status is RED"
      }
    }
  }
}'
--------------------------------------------------

For Watcher to send email, you must configure an email account in your `elasticsearch.yml`
configuration file and restart Elasticsearch. To add an email account, set the
`watcher.actions.email.service.account` property.

For example, the following snippet configures a single Gmail account named `work`.

[source,yaml]
----------------------------------------------------------
watcher.actions.email.service.account:
  work:
    profile: gmail
    email_defaults:
      from: <email> <1>
    smtp:
      auth: true
      starttls.enable: true
      host: smtp.gmail.com
      port: 587
      user: <username> <2>
      password: <password> <3>
----------------------------------------------------------

<1> Replace `<email>` with the email address from which you want to send notifications.
<2> Replace `<username>` with your Gmail user name (typically your Gmail address).
<3> Replace `<password>` with your Gmail password.

NOTE: If you have advanced security options enabled for your email account, you need to take
additional steps to send email from Watcher. For more information, see
<<email-services, Working with Various Email Services>>.

You can check the watch history or the `status_index` to see that the action was performed.

[source,js]
--------------------------------------------------------------------------------
curl -XGET 'http://localhost:9200/.watch_history*/_search?pretty' -d '{
  "query" : {
    "match" : { "result.condition.met" : true }
  }
}'
--------------------------------------------------------------------------------

[float]
[[health-delete]]
==== Delete the Watch

Since the `cluster_health_watch` is configured to run every 10 seconds, make sure you delete it
when you're done experimenting. Otherwise, you'll spam yourself indefinitely.

To remove the watch, use the <<api-rest-delete-watch, DELETE watch>> API:

[source,js]
--------------------------------------------------------------------------------
curl -XDELETE 'http://localhost:9200/_watcher/watch/cluster_health_watch'
--------------------------------------------------------------------------------
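
If you created both sample watches, a short loop can remove them together. This is a sketch only: the `delete_watch` helper and the `ES_URL` and `DRY_RUN` variables are hypothetical names, and by default the script just prints the commands so you can review them before sending anything to the cluster.

```shell
ES_URL=${ES_URL:-http://localhost:9200}

# Hypothetical helper: delete one watch by ID. With DRY_RUN=1 (the default
# here) it only prints the command, so nothing is sent to the cluster.
delete_watch() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "curl -XDELETE '$ES_URL/_watcher/watch/$1'"
  else
    curl -XDELETE "$ES_URL/_watcher/watch/$1"
  fi
}

for watch_id in log_error_watch cluster_health_watch; do
  delete_watch "$watch_id"
done
```

Set `DRY_RUN=0` before running the loop to issue the DELETE requests for real.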