SOLR-11165: Merge docs from issue branch with master for 7.1 editing

This commit is contained in:
Cassandra Targett 2017-09-26 14:10:02 -05:00
parent 424fb3bfb7
commit ee687c39a8
8 changed files with 444 additions and 6 deletions


@ -0,0 +1,28 @@
= SolrCloud Autoscaling Actions
:page-shortname: solrcloud-autoscaling-actions
:page-permalink: solrcloud-autoscaling-actions.html
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
`TriggerAction` implementations process events generated by triggers in order to ensure the cluster's
health and good use of resources.
Currently two implementations are provided: `ComputePlanAction` and `ExecutePlanAction`.
== Compute plan action
== Execute plan action


@ -20,11 +20,11 @@
// specific language governing permissions and limitations
// under the License.
The Autoscaling API is used to manage autoscaling policies and preferences, and to get diagnostics on the state of the cluster.
The Autoscaling API is used to manage autoscaling policies, preferences, triggers, and listeners, and to get diagnostics on the state of the cluster.
== Read API
The autoscaling Read API is available at `/admin/autoscaling` or `/v2/cluster/autoscaling`. It returns information about the configured cluster preferences, cluster policy and collection-specific policies.
The autoscaling Read API is available at `/admin/autoscaling` or `/v2/cluster/autoscaling`. It returns information about the configured cluster preferences, cluster policy, collection-specific policies, triggers, and listeners.
This API does not take any parameters.
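As an illustration, the following minimal Python sketch reads the current configuration over HTTP using only the standard library. It assumes a Solr node reachable at `http://localhost:8983/solr`, so that the Read API is at `http://localhost:8983/solr/admin/autoscaling`; the helper names are hypothetical, and the structure of the returned JSON is not described here.

```python
import json
import urllib.request

# Assumption: a Solr node at this base URL; adjust for your cluster.
SOLR_BASE_URL = "http://localhost:8983/solr"


def autoscaling_read_request(base_url: str = SOLR_BASE_URL) -> urllib.request.Request:
    """Build a GET request for the autoscaling Read API (it takes no parameters)."""
    return urllib.request.Request(base_url.rstrip("/") + "/admin/autoscaling", method="GET")


def read_autoscaling_config(base_url: str = SOLR_BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body into a dict."""
    with urllib.request.urlopen(autoscaling_read_request(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running cluster, `read_autoscaling_config()` returns the decoded response as a Python dict.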


@ -0,0 +1,21 @@
= SolrCloud AutoScaling Automatically Adding Replicas
:page-shortname: solrcloud-autoscaling-auto-add-replicas
:page-permalink: solrcloud-autoscaling-auto-add-replicas.html
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
TODO


@ -0,0 +1,60 @@
= SolrCloud Autoscaling Fault Tolerance
:page-shortname: solrcloud-autoscaling-fault-tolerance
:page-permalink: solrcloud-autoscaling-fault-tolerance.html
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
== Node added / lost markers
Since triggers execute on the node that runs the Overseer, if that node goes down, the `nodeLost`
event for it would be lost because no mechanism would remain to generate it. Similarly, if a node is
added while the Overseer leader change is still in progress, the `nodeAdded` event would not be
generated.
For this reason, Solr implements additional mechanisms to ensure that these events are generated
reliably.
When a node joins the cluster, its presence is marked by an ephemeral ZooKeeper path at `/live_nodes/<nodeName>`,
and in addition an ephemeral path is now created at `/autoscaling/nodeAdded/<nodeName>`.
When a new Overseer leader starts, it runs the `nodeAdded` trigger (if one is configured), which discovers
this ZK path, removes it, and generates a `nodeAdded` event.
When a node leaves the cluster, up to three of the remaining nodes try to create a persistent ZK path
`/autoscaling/nodeLost/<nodeName>`, and one of them eventually succeeds. When a new Overseer leader
starts, it runs the `nodeLost` trigger (if one is configured), which discovers this ZK path, removes it,
and generates a `nodeLost` event.
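The marker protocol can be modeled in a few lines. The following Python sketch is a simplified, single-process simulation, not Solr's implementation: `FakeZk` is a hypothetical in-memory stand-in for ZooKeeper, the survivors' "race" runs sequentially, and deleting the `nodeAdded` marker when a node leaves stands in for ZooKeeper removing an ephemeral node when its session ends.

```python
from typing import List, Set


class FakeZk:
    """Hypothetical in-memory stand-in for ZooKeeper paths (illustration only)."""

    def __init__(self) -> None:
        self.paths: Set[str] = set()

    def create(self, path: str) -> bool:
        """Create a path; return False if it already exists (like a NodeExists error)."""
        if path in self.paths:
            return False
        self.paths.add(path)
        return True

    def delete(self, path: str) -> None:
        self.paths.discard(path)

    def children(self, parent: str) -> List[str]:
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.paths if p.startswith(prefix))


def node_joined(zk: FakeZk, node: str) -> None:
    zk.create(f"/live_nodes/{node}")             # ephemeral in real ZooKeeper
    zk.create(f"/autoscaling/nodeAdded/{node}")  # ephemeral marker for the nodeAdded trigger


def node_left(zk: FakeZk, node: str, remaining: List[str]) -> str:
    """Drop the node's ephemeral paths; up to three survivors try to write the nodeLost marker."""
    zk.delete(f"/live_nodes/{node}")
    zk.delete(f"/autoscaling/nodeAdded/{node}")
    winner = ""
    for other in remaining[:3]:
        if zk.create(f"/autoscaling/nodeLost/{node}"):  # only the first create succeeds
            winner = other
    return winner


def consume_markers(zk: FakeZk, event_type: str) -> List[dict]:
    """What a trigger on a new Overseer leader does: find markers, remove them, emit events."""
    parent = f"/autoscaling/{event_type}"
    events = []
    for node in zk.children(parent):
        zk.delete(f"{parent}/{node}")
        events.append({"eventType": event_type, "nodeName": node})
    return events
```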
== Trigger state checkpointing
Triggers generate events based on their internal state. If the Overseer leader goes down while a trigger is
about to generate a new event, the event would likely be lost, because the new trigger instance
running on the new Overseer leader would start from a clean slate.
For this reason, a trigger's internal state is persisted to ZooKeeper after each execution, and it is
restored when a new Overseer starts.
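A minimal sketch of checkpointing, assuming a trigger whose only state is the set of live nodes it saw on its previous run. A plain dict stands in for ZooKeeper, and both the class name and the state path `/autoscaling/triggerState/<name>` are hypothetical.

```python
import json
from typing import Dict, List, Optional, Set


class NodeLostTriggerSketch:
    """Hypothetical trigger whose only state is the set of live nodes seen on its last run."""

    def __init__(self, name: str, store: Dict[str, str]) -> None:
        self.name = name
        self.store = store  # a dict standing in for ZooKeeper
        saved = store.get(self.state_path())
        # Restore checkpointed state if present; otherwise start from a clean slate.
        self.last_live: Optional[Set[str]] = set(json.loads(saved)) if saved is not None else None

    def state_path(self) -> str:
        return f"/autoscaling/triggerState/{self.name}"  # hypothetical path

    def run(self, live_nodes: Set[str]) -> List[dict]:
        events = []
        if self.last_live is not None:
            for node in sorted(self.last_live - live_nodes):
                events.append({"eventType": "nodeLost", "source": self.name, "nodeName": node})
        self.last_live = set(live_nodes)
        # Checkpoint after every execution.
        self.store[self.state_path()] = json.dumps(sorted(self.last_live))
        return events
```

If the first instance sees nodes `n1` and `n2` and a replacement instance created from the same store then sees only `n1`, the restored state lets it report `n2` as lost. An instance created with an empty store has no previous state, so the same loss would go unnoticed.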
== Trigger event queues
The autoscaling framework limits the rate at which events are processed using several mechanisms.
One is a locking mechanism that prevents concurrent
processing of events; another is a single-threaded executor that runs trigger actions.
This means that processing an event may take significant time, and the Overseer may go down in the
meantime. To avoid losing events that were already generated but not yet fully processed, events are
queued before their processing starts.
A separate ZooKeeper queue is created for each trigger, and events produced by a trigger are put on
its queue. When a new Overseer leader starts, it first processes the events accumulated in
these queues, and only then does it resume running triggers
normally. Queued events that fail processing during this "replay" stage are discarded.
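The queue-then-process pattern and the replay rule can be sketched as follows. This is a simplified model, not Solr's implementation: in-memory deques stand in for the ZooKeeper queues, an exception stands in for an Overseer crash, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class TriggerEventQueues:
    """Hypothetical per-trigger event queues; deques stand in for ZooKeeper queues."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[dict]] = {}

    def _queue(self, trigger: str) -> Deque[dict]:
        return self.queues.setdefault(trigger, deque())

    def handle(self, trigger: str, event: dict, process: Callable[[dict], None]) -> None:
        """Normal path: enqueue first, process, and dequeue only after processing completes."""
        queue = self._queue(trigger)
        queue.append(event)
        process(event)  # an exception here (standing in for a crash) leaves the event queued
        queue.remove(event)

    def replay(self, process: Callable[[dict], None]) -> List[dict]:
        """On a new Overseer leader: drain every queue once; events that fail are discarded."""
        processed = []
        for queue in self.queues.values():
            while queue:
                event = queue.popleft()
                try:
                    process(event)
                    processed.append(event)
                except Exception:
                    pass  # discarded, not re-queued
        return processed
```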


@ -0,0 +1,179 @@
= SolrCloud AutoScaling Listeners
:page-shortname: solrcloud-autoscaling-listeners
:page-permalink: solrcloud-autoscaling-listeners.html
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
The trigger listener API allows users to provide additional behavior related to trigger events as they are being processed.
For example, users may want to record autoscaling events in an external system, or notify an administrator when a
particular type of event occurs or when its processing reaches a certain stage (e.g., `FAILED`).
A listener configuration always refers to a specific trigger configuration: the listener is notified only of
events generated by that trigger. Zero or more named listeners can be registered for a trigger,
and they are notified in the order in which they were defined.
A listener configuration can specify which processing stages are of interest; when an event enters one of these stages,
the listener is notified. Currently the following stages are recognized:
* STARTED - when an event has been generated by a trigger and its processing is starting.
* ABORTED - when the source trigger was closed while the event was being processed.
* BEFORE_ACTION - when a `TriggerAction` is about to be invoked. The action name and the current `ActionContext` are passed to the listener.
* AFTER_ACTION - after a `TriggerAction` has been successfully invoked. The action name, the `ActionContext`, and the list of action
names invoked so far are passed to the listener.
* FAILED - when event processing failed (or when a `TriggerAction` failed).
* SUCCEEDED - when event processing completed successfully.
A listener configuration can also specify which particular actions are of interest, so that the listener is notified
before and/or after they are invoked.
== Listener configuration
Currently the following listener configuration properties are supported:
* `name` - (string, required) unique listener configuration name.
* `trigger` - (string, required) name of an existing trigger configuration.
* `class` - (string, required) listener implementation class name.
* `stage` - (list of strings, optional, case-insensitive) list of processing stages at which
this listener should be notified. Default is an empty list.
* `beforeAction` - (list of strings, optional) list of action names (as defined in the trigger configuration) before
which the listener will be notified. Default is an empty list.
* `afterAction` - (list of strings, optional) list of action names after which the listener will be notified.
Default is an empty list.
* Additional implementation-specific properties may be provided.
Note: when both the `stage` list and the `beforeAction` / `afterAction` lists are non-empty, the listener is notified both
when a specified stage is entered and before / after the specified actions.
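The notification rules above can be expressed as a small predicate. This is a hedged Python sketch of the documented behavior, not Solr's listener code; in particular, it assumes that listing `BEFORE_ACTION` or `AFTER_ACTION` in `stage` means notification for every action, and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ListenerConfig:
    """Subset of the listener configuration properties listed above."""
    name: str
    trigger: str
    stage: List[str] = field(default_factory=list)
    beforeAction: List[str] = field(default_factory=list)
    afterAction: List[str] = field(default_factory=list)


def should_notify(cfg: ListenerConfig, trigger: str, stage: str,
                  action_name: Optional[str] = None) -> bool:
    if trigger != cfg.trigger:
        return False  # listeners only see events from their own trigger
    stage = stage.upper()
    if stage in (s.upper() for s in cfg.stage):
        return True  # stage names are matched case-insensitively
    if stage == "BEFORE_ACTION":
        return action_name in cfg.beforeAction
    if stage == "AFTER_ACTION":
        return action_name in cfg.afterAction
    return False


def listeners_to_notify(configs: List[ListenerConfig], trigger: str, stage: str,
                        action_name: Optional[str] = None) -> List[str]:
    """Names of the listeners to call, in the order in which they were defined."""
    return [c.name for c in configs if should_notify(c, trigger, stage, action_name)]
```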
=== Managing listener configurations
Listener configurations are managed with the autoscaling Write API, using the `set-listener` and `remove-listener`
commands.
For example:
[source,json]
----
{
  "set-listener": {
    "name": "foo",
    "trigger": "node_lost_trigger",
    "stage": ["STARTED", "ABORTED", "SUCCEEDED", "FAILED"],
    "class": "solr.SystemLogListener"
  }
}
----
[source,json]
----
{
  "remove-listener": {
    "name": "foo"
  }
}
----
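The same commands can be sent programmatically. The sketch below builds the command payloads and a POST request with Python's standard library. It assumes the Write API accepts POSTed JSON at the same `/admin/autoscaling` path as the Read API and that Solr runs at `http://localhost:8983/solr`; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Assumption: Solr base URL; the Write API is assumed to accept POSTs at /admin/autoscaling.
SOLR_BASE_URL = "http://localhost:8983/solr"


def set_listener_command(name: str, trigger: str, cls: str,
                         stage: Optional[List[str]] = None,
                         before_action: Optional[List[str]] = None,
                         after_action: Optional[List[str]] = None,
                         extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a set-listener command; empty lists are omitted since they are the default."""
    cfg: Dict[str, Any] = {"name": name, "trigger": trigger, "class": cls}
    if stage:
        cfg["stage"] = list(stage)
    if before_action:
        cfg["beforeAction"] = list(before_action)
    if after_action:
        cfg["afterAction"] = list(after_action)
    cfg.update(extra or {})  # implementation-specific properties
    return {"set-listener": cfg}


def remove_listener_command(name: str) -> Dict[str, Any]:
    return {"remove-listener": {"name": name}}


def autoscaling_write_request(command: Dict[str, Any],
                              base_url: str = SOLR_BASE_URL) -> urllib.request.Request:
    """Wrap a command in a POST request; send it with urllib.request.urlopen(req)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/admin/autoscaling",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```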
== Listener implementations
Trigger listeners must implement the `TriggerListener` interface. Solr provides some
implementations of trigger listeners that cover common use cases. These implementations are described in the sections
below, together with their configuration parameters.
=== `SystemLogListener`
This trigger listener sends trigger events and processing context as documents for indexing in the
SolrCloud `.system` collection.
Supported configuration properties:
* `collection` - (string, optional) specifies the target collection where documents are sent.
Default value is `.system`.
* `enabled` - (boolean, optional) enables the listener when true. Default value is true.
Documents created by this listener have several predefined fields:
* `id` - time-based random id
* `type` - always set to `autoscaling_event`
* `source_s` - always set to `SystemLogListener`
* `timestamp` - time when the document was created
* `stage_s` - current stage of event processing
* `action_s` - current action name, if available
* `message_t` - optional additional message
* `error.message_t` - message from Throwable, if available
* `error.details_t` - stacktrace from Throwable, if available
* `before.actions_ss` - list of action names whose invocation has started so far
* `after.actions_ss` - list of action names that have been successfully invoked so far
* `event_str` - JSON representation of all event properties
* `context_str` - JSON representation of all `ActionContext` properties, if available
The following fields are created using the information from trigger event:
* `event.id_s` - event id
* `event.type_s` - event type
* `event.source_s` - event source (trigger name)
* `event.time_l` - Unix time when the event was created (may significantly differ from the time when it was actually
processed)
* `event.property.*` - additional fields that represent other arbitrary event properties. These fields use the
`_ss` suffix if the property value is a collection and the `_s` suffix otherwise (values inside a collection are
treated as strings; there's no recursive flattening)
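To make the field naming concrete, here is a sketch of how the event-derived fields could be built from an event map. It assumes the event is a dict with the keys `id`, `eventType`, `source`, `eventTime`, and `properties`, and it treats Python lists and tuples as collections; it illustrates the naming rules above and is not `SystemLogListener`'s code.

```python
from typing import Any, Dict


def event_fields(event: Dict[str, Any]) -> Dict[str, Any]:
    """Map an event to the event.* fields listed above (illustrative, not Solr's code)."""
    doc: Dict[str, Any] = {
        "event.id_s": event["id"],
        "event.type_s": event["eventType"],
        "event.source_s": event["source"],
        "event.time_l": event["eventTime"],
    }
    for key, value in event.get("properties", {}).items():
        if isinstance(value, (list, tuple)):
            # Collections become multi-valued string fields; nested values are not flattened.
            doc[f"event.property.{key}_ss"] = [str(v) for v in value]
        else:
            doc[f"event.property.{key}_s"] = str(value)
    return doc
```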
=== `HttpTriggerListener`
This listener uses HTTP POST to send a representation of the event and context to a specified URL.
The URL, payload, and headers may contain property substitution patterns, which are replaced with values taken from the
current event or context properties.
Templates use the same syntax as property substitution in Solr configuration files; for example,
`${foo.bar:baz}` means that the value of the `foo.bar` property should be used, or `baz`
if that value is absent.
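The substitution rule can be illustrated with a short Python sketch. It assumes a flat map of dotted property names (such as `config.name` or `event.eventType`) to values, and it raises an error when a property is missing and no default is given; how the listener itself handles that case is not specified above, and the function name is hypothetical.

```python
import re
from typing import Mapping

# ${name} or ${name:default}; the name may not contain ':' or '}'.
_PATTERN = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def substitute(template: str, props: Mapping[str, object]) -> str:
    """Replace each ${name:default} pattern with props[name], falling back to default."""
    def repl(match: "re.Match[str]") -> str:
        key, default = match.group(1), match.group(2)
        if props.get(key) is not None:
            return str(props[key])
        if default is not None:
            return default
        raise KeyError(f"no value for property '{key}' and no default")
    return _PATTERN.sub(repl, template)
```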
Supported configuration properties:
* `url` - (string, required) a URL template
* `payload` - (string, optional) payload template. If absent, a JSON map of all the context properties listed below is used.
* `contentType` - (string, optional) payload content type. If absent, `application/json` is used.
* `header.*` - (string, optional) header template(s). The name of the property without the `header.` prefix defines the literal header name.
* `timeout` - (int, optional) connection and socket timeout in milliseconds. Default is 60 seconds.
* `followRedirects` - (boolean, optional) whether to follow redirects. Default is false.
The following properties are available in context and can be referenced from templates:
* `config.*` - listener configuration properties
* `event.*` - current event properties
* `stage` - current stage of event processing
* `actionName` - optional current action name
* `context.*` - optional `ActionContext` properties
* `error` - optional error string (from `Throwable.toString()`)
* `message` - optional message
Example configuration:
[source,json]
----
{
  "name": "foo",
  "trigger": "node_added_trigger",
  "class": "solr.HttpTriggerListener",
  "url": "http://foo.com/${config.name:invalidName}/${config.properties.xyz:invalidXyz}/${event.eventType}",
  "xyz": "foobar",
  "header.X-Trigger": "${config.trigger}",
  "payload": "actionName=${actionName}, source=${event.source}, type=${event.eventType}",
  "contentType": "text/plain",
  "stage": ["STARTED", "ABORTED", "SUCCEEDED", "FAILED"],
  "beforeAction": ["compute_plan", "execute_plan"],
  "afterAction": ["compute_plan", "execute_plan"]
}
----
This configuration specifies that each time one of the listed stages is reached, or before and after each of the listed
actions is executed, the listener sends the templated payload to a URL that also depends on the configuration and the
current event, with a custom header that indicates the trigger name.
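To see what the listener sends, you could point `url` at a local capture server. The following standard-library Python sketch runs such a server and includes a hypothetical helper, `send_like_listener`, that sends a request shaped like the example above (a path derived from the URL template, an `X-Trigger` header, and a `text/plain` payload). The port, paths, and names are assumptions for local testing only.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List, Tuple

# (path, X-Trigger header, body) for every POST received
received: List[Tuple[str, str, str]] = []


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        received.append((self.path, self.headers.get("X-Trigger", ""), body))
        self.send_response(200)
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def start_receiver(port: int = 0) -> HTTPServer:
    """Start the capture server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_like_listener(url: str, payload: str, headers: Dict[str, str],
                       content_type: str = "text/plain") -> int:
    """Hypothetical helper that POSTs a request shaped like the listener's, for testing."""
    req = urllib.request.Request(url, data=payload.encode("utf-8"), method="POST",
                                 headers={**headers, "Content-Type": content_type})
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    with opener.open(req, timeout=5) as resp:
        return resp.status
```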


@ -20,7 +20,34 @@
// specific language governing permissions and limitations
// under the License.
Autoscaling in Solr aims to provide good defaults so a SolrCloud cluster remains balanced and stable in the face of various cluster change events. This balance is achieved by satisfying a set of rules and sorting preferences to select the target of cluster management operations.
Autoscaling in Solr aims to provide good defaults so a SolrCloud cluster remains balanced and stable in the face of various cluster change events. This balance is achieved by satisfying a set of rules and sorting preferences to select the target of cluster management operations automatically on cluster events.
A simple example is automatically adding a replica for a SolrCloud collection when a node containing an existing replica goes down.
The goal of the autoscaling feature is to make SolrCloud cluster management easier, automatic, and intelligent. It aims to provide good defaults such that the cluster remains balanced and stable in the face of various events, such as a node joining or leaving the cluster. This is achieved by satisfying a set of rules and sorting preferences that help Solr select the target of cluster management operations.
There are three distinct problems that this feature solves:
* When to run cluster management tasks? For example, we might want to add a replica when an existing replica is no longer alive.
* Which cluster management task to run? For example, do we add a new replica, or should we move an existing one to a new node?
* How to run the cluster management tasks such that the cluster remains balanced and stable?
Before we get into the details of how each of these problems is solved, let's take a quick look at the easiest way to set up autoscaling for your cluster.
== QuickStart: Automatically adding replicas
Say that we want to create a collection that always has three replicas available for each shard. We can set `replicationFactor=3` while creating the collection, but what happens if a node containing one or more of the replicas crashes or is shut down for maintenance? In such a case, we'd like to create additional replicas to replace the ones that are no longer available, preserving the original number of replicas.
There is an easy way to enable this behavior without needing to understand the autoscaling feature in depth: create the collection with the additional parameter `autoAddReplicas=true` in the create collection API call. For example:
`/admin/collections?action=CREATE&name=_name_of_collection_&numShards=1&replicationFactor=3&autoAddReplicas=true`
A collection created with `autoAddReplicas=true` will be monitored by Solr such that if a node containing a replica of this collection goes down, Solr will add new replicas on other nodes after waiting for up to thirty seconds for the node to come back.
You can see the section __TODO_FIX_ME__ to learn more about how to enable or disable this feature as well as other details.
The selection of the node that will host the new replica is made according to the default cluster preferences that we will learn more about in the next sections.
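The CREATE request shown above can also be built programmatically. A minimal sketch, assuming Solr at `http://localhost:8983/solr`; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int, auto_add_replicas: bool = True) -> str:
    """Build a Collections API CREATE URL with the autoAddReplicas flag."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "autoAddReplicas": "true" if auto_add_replicas else "false",
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)
```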
== Cluster Preferences
@ -50,7 +77,21 @@ The above create collection command will associate a policy named `policy1` with
Note that the collection-specific policy is applied *in addition to* the cluster policy, i.e., it is not an override but an augmentation. Therefore the collection will follow all conditions laid out in the cluster preferences, cluster policy, and the policy named `policy1`.
You can learn more about collection-specific policies in the section <<solrcloud-autoscaling-policy-preferences.adoc#collection-specific-policy,Defining Collection-Specific Policies>>.
You can learn more about collection-specific policies in the section <<solrcloud-autoscaling-policy-preferences.adoc#defining-collection-specific-policies,Defining Collection-Specific Policies>>.
== Triggers
Now that we have an idea of how policies and preferences help Solr keep the cluster balanced and stable during cluster management operations, we can talk about when to invoke such operations. Triggers are used to watch for events such as a node joining or leaving the cluster. When the event happens, the trigger executes a set of `actions` that compute and execute a *plan*, i.e., a set of operations to change the cluster so that the policy and preferences are respected.
The `autoAddReplicas` parameter passed to the create collection API in the quickstart section automatically creates a trigger that watches for a node going away. When the trigger fires, it computes and executes a plan to move all replicas hosted by the lost node to new nodes in the cluster. The target nodes are chosen based on the policy and preferences.
You can learn more about Triggers in the __TODO__ section.
== Listeners
An AutoScaling *Listener* is attached to a trigger. Solr calls the listener each time the trigger fires, as well as before and after the actions performed by the trigger. Listeners are useful as a callback mechanism to perform tasks such as logging or informing external systems about events. For example, Solr automatically adds a listener to each trigger that logs details of the trigger firing and its actions to the `.system` collection.
You can learn more about Listeners in the __TODO__ section.
== Autoscaling APIs


@ -0,0 +1,106 @@
= SolrCloud AutoScaling Triggers
:page-shortname: solrcloud-autoscaling-triggers
:page-permalink: solrcloud-autoscaling-triggers.html
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
Triggers are used by the autoscaling API to watch for cluster events such as a node joining or leaving,
and in the future also for other cluster, node, and replica events that are important from the
point of view of cluster performance.
Trigger implementations verify the state of the resources that they monitor. When they detect a
change that merits attention, they generate events, which are then queued and processed by the configured
`TriggerAction` implementations; this usually involves computing and executing a plan to manage the new cluster
resources (e.g., moving replicas). Solr provides predefined trigger implementations for specific event types.
Triggers execute on the node that runs the Overseer. They are scheduled to run periodically,
currently at a fixed interval of 1 second between executions (not every execution produces events).
== Event types
Currently the following event types (and corresponding trigger implementations) are defined:
* `nodeAdded` - generated when a new node joins the cluster
* `nodeLost` - generated when a node leaves the cluster
Events are not necessarily generated immediately after the corresponding state change occurs; the
maximum rate of events is controlled by the `waitFor` configuration parameter (see below).
The following properties are common to all event types:
* `id` - (string) unique time-based event id.
* `eventType` - (string) event type.
* `source` - (string) name of the trigger that produced this event.
* `eventTime` - (long) Unix time when the condition that caused this event occurred. For example, for
`nodeAdded` event this will be the time when the node was added and not when the event was actually
generated, which may significantly differ due to the rate limits set by `waitFor`.
* `properties` - (map, optional) additional properties. Currently contains the `nodeName` property, which
indicates the node that was lost or added.
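The relationship between detecting a change, `eventTime`, and delayed event generation can be sketched as follows. This Python model assumes that `waitFor` is the delay between first detecting a lost node and emitting its event, and that a node coming back before then cancels the pending event; both are assumptions for illustration, the class name is hypothetical, and a time-based UUID stands in for the event `id`.

```python
import uuid
from typing import Dict, List, Set


class NodeLostWaitForSketch:
    """Hypothetical nodeLost trigger that delays each event by waitFor seconds."""

    def __init__(self, name: str, wait_for_seconds: float, initial_live: Set[str]) -> None:
        self.name = name
        self.wait_for = wait_for_seconds
        self.last_live = set(initial_live)
        self.lost_since: Dict[str, float] = {}  # node name -> time the loss was first seen

    def run(self, live_nodes: Set[str], now: float) -> List[dict]:
        for node in self.last_live - live_nodes:
            self.lost_since.setdefault(node, now)
        for node in list(self.lost_since):
            if node in live_nodes:
                del self.lost_since[node]  # assumption: a node that came back cancels its event
        self.last_live = set(live_nodes)
        events = []
        for node, since in sorted(self.lost_since.items()):
            if now - since >= self.wait_for:
                events.append({
                    "id": uuid.uuid1().hex,  # time-based id, as a stand-in
                    "eventType": "nodeLost",
                    "source": self.name,
                    "eventTime": since,      # when the loss was seen, not when the event fired
                    "properties": {"nodeName": node},
                })
                del self.lost_since[node]
        return events
```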
== `.auto_add_replicas` trigger
When a collection has the `autoAddReplicas` flag set to true, a trigger configuration named `.auto_add_replicas`
is automatically created to watch for nodes going away. This trigger produces `nodeLost` events,
which are then processed by the configured actions (usually resulting in computing and executing a plan
to add replicas on the live nodes to maintain the expected replication factor).
== Trigger configuration
Trigger configurations are managed using the autoscaling Write API with the commands `set-trigger`, `remove-trigger`,
`suspend-trigger`, and `resume-trigger`.
Trigger configuration consists of the following properties:
* `name` - (string, required) unique trigger configuration name.
* `event` - (string, required) one of the predefined event types (`nodeAdded`, `nodeLost`).
* `actions` - (list of action configs, optional) ordered list of actions to execute when the event is fired.
* `waitFor` - (string, optional) time to wait between generating new events, as an integer immediately followed
by a unit symbol, one of "s" (seconds), "m" (minutes), or "h" (hours). Default is "0s".
* `enabled` - (boolean, optional) when true, the trigger is enabled. Default is true.
* Additional implementation-specific properties may be provided.
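A minimal parser for the `waitFor` format described above (an integer immediately followed by `s`, `m`, or `h`). This is a sketch of the documented format, not Solr's parser; the function name is hypothetical.

```python
import re

_WAIT_FOR = re.compile(r"(\d+)([smh])")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_wait_for(value: str = "0s") -> int:
    """Convert a waitFor string such as '30s', '5m' or '2h' into seconds."""
    match = _WAIT_FOR.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid waitFor value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```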
Action configuration consists of the following properties:
* `name` - (string, required) unique name of the action configuration.
* `class` - (string, required) action implementation class.
* Additional implementation-specific properties may be provided.
Example: adding or updating a trigger for `nodeAdded` events. This trigger configuration
computes and executes a plan to allocate the resources available on the new node. A custom action
is also used to possibly modify the plan.
[source,json]
----
{
  "set-trigger": {
    "name": "node_added_trigger",
    "event": "nodeAdded",
    "waitFor": "1s",
    "enabled": true,
    "actions": [
      {
        "name": "compute_plan",
        "class": "solr.ComputePlanAction"
      },
      {
        "name": "custom_action",
        "class": "com.example.CustomAction"
      },
      {
        "name": "execute_plan",
        "class": "solr.ExecutePlanAction"
      }
    ]
  }
}
----
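Actions run in the order in which they are listed, and each can see what earlier actions produced. The following plain-Python model illustrates that ordering with a shared context dict. It is a hypothetical sketch, not Solr's Java `TriggerAction` interface; the plan representation and the action behaviors are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical action signature: an action reads the event and reads/writes a shared context.
Action = Callable[[dict, dict], None]


def compute_plan(event: dict, context: dict) -> None:
    """Stand-in for plan computation: propose moving two replicas to the new node."""
    node = event["properties"]["nodeName"]
    context["operations"] = [f"move replica {i} to {node}" for i in (1, 2)]


def custom_action(event: dict, context: dict) -> None:
    """Stand-in for a custom action that modifies the plan, here by capping it at one operation."""
    context["operations"] = context["operations"][:1]


def execute_plan(event: dict, context: dict) -> None:
    """Stand-in for plan execution: record the operations that would be executed."""
    context["executed"] = list(context["operations"])


def run_actions(event: dict, actions: List[Tuple[str, Action]]) -> dict:
    """Invoke actions in their configured order, sharing one context between them."""
    context: Dict[str, list] = {"invoked": []}
    for name, action in actions:
        action(event, context)
        context["invoked"].append(name)
    return context
```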


@ -1,7 +1,7 @@
= SolrCloud Autoscaling
= SolrCloud AutoScaling
:page-shortname: solrcloud-autoscaling
:page-permalink: solrcloud-autoscaling.html
:page-children: solrcloud-autoscaling-overview, solrcloud-autoscaling-policy-preferences, solrcloud-autoscaling-api
:page-children: solrcloud-autoscaling-overview, solrcloud-autoscaling-api, solrcloud-autoscaling-policy-preferences, solrcloud-autoscaling-triggers, solrcloud-autoscaling-listeners, solrcloud-autoscaling-auto-add-replicas
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
@ -29,3 +29,6 @@ The following sections describe the autoscaling features of SolrCloud:
* <<solrcloud-autoscaling-overview.adoc#solrcloud-autoscaling-overview,Overview of Autoscaling in SolrCloud>>
* <<solrcloud-autoscaling-api.adoc#solrcloud-autoscaling-api,SolrCloud Autoscaling API>>
* <<solrcloud-autoscaling-policy-preferences.adoc#solrcloud-autoscaling-policy-preferences,SolrCloud Autoscaling Policy and Preferences>>
* <<solrcloud-autoscaling-triggers.adoc#solrcloud-autoscaling-triggers,SolrCloud AutoScaling Triggers>>
* <<solrcloud-autoscaling-listeners.adoc#solrcloud-autoscaling-listeners,SolrCloud AutoScaling Listeners>>
* <<solrcloud-autoscaling-auto-add-replicas.adoc#solrcloud-autoscaling-auto-add-replicas,SolrCloud AutoScaling - Automatically Adding Replicas>>