SOLR-12163: Updated and expanded ZK ensemble docs

2018-04-19 09:50:09 -05:00 · 2018-04-19 09:50:09 -05:00 · 42da6f795d
parent aab2c770c6
commit 42da6f795d
1 changed files with 260 additions and 91 deletions
--- a/solr/solr-ref-guide/src/setting-up-an-external-zookeeper-ensemble.adoc
+++ b/solr/solr-ref-guide/src/setting-up-an-external-zookeeper-ensemble.adoc
@ -16,25 +16,31 @@
 // specific language governing permissions and limitations
 // under the License.
-Although Solr comes bundled with http://zookeeper.apache.org[Apache ZooKeeper], you should consider yourself discouraged from using this internal ZooKeeper in production.
+Although Solr comes bundled with http://zookeeper.apache.org[Apache ZooKeeper], you are strongly encouraged to use an external ZooKeeper setup in production.
-Shutting down a redundant Solr instance will also shut down its ZooKeeper server, which might not be quite so redundant. Because a ZooKeeper ensemble must have a quorum of more than half its servers running at any given time, this can be a problem.
+While using Solr's embedded ZooKeeper instance is fine for getting started, you shouldn't use this in production because it does not provide any failover: if the Solr instance that hosts ZooKeeper shuts down, ZooKeeper is also shut down.
 Any shards or Solr instances that rely on it will not be able to communicate with it or each other.
-The solution to this problem is to set up an external ZooKeeper ensemble. Fortunately, while this process can seem intimidating due to the number of powerful options, setting up a simple ensemble is actually quite straightforward, as described below.
+The solution to this problem is to set up an external ZooKeeper _ensemble_, which is a number of servers running ZooKeeper that communicate with each other to coordinate the activities of the cluster.
 == How Many ZooKeeper Nodes?
 The first question to answer is the number of ZooKeeper nodes you will run in your ensemble.
 When planning how many ZooKeeper nodes to configure, keep in mind that the main principle for a ZooKeeper ensemble is maintaining a majority of servers to serve requests. This majority is called a _quorum_.
 .How Many ZooKeepers?
 [quote,ZooKeeper Administrator's Guide,http://zookeeper.apache.org/doc/r{ivy-zookeeper-version}/zookeeperAdmin.html]
 ____
-"For a ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. *To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines*. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority.
+"For a ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. *To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines*.
 For this reason, ZooKeeper deployments are usually made up of an odd number of machines."
 ____
-When planning how many ZooKeeper nodes to configure, keep in mind that the main principle for a ZooKeeper ensemble is maintaining a majority of servers to serve requests. This majority is also called a _quorum_.
+To properly maintain a quorum, it's highly recommended to have an odd number of ZooKeeper servers in your ensemble, so a majority is maintained.
-It is generally recommended to have an odd number of ZooKeeper servers in your ensemble, so a majority is maintained.
+To see why, consider this scenario: if you have two ZooKeeper nodes and one goes down, only 50% of your servers are available. Since this is not a majority, ZooKeeper will no longer serve requests.
-For example, if you only have two ZooKeeper nodes and one goes down, 50% of available servers is not a majority, so ZooKeeper will no longer serve requests. However, if you have three ZooKeeper nodes and one goes down, you have 66% of available servers available, and ZooKeeper will continue normally while you repair the one down node. If you have 5 nodes, you could continue operating with two down nodes if necessary.
+However, if you have three ZooKeeper nodes and one goes down, you have 66% of your servers available and ZooKeeper will continue normally while you repair the one down node. If you have 5 nodes, you could continue operating with two down nodes if necessary.
 It's not generally recommended to go above 5 nodes. While it may seem that more nodes provide greater fault-tolerance and availability, in practice it becomes less efficient because of the amount of inter-node coordination that occurs. Unless you have a truly massive Solr cluster (on the scale of 1,000s of nodes), try to stay to 3 as a general rule, or maybe 5 if you have a larger cluster.
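The majority arithmetic above can be sketched in a few lines of shell. This is only an illustration: the function names `quorum_size` and `tolerated_failures` are hypothetical helpers, not part of ZooKeeper or Solr.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the quorum arithmetic; not part of ZooKeeper.

# Majority (quorum) size for an ensemble of N servers: floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a quorum is still available.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 4 5 6; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Note that an even server count adds no fault tolerance over the next lower odd count: four servers tolerate one failure, the same as three, and six tolerate two, the same as five.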
 More information on ZooKeeper clusters is available from the ZooKeeper documentation at http://zookeeper.apache.org/doc/r{ivy-zookeeper-version}/zookeeperAdmin.html#sc_zkMulitServerSetup.
@ -42,22 +48,37 @@ More information on ZooKeeper clusters is available from the ZooKeeper documenta
 The first step in setting up Apache ZooKeeper is, of course, to download the software. It's available from http://zookeeper.apache.org/releases.html.
 [IMPORTANT]
 ====
 When using stand-alone ZooKeeper, you need to take care to keep your version of ZooKeeper updated with the latest version distributed with Solr. Since you are using it as a stand-alone application, it does not get upgraded when you upgrade Solr.
 Solr currently uses Apache ZooKeeper v{ivy-zookeeper-version}.
 [WARNING]
 ====
 When using an external ZooKeeper ensemble, you will need to keep your local installation up-to-date with the latest version distributed with Solr. Since it is a stand-alone application in this scenario, it does not get upgraded as part of a standard Solr upgrade.
 ====
-== Setting Up a Single ZooKeeper
+== ZooKeeper Installation
-=== Create the Instance
+Installation consists of extracting the files into a specific target directory where you'd like to have ZooKeeper store its internal data. The actual directory itself doesn't matter, as long as you know where it is.
 Creating the instance is a simple matter of extracting the files into a specific target directory. The actual directory itself doesn't matter, as long as you know where it is, and where you'd like to have ZooKeeper store its internal data.
-=== Configure the Instance
+The command to unpack the ZooKeeper package is:
 The next step is to configure your ZooKeeper instance. To do that, create the following file: `<ZOOKEEPER_HOME>/conf/zoo.cfg`. To this file, add the following information:
-[source,bash]
+[source,bash,subs="attributes"]
 tar xvf zookeeper-{ivy-zookeeper-version}.tar.gz
 This location is the `<ZOOKEEPER_HOME>` for ZooKeeper on this server.
 Installing and unpacking ZooKeeper must be repeated on each server where ZooKeeper will be run.
 == Configuration for a ZooKeeper Ensemble
 After installation, we'll first take a look at the basic configuration for ZooKeeper, then specific parameters for configuring each node to be part of an ensemble.
 === Initial Configuration
 To configure your ZooKeeper instance, create a file named `<ZOOKEEPER_HOME>/conf/zoo.cfg`. A sample configuration file is included in your ZooKeeper installation, as `conf/zoo_sample.cfg`. You can edit and rename that file instead of creating it new if you prefer.
 The file should have the following information to start:
 [source,properties]
 ----
 tickTime=2000
 dataDir=/var/lib/zookeeper
@ -66,122 +87,270 @@ clientPort=2181
 The parameters are as follows:
-`tickTime`:: Part of what ZooKeeper does is to determine which servers are up and running at any given time, and the minimum session time out is defined as two "ticks". The `tickTime` parameter specifies, in miliseconds, how long each tick should be.
+`tickTime`:: Part of what ZooKeeper does is determine which servers are up and running at any given time, and the minimum session time out is defined as two "ticks". The `tickTime` parameter specifies in milliseconds how long each tick should be.
-`dataDir`:: This is the directory in which ZooKeeper will store data about the cluster. This directory should start out empty.
+`dataDir`:: This is the directory in which ZooKeeper will store data about the cluster. This directory must be empty before starting ZooKeeper for the first time.
 `clientPort`:: This is the port on which Solr will access ZooKeeper.
-Once this file is in place, you're ready to start the ZooKeeper instance.
+These are the basic parameters that need to be in use on each ZooKeeper node, so this file must be copied to or created on each node.
-=== Run the Instance
+Next we'll customize this configuration to work within an ensemble.
-To run the instance, you can simply use the `ZOOKEEPER_HOME/bin/zkServer.sh` script provided, as with this command: `zkServer.sh start`
+=== Ensemble Configuration
-Again, ZooKeeper provides a great deal of power through additional configurations, but delving into them is beyond the scope of this tutorial. For more information, see the ZooKeeper http://zookeeper.apache.org/doc/r{ivy-zookeeper-version}/zookeeperStarted.html[Getting Started] page. For this example, however, the defaults are fine.
+To complete configuration for an ensemble, we need to set additional parameters so each node knows who it is in the ensemble and where every other node is.
-=== Point Solr at the Instance
+Each of the examples below assumes you are installing ZooKeeper on different servers with different hostnames.
-Pointing Solr at the ZooKeeper instance you've created is a simple matter of using the `-z` parameter when using the bin/solr script. For example, in order to point the Solr instance to the ZooKeeper you've started on port 2181, this is what you'd need to do:
+Once complete, your `zoo.cfg` file might look like this:
-Starting `cloud` example with ZooKeeper already running at port 2181 (with all other defaults):
+[source,properties]
 ----
 tickTime=2000
 dataDir=/var/lib/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.1=zoo1:2888:3888
 server.2=zoo2:2888:3888
 server.3=zoo3:2888:3888
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 ----
 We've added these parameters to the three we had already:
 `initLimit`:: Amount of time, in ticks, to allow followers to connect and sync to a leader. In this case, you have 5 ticks, each of which is 2000 milliseconds long, so the server will wait as long as 10 seconds to connect and sync with the leader.
 `syncLimit`:: Amount of time, in ticks, to allow followers to sync with ZooKeeper. If followers fall too far behind a leader, they will be dropped.
 `server._X_`:: These are the server IDs (the `_X_` part), hostnames (or IP addresses) and ports for all servers in the ensemble. The IDs differentiate each node of the ensemble, and allow each node to know where each of the other nodes is located. The ports can be any ports you choose; ZooKeeper's default ports are `2888:3888`.
 +
 Since we've assigned server IDs to specific hosts/ports, we must also define which server in the list this node is. We do this with a `myid` file stored in the data directory (defined by the `dataDir` parameter). The `myid` file contains only the server ID.
 +
 In the case of the configuration example above, you would create the file `/var/lib/zookeeper/myid` with the content "1" (without quotes), as in this example:
 +
 [source,bash]
 1
 `autopurge.snapRetainCount`:: The number of snapshots and corresponding transaction logs to retain when purging old snapshots and transaction logs.
 +
 ZooKeeper automatically keeps a transaction log and writes to it as changes are made. A snapshot of the current state is taken periodically, and this snapshot supersedes transaction logs older than the snapshot. However, ZooKeeper never cleans up either the old snapshots or the old transaction logs; over time they will silently fill available disk space on each server.
 +
 To avoid this, set the `autopurge.snapRetainCount` and `autopurge.purgeInterval` parameters to enable an automatic clean up (purge) to occur at regular intervals. The `autopurge.snapRetainCount` parameter will keep the set number of snapshots and transaction logs when a clean up occurs. This parameter can be set higher than `3`, but cannot be set lower than `3`.
 `autopurge.purgeInterval`:: The time in hours between purge tasks. The default for this parameter is `0`, so it must be set to `1` or higher to enable automatic clean up of snapshots and transaction logs. Setting it as high as `24`, for once a day, is acceptable if preferred.
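Because `initLimit` and `syncLimit` are expressed in ticks, it can help to convert them to milliseconds when reviewing a configuration. The following is a minimal sketch that assumes simple `key=value` lines as in the example above; `cfg_value` and `limit_ms` are hypothetical helper names, not ZooKeeper tools.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a zoo.cfg file; not part of ZooKeeper.

# Print the value of a key from a properties-style file.
# $1 = key, $2 = path to zoo.cfg
cfg_value() {
  sed -n "s/^$1=//p" "$2"
}

# Convert a limit given in ticks to milliseconds, using tickTime from the same file.
# $1 = limit name (for example initLimit), $2 = path to zoo.cfg
limit_ms() {
  tick=$(cfg_value tickTime "$2")
  ticks=$(cfg_value "$1" "$2")
  echo $(( tick * ticks ))
}

# Example usage (the path is an assumption about your installation):
#   limit_ms initLimit "$ZOOKEEPER_HOME/conf/zoo.cfg"
#   limit_ms syncLimit "$ZOOKEEPER_HOME/conf/zoo.cfg"
```

With the example configuration, `initLimit=5` and `tickTime=2000` give 10000 milliseconds, which matches the 10 seconds described above.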
 We'll repeat this configuration on each node.
 On the second node, update the `<ZOOKEEPER_HOME>/conf/zoo.cfg` file so it matches the content on node 1 (particularly the server hosts and ports):
 [source,properties]
 ----
 tickTime=2000
 dataDir=/var/lib/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.1=zoo1:2888:3888
 server.2=zoo2:2888:3888
 server.3=zoo3:2888:3888
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 ----
 On the second node, create a `myid` file with the contents "2", and put it in the `/var/lib/zookeeper` directory.
 [source,bash]
 2
 On the third node, update the `<ZOOKEEPER_HOME>/conf/zoo.cfg` file so it matches the content on nodes 1 and 2 (particularly the server hosts and ports):
 [source,properties]
 ----
-bin/solr start -e cloud -z localhost:2181 -noprompt
+tickTime=2000
 dataDir=/var/lib/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.1=zoo1:2888:3888
 server.2=zoo2:2888:3888
 server.3=zoo3:2888:3888
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 ----
-Add a node pointing to an existing ZooKeeper at port 2181:
+And create the `myid` file in the `/var/lib/zookeeper` directory:
 [source,bash]
----
+3
 bin/solr start -cloud -s <path to solr home for new node> -p 8987 -z localhost:2181
 ----
-NOTE: When you are not using an example to start solr, make sure you upload the configuration set to ZooKeeper before creating the collection.
+Repeat this for servers 4 and 5 if you are creating a 5-node ensemble (a rare case).
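Since the `zoo.cfg` file is identical on every node and only the `myid` file differs, creating `myid` is easy to script. The sketch below is one possible approach; `write_myid` is a hypothetical helper name, and the range check assumes server IDs between 1 and 255.

```shell
#!/bin/sh
# Hypothetical helper that writes the myid file for one ZooKeeper node.
# $1 = server ID (must match the X in this host's server.X line in zoo.cfg)
# $2 = data directory (the dataDir value from zoo.cfg)
write_myid() {
  # Reject empty or non-numeric IDs.
  case "$1" in
    ''|*[!0-9]*) echo "server ID must be a positive integer" >&2; return 1 ;;
  esac
  # Assumed valid range for server IDs.
  if [ "$1" -lt 1 ] || [ "$1" -gt 255 ]; then
    echo "server ID must be between 1 and 255" >&2
    return 1
  fi
  # Create the data directory if needed and write the ID as the only content.
  mkdir -p "$2" && printf '%s\n' "$1" > "$2/myid"
}

# Example usage on the second node:
#   write_myid 2 /var/lib/zookeeper
```

Run it once per server with that server's own ID; the ID written must match the `server._X_` entry that names the host you are running it on.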
 === ZooKeeper Environment Configuration
 To ease troubleshooting in case of problems with the ensemble later, it's recommended to run ZooKeeper with logging enabled and with proper JVM garbage collection (GC) settings.
 . Create a file named `zookeeper-env.sh` and put it in the `ZOOKEEPER_HOME/conf` directory (the same place you put `zoo.cfg`). This file will need to exist on each server of the ensemble.
 . Add the following settings to the file:
 +
 [source,properties]
 ----
 ZOO_LOG_DIR="/path/for/log/files"
 ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
 SERVER_JVMFLAGS="-Xms2048m -Xmx2048m -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:$ZOO_LOG_DIR/zookeeper_gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M"
 ----
 +
 The property `ZOO_LOG_DIR` defines the location on the server where ZooKeeper will print its logs. `ZOO_LOG4J_PROP` sets the logging level and log appenders.
 +
 With `SERVER_JVMFLAGS`, we've defined several parameters for garbage collection and logging GC-related events. One of the system parameters is `-Xloggc:$ZOO_LOG_DIR/zookeeper_gc.log`, which will put the garbage collection logs in the same directory we've defined for ZooKeeper logs, in a file named `zookeeper_gc.log`.
 . Review the default settings in `ZOOKEEPER_HOME/conf/log4j.properties`, especially the `log4j.appender.ROLLINGFILE.MaxFileSize` parameter. This sets the size at which log files will be rolled over, and by default it is 10MB.
 . Copy `zookeeper-env.sh` and any changes to `log4j.properties` to each server in the ensemble.
 NOTE: The above instructions are for Linux servers only. The default `zkServer.sh` script includes support for a `zookeeper-env.sh` file but the Windows version of the script, `zkServer.cmd`, does not. To make the same configuration on a Windows server, the changes would need to be made directly in `zkServer.cmd`.
 At this point, you are ready to start your ZooKeeper ensemble.
 === More Information about ZooKeeper
 ZooKeeper provides a great deal of power through additional configurations, but delving into them is beyond the scope of Solr's documentation. For more information, see the http://zookeeper.apache.org/doc/r{ivy-zookeeper-version}[ZooKeeper documentation].
 == Starting and Stopping ZooKeeper
 === Start ZooKeeper
 To start the ensemble, use the `ZOOKEEPER_HOME/bin/zkServer.sh` or `zkServer.cmd` script, as with this command:
 .Linux OS
 [source,bash]
 zkServer.sh start
 .Windows OS
 [source,text]
 zkServer.cmd start
 This command needs to be run on each server that will run ZooKeeper.
 TIP: You should see the ZooKeeper logs in the directory you defined for them. However, immediately after startup, you may not see the `zookeeper_gc.log` yet, as it likely will not appear until garbage collection has happened the first time.
 === Shut Down ZooKeeper
-To shut down ZooKeeper, use the zkServer script with the "stop" command: `zkServer.sh stop`.
+To shut down ZooKeeper, use the same `zkServer.sh` or `zkServer.cmd` script on each server with the "stop" command:
-== Setting up a ZooKeeper Ensemble
+.Linux OS
 [source,bash]
 zkServer.sh stop
-With an external ZooKeeper ensemble, you need to set things up just a little more carefully as compared to the Getting Started example.
+.Windows OS
 [source,text]
 zkServer.cmd stop
-The difference is that rather than simply starting up the servers, you need to configure them to know about and talk to each other first. So your original `zoo.cfg` file might look like this:
+== Solr Configuration
 When starting Solr, you must provide an address for ZooKeeper or Solr won't know how to use it. This can be done in two ways: by defining the _connect string_, a list of servers where ZooKeeper is running, at every startup on every node of the Solr cluster, or by editing Solr's include file as a permanent system parameter. Both approaches are described below.
 When referring to the location of ZooKeeper within Solr, it's best to use the addresses of all the servers in the ensemble. If one happens to be down, Solr will automatically be able to send its request to another server in the list.
 === Using a chroot
 If your ensemble is or will be shared among other systems besides Solr, you should consider defining application-specific _znodes_, or a hierarchical namespace that will only include Solr's files.
 Once you create a znode for each application, you add its name, also called a _chroot_, to the end of your connect string whenever you tell Solr where to access ZooKeeper.
 Creating a chroot is done with a `bin/solr` command:
 [source,text]
 bin/solr zk mkroot /solr -z zk1:2181,zk2:2181,zk3:2181
 See the section <<solr-control-script-reference.adoc#create-a-znode-supports-chroot,Create a znode>> for more examples of this command.
 Once the znode is created, it behaves in a similar way to a directory on a filesystem: the data stored by Solr in ZooKeeper is nested beneath the main data directory and won't be mixed with data from another system or process that uses the same ZooKeeper ensemble.
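A connect string is the comma-separated list of `host:port` pairs followed by the optional chroot. As an illustration only, the hypothetical helper `zk_connect_string` below assembles one from a port, a chroot, and a list of hostnames; the hostnames `zk1`, `zk2`, and `zk3` are the example names used in this section.

```shell
#!/bin/sh
# Hypothetical helper that builds a ZooKeeper connect string.
# $1 = client port, $2 = chroot (may be empty), remaining arguments = hostnames
zk_connect_string() {
  port=$1
  chroot=$2
  shift 2
  result=""
  for host in "$@"; do
    # Prepend a comma only when result already holds an earlier host.
    result="${result:+$result,}$host:$port"
  done
  # The chroot is appended once, after the last host.
  echo "$result$chroot"
}

zk_connect_string 2181 /solr zk1 zk2 zk3
# prints zk1:2181,zk2:2181,zk3:2181/solr
```

The chroot appears only once, at the end of the whole string, not after each host.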
 === Using the -z Parameter with bin/solr
 Pointing Solr at the ZooKeeper instance you've created is a simple matter of using the `-z` parameter when using the `bin/solr` script.
 For example, to point the Solr instance to the ZooKeeper you've started on port 2181 on three servers, this is what you'd need to do:
 [source,bash]
 ----
-dataDir=/var/lib/zookeeperdata/1
+bin/solr start -e cloud -z zk1:2181,zk2:2181,zk3:2181/solr
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.1=localhost:2888:3888
 server.2=localhost:2889:3889
 server.3=localhost:2890:3890
 ----
-Here you see three new parameters:
+=== Updating Solr's Include Files
-initLimit:: Amount of time, in ticks, to allow followers to connect and sync to a leader. In this case, you have 5 ticks, each of which is 2000 milliseconds long, so the server will wait as long as 10 seconds to connect and sync with the leader.
+If you update Solr's include file (`solr.in.sh` or `solr.in.cmd`), which overrides defaults used with `bin/solr`, you will not have to use the `-z` parameter with `bin/solr` commands.
 syncLimit:: Amount of time, in ticks, to allow followers to sync with ZooKeeper. If followers fall too far behind a leader, they will be dropped.
-server.X:: These are the IDs and locations of all servers in the ensemble, the ports on which they communicate with each other. The server ID must additionally stored in the `<dataDir>/myid` file and be located in the `dataDir` of each ZooKeeper instance. The ID identifies each server, so in the case of this first instance, you would create the file `/var/lib/zookeeperdata/1/myid` with the content "1".
+[.dynamic-tabs]
 --
 [example.tab-pane#linux1]
 ====
 [.tab-label]*Linux: solr.in.sh*
-Now, whereas with Solr you need to create entirely new directories to run multiple instances, all you need for a new ZooKeeper instance, even if it's on the same machine for testing purposes, is a new configuration file. To complete the example you'll create two more configuration files.
+The section to look for will be commented out:
-The `<ZOOKEEPER_HOME>/conf/zoo2.cfg` file should have the content:
+[source,properties]
 [source,bash]
 ----
-tickTime=2000
+# Set the ZooKeeper connection string if using an external ZooKeeper ensemble
-dataDir=/var/lib/zookeeperdata/2
+# e.g. host1:2181,host2:2181/chroot
-clientPort=2182
+# Leave empty if not using SolrCloud
-initLimit=5
+#ZK_HOST=""
 syncLimit=2
 server.1=localhost:2888:3888
 server.2=localhost:2889:3889
 server.3=localhost:2890:3890
 ----
-You'll also need to create `<ZOOKEEPER_HOME>/conf/zoo3.cfg`:
+Remove the comment marks at the start of the line and enter the ZooKeeper connect string:
-[source,bash]
+[source,properties]
 ----
-tickTime=2000
+# Set the ZooKeeper connection string if using an external ZooKeeper ensemble
-dataDir=/var/lib/zookeeperdata/3
+# e.g. host1:2181,host2:2181/chroot
-clientPort=2183
+# Leave empty if not using SolrCloud
-initLimit=5
+ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"
-syncLimit=2
+----
-server.1=localhost:2888:3888
+====
-server.2=localhost:2889:3889
+
-server.3=localhost:2890:3890
+[example.tab-pane#zkwindows]
 ====
 [.tab-label]*Windows: solr.in.cmd*
 The section to look for will be commented out:
 [source,bat]
 ----
 REM Set the ZooKeeper connection string if using an external ZooKeeper ensemble
 REM e.g. host1:2181,host2:2181/chroot
 REM Leave empty if not using SolrCloud
 REM set ZK_HOST=
 ----
-Finally, create your `myid` files in each of the `dataDir` directories so that each server knows which instance it is. The id in the `myid` file on each machine must match the "server.X" definition. So, the ZooKeeper instance (or machine) named "server.1" in the above example, must have a `myid` file containing the value "1". The `myid` file can be any integer between 1 and 255, and must match the server IDs assigned in the `zoo.cfg` file.
+Remove the comment marks at the start of the line and enter the ZooKeeper connect string:
-To start the servers, you can simply explicitly reference the configuration files:
+[source,bat]
 ----
 REM Set the ZooKeeper connection string if using an external ZooKeeper ensemble
 REM e.g. host1:2181,host2:2181/chroot
 REM Leave empty if not using SolrCloud
 set ZK_HOST=zk1:2181,zk2:2181,zk3:2181/solr
 ----
 ====
 --
-[source,bash]
+Now you will not have to enter the connection string when starting Solr.
 ----
 cd <ZOOKEEPER_HOME>
 bin/zkServer.sh start zoo.cfg
 bin/zkServer.sh start zoo2.cfg
 bin/zkServer.sh start zoo3.cfg
 ----
 Once these servers are running, you can reference them from Solr just as you did before:
 [source,bash]
 ----
 bin/solr start -e cloud -z localhost:2181,localhost:2182,localhost:2183 -noprompt
 ----
 == Securing the ZooKeeper Connection
 You may also want to secure the communication between ZooKeeper and Solr.
-To setup ACL protection of znodes, see <<zookeeper-access-control.adoc#zookeeper-access-control,ZooKeeper Access Control>>.
+To set up ACL protection of znodes, see the section <<zookeeper-access-control.adoc#zookeeper-access-control,ZooKeeper Access Control>>.
 For more information on getting the most power from your ZooKeeper installation, check out the http://zookeeper.apache.org/doc/r{ivy-zookeeper-version}/zookeeperAdmin.html[ZooKeeper Administrator's Guide].