diff --git a/src/docbkx/configuration.xml b/src/docbkx/configuration.xml
index 480634bebaf..5e7b233785f 100644
--- a/src/docbkx/configuration.xml
+++ b/src/docbkx/configuration.xml
@@ -941,6 +941,8 @@ index e70ebc6..96f8c27 100644
Recommended Configurations
+
+ ZooKeeper Configuration
<varname>zookeeper.session.timeout</varname> The default timeout is three minutes (specified in milliseconds). This means that if a server crashes, it will be three minutes before the Master notices
@@ -966,6 +968,18 @@ index e70ebc6..96f8c27 100644
Number of ZooKeeper Instances See .
+
+
+
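+ For example, <varname>zookeeper.session.timeout</varname> can be overridden in hbase-site.xml; a minimal
+ sketch that lowers it so the Master notices a dead server sooner (the 60000 value is illustrative only,
+ not a recommendation):
+ <programlisting>
+ <property>
+   <name>zookeeper.session.timeout</name>
+   <value>60000</value>
+   <!-- illustrative: one minute instead of the default three minutes -->
+ </property>
+ </programlisting>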
+ HDFS Configurations
+
+ dfs.datanode.failed.volumes.tolerated
+ This is the "...number of volumes that are allowed to fail before a datanode stops offering service. By default
+ any volume failure will cause a datanode to shutdown" from the hdfs-default.xml description.
+ If you have more than three or four disks, you might want to set this to 1; if you have many disks,
+ set it to 2 or more.
+
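+ For example, a minimal hdfs-site.xml sketch for the datanodes (the value of 1 is illustrative only; pick
+ what suits your disk count):
+ <programlisting>
+ <property>
+   <name>dfs.datanode.failed.volumes.tolerated</name>
+   <value>1</value>
+   <!-- illustrative: tolerate one failed volume before the datanode shuts itself down -->
+ </property>
+ </programlisting>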
<varname>hbase.regionserver.handler.count</varname>
diff --git a/src/docbkx/ops_mgt.xml b/src/docbkx/ops_mgt.xml
index 623bfb6037c..16c19da8960 100644
--- a/src/docbkx/ops_mgt.xml
+++ b/src/docbkx/ops_mgt.xml
@@ -380,6 +380,20 @@ false
+
+ Bad or Failing Disk
+ It is good to have dfs.datanode.failed.volumes.tolerated set if you have a decent number of disks
+ per machine, for the case where a disk plain dies. But usually disks do the "John Wayne" -- i.e. take a while
+ to go down spewing errors in dmesg -- or for some reason, run much slower than their
+ companions. In this case you want to decommission the disk. You have two options. You can
+ decommission the datanode or, less disruptive in that only the bad disk's data will be re-replicated,
+ you can stop the datanode, unmount the bad volume (you can't umount a volume while the datanode is using it),
+ and then restart the datanode (presuming you have set dfs.datanode.failed.volumes.tolerated > 0). The regionserver will
+ throw some errors in its logs as it recalibrates where to get its data from -- it will likely
+ roll its WAL log too -- but in general, aside from some latency spikes, it should keep on chugging.
+
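+ A rough sketch of the second, less disruptive option, assuming a tarball install managed with the stock
+ hadoop-daemon.sh script and a bad volume mounted at /data/3 (both the script path and the mount point
+ are illustrative; adjust for how your cluster is run):
+ <programlisting>
+ # on the affected datanode; presumes dfs.datanode.failed.volumes.tolerated > 0
+ $HADOOP_HOME/bin/hadoop-daemon.sh stop datanode    # stop the datanode first; you can't umount a volume while it is in use
+ umount /data/3                                     # unmount the bad volume
+ $HADOOP_HOME/bin/hadoop-daemon.sh start datanode   # restart; only the bad disk's data gets re-replicated
+ </programlisting>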
Rolling Restart