More on bad disk handling
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1417657 13f79535-47bb-0310-9956-ffa450edef68
parent a3500ea1c8
commit 39a0f56d47
@@ -387,11 +387,18 @@ false
to go down spewing errors in <filename>dmesg</filename> -- or, for some reason, run much slower than their
companions. In this case you want to decommission the disk. You have two options. You can
<xlink href="http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F">decommission the datanode</xlink>
or, less disruptive in that only the bad disk's data will be rereplicated, you can stop the datanode,
unmount the bad volume (you can't umount a volume while the datanode is using it), and then restart the
datanode (presuming you have set dfs.datanode.failed.volumes.tolerated > 0). The regionserver will
throw some errors in its logs as it recalibrates where to get its data from -- it will likely
roll its WAL log too -- but in general, aside from some latency spikes, it should keep on chugging.
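A minimal <filename>hdfs-site.xml</filename> fragment enabling this (the value <literal>1</literal> here is illustrative; the default is <literal>0</literal>, meaning any volume failure shuts the datanode down):
<programlisting><![CDATA[<property>
  <!-- Number of volumes allowed to fail before the datanode shuts itself down -->
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>]]></programlisting>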
<note>
<para>If you are doing short-circuit reads, you will have to move the regions off the regionserver
before you stop the datanode; with short-circuit reads, even though the blocks are chmod'd so the
regionserver cannot access them, because it already has the files open it will be able to keep reading
the file blocks from the bad disk even though the datanode is down. Move the regions back after you
restart the datanode.</para>
</note>
</para>
</section>
</section>