HBASE-20354 better docs for impact of proactively checking hsync support.

* Add to the quickstart guide disabling the hsync check, with a
  big warning about how we'll lose data if the check is disabled.
* Add to troubleshooting section so folks searching can get a pointer.

Signed-off-by: Mike Drob <mdrob@apache.org>
This commit is contained in:
Sean Busbey 2018-04-05 11:52:00 -05:00
parent de57d08226
commit 8014c5c3ac
3 changed files with 107 additions and 7 deletions

View File

@ -82,7 +82,7 @@ JAVA_HOME=/usr
+ +
. Edit _conf/hbase-site.xml_, which is the main HBase configuration file. . Edit _conf/hbase-site.xml_, which is the main HBase configuration file.
At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. At this time, you need to specify the directory on the local filesystem where HBase and ZooKeeper write data and acknowledge some risks.
By default, a new directory is created under /tmp. By default, a new directory is created under /tmp.
Many servers are configured to delete the contents of _/tmp_ upon reboot, so you should store the data elsewhere. Many servers are configured to delete the contents of _/tmp_ upon reboot, so you should store the data elsewhere.
The following configuration will store HBase's data in the _hbase_ directory, in the home directory of the user called `testuser`. The following configuration will store HBase's data in the _hbase_ directory, in the home directory of the user called `testuser`.
@ -102,6 +102,21 @@ JAVA_HOME=/usr
<name>hbase.zookeeper.property.dataDir</name> <name>hbase.zookeeper.property.dataDir</name>
<value>/home/testuser/zookeeper</value> <value>/home/testuser/zookeeper</value>
</property> </property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
<description>
Controls whether HBase will check for stream capabilities (hflush/hsync).
Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
with the 'file://' scheme, but be mindful of the NOTE below.
WARNING: Setting this to false blinds you to potential data loss and
inconsistent system state in the event of process and/or node failures. If
HBase is complaining of an inability to use hsync or hflush it's most
likely not a false positive.
</description>
</property>
</configuration> </configuration>
---- ----
==== ====
@ -111,7 +126,14 @@ HBase will do this for you. If you create the directory,
HBase will attempt to do a migration, which is not what you want. HBase will attempt to do a migration, which is not what you want.
+ +
NOTE: The _hbase.rootdir_ in the above example points to a directory NOTE: The _hbase.rootdir_ in the above example points to a directory
in the _local filesystem_. The 'file:/' prefix is how we denote local filesystem. in the _local filesystem_. The 'file://' prefix is how we denote local
filesystem. You should take the WARNING present in the configuration example
to heart. In standalone mode HBase makes use of the local filesystem abstraction
from the Apache Hadoop project. That abstraction doesn't provide the durability
promises that HBase needs to operate safely. This is fine for local development
and testing use cases where the cost of cluster failure is well contained. It is
not appropriate for production deployments; eventually you will lose data.
To home HBase on an existing instance of HDFS, set the _hbase.rootdir_ to point at a To home HBase on an existing instance of HDFS, set the _hbase.rootdir_ to point at a
directory up on your instance: e.g. _hdfs://namenode.example.org:8020/hbase_. directory up on your instance: e.g. _hdfs://namenode.example.org:8020/hbase_.
For more on this variant, see the section below on Standalone HBase over HDFS. For more on this variant, see the section below on Standalone HBase over HDFS.
@ -163,7 +185,7 @@ hbase(main):001:0> create 'test', 'cf'
. List Information About your Table . List Information About your Table
+ +
Use the `list` command to Use the `list` command to confirm your table exists
+ +
---- ----
hbase(main):002:0> list 'test' hbase(main):002:0> list 'test'
@ -174,6 +196,22 @@ test
=> ["test"] => ["test"]
---- ----
+
Now use the `describe` command to see details, including configuration defaults
+
----
hbase(main):003:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE =>
'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'f
alse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE
=> '65536'}
1 row(s)
Took 0.9998 seconds
----
. Put data into your table. . Put data into your table.
+ +
To put data into your table, use the `put` command. To put data into your table, use the `put` command.
@ -314,7 +352,7 @@ First, add the following property which directs HBase to run in distributed mode
---- ----
+ +
Next, change the `hbase.rootdir` from the local filesystem to the address of your HDFS instance, using the `hdfs:////` URI syntax. Next, change the `hbase.rootdir` from the local filesystem to the address of your HDFS instance, using the `hdfs:////` URI syntax.
In this example, HDFS is running on the localhost at port 8020. In this example, HDFS is running on the localhost at port 8020. Be sure to either remove the entry for `hbase.unsafe.stream.capability.enforce` or set it to true.
+ +
[source,xml] [source,xml]
---- ----

View File

@ -227,7 +227,7 @@ The table reference can be used to perform data read write operations such as pu
For example, previously you would always specify a table name: For example, previously you would always specify a table name:
---- ----
hbase(main):000:0> create t, f hbase(main):000:0> create 't', 'f'
0 row(s) in 1.0970 seconds 0 row(s) in 1.0970 seconds
hbase(main):001:0> put 't', 'rold', 'f', 'v' hbase(main):001:0> put 't', 'rold', 'f', 'v'
0 row(s) in 0.0080 seconds 0 row(s) in 0.0080 seconds
@ -291,7 +291,7 @@ hbase(main):012:0> tab = get_table 't'
0 row(s) in 0.0010 seconds 0 row(s) in 0.0010 seconds
=> Hbase::Table - t => Hbase::Table - t
hbase(main):013:0> tab.put r1 ,f, v hbase(main):013:0> tab.put 'r1' ,'f', 'v'
0 row(s) in 0.0100 seconds 0 row(s) in 0.0100 seconds
hbase(main):014:0> tab.scan hbase(main):014:0> tab.scan
ROW COLUMN+CELL ROW COLUMN+CELL
@ -305,7 +305,7 @@ You can then use jruby to script table operations based on these names.
The list_snapshots command also acts similarly. The list_snapshots command also acts similarly.
---- ----
hbase(main):016 > tables = list(t.*) hbase(main):016 > tables = list('t.*')
TABLE TABLE
t t
1 row(s) in 0.1040 seconds 1 row(s) in 0.1040 seconds

View File

@ -941,6 +941,45 @@ java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
\... then there is a path issue with the compression libraries. \... then there is a path issue with the compression libraries.
See the Configuration section on link:[LZO compression configuration]. See the Configuration section on link:[LZO compression configuration].
[[trouble.rs.startup.hsync]]
==== RegionServer aborts due to lack of hsync for filesystem
In order to provide data durability for writes to the cluster HBase relies on the ability to durably save state in a write ahead log. When using a version of Apache Hadoop Common's filesystem API that supports checking on the availability of needed calls, HBase will proactively abort the cluster if it finds it can't operate safely.
For RegionServer roles, the failure will show up in logs like this:
----
2018-04-05 11:36:22,785 ERROR [regionserver/192.168.1.123:16020] wal.AsyncFSWALProvider: The RegionServer async write ahead log provider relies on the ability to call hflush and hsync for proper operation during component failures, but the current FileSystem does not support doing so. Please check the config value of 'hbase.wal.dir' and ensure it points to a FileSystem mount that has suitable capabilities for output streams.
2018-04-05 11:36:22,799 ERROR [regionserver/192.168.1.123:16020] regionserver.HRegionServer: ***** ABORTING region server 192.168.1.123,16020,1522946074234: Unhandled: cannot get log writer *****
java.io.IOException: cannot get log writer
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:112)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:612)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:124)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:759)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:489)
at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:251)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:69)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:44)
at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:252)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2105)
at org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1326)
at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1191)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1007)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush and hsync
at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:69)
at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:168)
at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:167)
at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:99)
... 15 more
----
If you are attempting to run in standalone mode and see this error, please walk back through the section <<quickstart>> and ensure you have included *all* the given configuration settings.
[[trouble.rs.runtime]] [[trouble.rs.runtime]]
=== Runtime Errors === Runtime Errors
@ -1127,6 +1166,29 @@ Sure fire solution is to just use Hadoop dfs to delete the HBase root and let HB
If you have many regions on your cluster and you see an error like that reported above in this sections title in your logs, see link:https://issues.apache.org/jira/browse/HBASE-4246[HBASE-4246 Cluster with too many regions cannot withstand some master failover scenarios]. If you have many regions on your cluster and you see an error like that reported above in this sections title in your logs, see link:https://issues.apache.org/jira/browse/HBASE-4246[HBASE-4246 Cluster with too many regions cannot withstand some master failover scenarios].
[[trouble.master.startup.hsync]]
==== Master fails to become active due to lack of hsync for filesystem
HBase's internal framework for cluster operations requires the ability to durably save state in a write ahead log. When using a version of Apache Hadoop Common's filesystem API that supports checking on the availability of needed calls, HBase will proactively abort the cluster if it finds it can't operate safely.
For Master roles, the failure will show up in logs like this:
----
2018-04-05 11:18:44,653 ERROR [Thread-21] master.HMaster: Failed to become active master
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1034)
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:374)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.start(ProcedureExecutor.java:530)
at org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1267)
at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1173)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:881)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2048)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:568)
at java.lang.Thread.run(Thread.java:745)
----
If you are attempting to run in standalone mode and see this error, please walk back through the section <<quickstart>> and ensure you have included *all* the given configuration settings.
[[trouble.master.shutdown]] [[trouble.master.shutdown]]
=== Shutdown Errors === Shutdown Errors