HBASE-20354 better docs for impact of proactively checking hsync support.

* Add to the quickstart guide disabling the hsync check, with a big warning about how we'll lose data if the check is disabled.
* Add to troubleshooting section so folks searching can get a pointer.

Signed-off-by: Mike Drob <mdrob@apache.org>
parent de57d08226
commit 8014c5c3ac
@@ -82,7 +82,7 @@ JAVA_HOME=/usr
 +

 . Edit _conf/hbase-site.xml_, which is the main HBase configuration file.
-At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data.
+At this time, you need to specify the directory on the local filesystem where HBase and ZooKeeper write data and acknowledge some risks.
 By default, a new directory is created under /tmp.
 Many servers are configured to delete the contents of _/tmp_ upon reboot, so you should store the data elsewhere.
 The following configuration will store HBase's data in the _hbase_ directory, in the home directory of the user called `testuser`.
@@ -102,6 +102,21 @@ JAVA_HOME=/usr
     <name>hbase.zookeeper.property.dataDir</name>
     <value>/home/testuser/zookeeper</value>
   </property>
+  <property>
+    <name>hbase.unsafe.stream.capability.enforce</name>
+    <value>false</value>
+    <description>
+      Controls whether HBase will check for stream capabilities (hflush/hsync).
+
+      Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
+      with the 'file://' scheme, but be mindful of the NOTE below.
+
+      WARNING: Setting this to false blinds you to potential data loss and
+      inconsistent system state in the event of process and/or node failures. If
+      HBase is complaining of an inability to use hsync or hflush it's most
+      likely not a false positive.
+    </description>
+  </property>
 </configuration>
 ----
 ====
@@ -111,7 +126,14 @@ HBase will do this for you. If you create the directory,
 HBase will attempt to do a migration, which is not what you want.
 +
 NOTE: The _hbase.rootdir_ in the above example points to a directory
-in the _local filesystem_. The 'file:/' prefix is how we denote local filesystem.
+in the _local filesystem_. The 'file://' prefix is how we denote local
+filesystem. You should take the WARNING present in the configuration example
+to heart. In standalone mode HBase makes use of the local filesystem abstraction
+from the Apache Hadoop project. That abstraction doesn't provide the durability
+promises that HBase needs to operate safely. This is fine for local development
+and testing use cases where the cost of cluster failure is well contained. It is
+not appropriate for production deployments; eventually you will lose data.

 To home HBase on an existing instance of HDFS, set the _hbase.rootdir_ to point at a
 directory up on your instance: e.g. _hdfs://namenode.example.org:8020/hbase_.
 For more on this variant, see the section below on Standalone HBase over HDFS.
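A minimal sketch of that HDFS variant in _hbase-site.xml_, reusing the placeholder namenode host and port from the sentence above:

[source,xml]
----
<property>
  <name>hbase.rootdir</name>
  <!-- placeholder from the text above; point this at your own HDFS instance -->
  <value>hdfs://namenode.example.org:8020/hbase</value>
</property>
----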
@@ -163,7 +185,7 @@ hbase(main):001:0> create 'test', 'cf'

 . List Information About your Table
 +
-Use the `list` command to
+Use the `list` command to confirm your table exists
 +
 ----
 hbase(main):002:0> list 'test'
@@ -174,6 +196,22 @@ test
 => ["test"]
 ----

++
+Now use the `describe` command to see details, including configuration defaults
++
+----
+hbase(main):003:0> describe 'test'
+Table test is ENABLED
+test
+COLUMN FAMILIES DESCRIPTION
+{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE =>
+'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'f
+alse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE
+=> '65536'}
+1 row(s)
+Took 0.9998 seconds
+----
+
 . Put data into your table.
 +
 To put data into your table, use the `put` command.
@@ -314,7 +352,7 @@ First, add the following property which directs HBase to run in distributed mode
 ----
 +
 Next, change the `hbase.rootdir` from the local filesystem to the address of your HDFS instance, using the `hdfs:////` URI syntax.
-In this example, HDFS is running on the localhost at port 8020.
+In this example, HDFS is running on the localhost at port 8020. Be sure to either remove the entry for `hbase.unsafe.stream.capability.enforce` or set it to true.
 +
 [source,xml]
 ----
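The hunk above truncates before the XML itself is shown. As a sketch only, assuming the localhost:8020 HDFS instance described in the text, the resulting entries in _hbase-site.xml_ would look something like:

[source,xml]
----
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>
<!-- and remove the hbase.unsafe.stream.capability.enforce property,
     or explicitly set its value to true -->
----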
@@ -227,7 +227,7 @@ The table reference can be used to perform data read write operations such as pu
 For example, previously you would always specify a table name:

 ----
-hbase(main):000:0> create ‘t’, ‘f’
+hbase(main):000:0> create 't', 'f'
 0 row(s) in 1.0970 seconds
 hbase(main):001:0> put 't', 'rold', 'f', 'v'
 0 row(s) in 0.0080 seconds
@@ -291,7 +291,7 @@ hbase(main):012:0> tab = get_table 't'
 0 row(s) in 0.0010 seconds

 => Hbase::Table - t
-hbase(main):013:0> tab.put ‘r1’ ,’f’, ‘v’
+hbase(main):013:0> tab.put 'r1' ,'f', 'v'
 0 row(s) in 0.0100 seconds
 hbase(main):014:0> tab.scan
 ROW COLUMN+CELL
@@ -305,7 +305,7 @@ You can then use jruby to script table operations based on these names.
 The list_snapshots command also acts similarly.

 ----
-hbase(main):016 > tables = list(‘t.*’)
+hbase(main):016 > tables = list('t.*')
 TABLE
 t
 1 row(s) in 0.1040 seconds
@@ -941,6 +941,45 @@ java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
 \... then there is a path issue with the compression libraries.
 See the Configuration section on link:[LZO compression configuration].

+[[trouble.rs.startup.hsync]]
+==== RegionServer aborts due to lack of hsync for filesystem
+
+In order to provide data durability for writes to the cluster, HBase relies on the ability to durably save state in a write ahead log. When using a version of Apache Hadoop Common's filesystem API that supports checking on the availability of needed calls, HBase will proactively abort the cluster if it finds it can't operate safely.
+
+For RegionServer roles, the failure will show up in logs like this:
+
+----
+2018-04-05 11:36:22,785 ERROR [regionserver/192.168.1.123:16020] wal.AsyncFSWALProvider: The RegionServer async write ahead log provider relies on the ability to call hflush and hsync for proper operation during component failures, but the current FileSystem does not support doing so. Please check the config value of 'hbase.wal.dir' and ensure it points to a FileSystem mount that has suitable capabilities for output streams.
+2018-04-05 11:36:22,799 ERROR [regionserver/192.168.1.123:16020] regionserver.HRegionServer: ***** ABORTING region server 192.168.1.123,16020,1522946074234: Unhandled: cannot get log writer *****
+java.io.IOException: cannot get log writer
+    at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:112)
+    at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:612)
+    at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:124)
+    at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:759)
+    at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:489)
+    at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:251)
+    at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:69)
+    at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:44)
+    at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
+    at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
+    at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:252)
+    at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2105)
+    at org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1326)
+    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1191)
+    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1007)
+    at java.lang.Thread.run(Thread.java:745)
+Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush and hsync
+    at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:69)
+    at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:168)
+    at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:167)
+    at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:99)
+    ... 15 more
+----
+
+If you are attempting to run in standalone mode and see this error, please walk back through the section <<quickstart>> and ensure you have included *all* the given configuration settings.
+

 [[trouble.rs.runtime]]
 === Runtime Errors
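The RegionServer log message above names 'hbase.wal.dir' as the setting to check. A sketch of the durable fix, assuming an HDFS instance at the quickstart's placeholder namenode and an illustrative /hbase/wal path:

[source,xml]
----
<property>
  <name>hbase.wal.dir</name>
  <!-- assumption: any FileSystem mount supporting hflush/hsync works; HDFS is typical -->
  <value>hdfs://namenode.example.org:8020/hbase/wal</value>
</property>
----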
@@ -1127,6 +1166,29 @@ Sure fire solution is to just use Hadoop dfs to delete the HBase root and let HB

 If you have many regions on your cluster and you see an error like that reported above in this section's title in your logs, see link:https://issues.apache.org/jira/browse/HBASE-4246[HBASE-4246 Cluster with too many regions cannot withstand some master failover scenarios].

+[[trouble.master.startup.hsync]]
+==== Master fails to become active due to lack of hsync for filesystem
+
+HBase's internal framework for cluster operations requires the ability to durably save state in a write ahead log. When using a version of Apache Hadoop Common's filesystem API that supports checking on the availability of needed calls, HBase will proactively abort the cluster if it finds it can't operate safely.
+
+For Master roles, the failure will show up in logs like this:
+
+----
+2018-04-05 11:18:44,653 ERROR [Thread-21] master.HMaster: Failed to become active master
+java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
+    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1034)
+    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:374)
+    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.start(ProcedureExecutor.java:530)
+    at org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1267)
+    at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1173)
+    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:881)
+    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2048)
+    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:568)
+    at java.lang.Thread.run(Thread.java:745)
+----
+
+If you are attempting to run in standalone mode and see this error, please walk back through the section <<quickstart>> and ensure you have included *all* the given configuration settings.
+
 [[trouble.master.shutdown]]
 === Shutdown Errors
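The Master's log message names 'hbase.procedure.store.wal.use.hsync' as the knob for the procedure WAL's durability level. Relaxing it for a throwaway development setup is shown below as an assumption, not a recommendation from the text; on a production cluster, fix the underlying filesystem instead:

[source,xml]
----
<property>
  <name>hbase.procedure.store.wal.use.hsync</name>
  <!-- assumption: false trades hsync durability for hflush; never use where data matters -->
  <value>false</value>
</property>
----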