HBASE-24535: Tweak the master registry docs for branch-2 (#1890)

Updated to include changes in HBASE-24265 and some rewording
to make it version agnostic.

Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Bharath Vissapragada 2020-06-12 14:59:04 -07:00 committed by GitHub
parent 21fe873eba
commit fd5002d0da
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 37 additions and 17 deletions


@@ -261,8 +261,8 @@ For region name, we only accept `byte[]` as the parameter type and it may be a f
Information on non-Java clients and custom protocols is covered in <<external_apis>>.
[[client.masterregistry]]
-=== Master registry (new as of release 3.0.0)
+=== Master Registry (new as of 2.3.0)
Client internally works with a _connection registry_ to fetch the metadata needed by connections.
This connection registry implementation is responsible for fetching the following metadata.
@@ -271,18 +271,18 @@ This connection registry implementation is responsible for fetching the followin
* Cluster ID (unique to this cluster)
This information is needed as part of various client operations like connection setup, scans,
-gets etc. Up until releases 2.x.y, the default connection registry was based on ZooKeeper as the
-source of truth and the clients fetched the metadata from ZooKeeper znodes. As of release 3.0.0,
-the default implementation for the connection registry has been switched to a master-based
-implementation. With this change, the clients now fetch the required metadata from master RPC
-endpoints directly. This change was done for the following reasons.
+gets, etc. Traditionally, the connection registry implementation has been based on ZooKeeper as the
+source of truth and clients fetched the metadata directly from the ZooKeeper quorum. HBase 2.3.0
+introduces a new connection registry implementation based on direct communication with the Masters.
+With this implementation, clients now fetch required metadata via master RPC endpoints instead of
+maintaining connections to ZooKeeper. This change was done for the following reasons.
* Reduce load on ZooKeeper since that is critical for cluster operation.
* Holistic client timeout and retry configurations since the new registry brings all the client
operations under the HBase RPC framework.
* Remove the ZooKeeper client dependency from the HBase client library.
-This means that
+This means:
* At least a single active or stand-by master is needed for cluster connection setup. Refer to
<<master.runtime>> for more details.
@@ -293,22 +293,42 @@ HMasters instead of ZooKeeper ensemble`
To reduce hot-spotting on a single master, all the masters (active & stand-by) expose the needed
service to fetch the connection metadata. This lets the client connect to any master (not just active).
+Both ZooKeeper- and Master-based connection registry implementations are available in 2.3+. For
+2.3 and earlier, the ZooKeeper-based implementation remains the default configuration.
+The Master-based implementation becomes the default in 3.0.0.
-==== RPC hedging
+Change the connection registry implementation by updating the value configured for
+`hbase.client.registry.impl`. To explicitly enable the ZooKeeper-based registry, use:
-This feature also implements a new RPC channel that can hedge requests to multiple masters. This
-lets the client make the same request to multiple servers, and whichever responds first is returned
-to the client while the other in-flight requests are canceled. This improves
-performance, especially when a subset of servers is under load. The hedging fan-out size is
-configurable, meaning the number of requests that are hedged in a single attempt, using the
-configuration key _hbase.rpc.hedged.fanout_ in the client configuration. It defaults to 2. With this
-default, the RPCs are tried in batches of 2. The hedging policy is still primitive and does not
+[source, xml]
+<property>
+    <name>hbase.client.registry.impl</name>
+    <value>org.apache.hadoop.hbase.client.ZKConnectionRegistry</value>
+</property>
+To explicitly enable the Master-based registry, use:
+[source, xml]
+<property>
+    <name>hbase.client.registry.impl</name>
+    <value>org.apache.hadoop.hbase.client.MasterRegistry</value>
+</property>
+==== MasterRegistry RPC hedging
+MasterRegistry implements hedging of connection registry RPCs across active and stand-by masters.
+This lets the client make the same request to multiple servers, and whichever responds first is
+returned to the client immediately. This improves performance, especially when a subset of
+servers is under load. The hedging fan-out size, that is, the number of requests hedged in a
+single attempt, is configurable through the key _hbase.client.master_registry.hedged.fanout_ in
+the client configuration. It defaults to 2. With this default, the RPCs are tried in batches of 2.
+The hedging policy is still primitive and does not
adapt to any sort of live RPC performance metrics.
==== Additional Notes
-* Clients hedge the requests in a randomized order to avoid hot-spotting a single server.
-* Cluster internal connections (master<->regionservers) still use ZooKeeper-based connection
+* Clients hedge the requests in a randomized order to avoid hot-spotting a single master.
+* Cluster internal connections (masters &lt;-&gt; regionservers) still use ZooKeeper-based connection
registry.
* Cluster internal state is still tracked in ZooKeeper, hence ZooKeeper availability requirements
are the same as before.
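
The hedging behaviour described in this change can be illustrated with a small, self-contained Java sketch. It is a minimal illustration under stated assumptions, not the MasterRegistry implementation. The class `HedgedCall`, its `call` signature, the cancellation of losing calls, and the fall-through to the next batch when every call in a batch fails are all hypothetical choices made for this example. The `fanout` argument stands in for `hbase.client.master_registry.hedged.fanout`, and the shuffle mirrors the randomized request order mentioned in the notes. Only standard JDK classes are used (`ExecutorCompletionService`, `Collections.shuffle`).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration of fan-out hedging; it is not HBase's MasterRegistry code.
public final class HedgedCall {

  private HedgedCall() {}

  // Sends the same request to `fanout` endpoints at a time, in shuffled order, and returns the
  // first successful response. Falling through to the next batch when every call in a batch
  // fails is an assumption of this sketch.
  public static <E, R> R call(List<E> endpoints, int fanout, Function<E, R> rpc,
      ExecutorService pool, Random random) throws InterruptedException, ExecutionException {
    if (fanout < 1) {
      throw new IllegalArgumentException("fanout must be >= 1");
    }
    List<E> order = new ArrayList<>(endpoints);
    Collections.shuffle(order, random); // randomized order spreads load across endpoints
    Throwable lastError = null;
    for (int start = 0; start < order.size(); start += fanout) {
      List<E> batch = order.subList(start, Math.min(start + fanout, order.size()));
      ExecutorCompletionService<R> completion = new ExecutorCompletionService<>(pool);
      List<Future<R>> inFlight = new ArrayList<>();
      for (E endpoint : batch) {
        inFlight.add(completion.submit(() -> rpc.apply(endpoint))); // same call per endpoint
      }
      try {
        for (int i = 0; i < inFlight.size(); i++) {
          Future<R> done = completion.take(); // next call in this batch to finish
          try {
            return done.get(); // first successful response wins
          } catch (ExecutionException e) {
            lastError = e.getCause(); // this endpoint failed; wait for the others
          }
        }
      } finally {
        for (Future<R> f : inFlight) {
          f.cancel(true); // cancel calls that lost the race (no-op if already done)
        }
      }
    }
    throw new ExecutionException("all " + order.size() + " endpoints failed", lastError);
  }
}
```

A caller would pass the list of master addresses, a fanout such as the default of 2, and a function that performs one RPC against one master. With three masters of which one is unreachable, each batch of two still contains at least one healthy master, so the call returns after the first batch. `ExecutorCompletionService` hands back futures in completion order, which keeps "first response wins" a simple blocking loop.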