HBASE-23890 Update the rsgroup section in our ref guide (#1206)

Signed-off-by: Sean Busbey <busbey@apache.org>
Duo Zhang 2020-02-29 08:52:19 +08:00 committed by Duo Zhang
parent 7f2d823164
commit 420e38083f
2 changed files with 105 additions and 51 deletions


@@ -3402,40 +3402,38 @@ full implications and have a sufficient background in managing HBase clusters.
It was developed by Yahoo! and they run it at scale on their large grid cluster.
See link:http://www.slideshare.net/HBaseCon/keynote-apache-hbase-at-yahoo-scale[HBase at Yahoo! Scale].
RSGroups can be defined and managed with both admin methods and shell commands.
A server can be added to a group by its hostname and port pair, and tables
can be moved to this group so that only regionservers in the same rsgroup can
host the regions of the table. The group for a table is stored in its
`TableDescriptor` under the property name `hbase.rsgroup.name`. You can also
set this property on a namespace, which places all the tables under that
namespace into the group. RegionServers and tables can belong to only one
rsgroup at a time. By default, all tables and regionservers belong to the
`default` rsgroup. System tables can also be put into a rsgroup using the
regular APIs. A custom balancer implementation tracks assignments per rsgroup
and makes sure to move regions to the relevant regionservers in that rsgroup.
The rsgroup information is stored in a regular HBase table, and a
ZooKeeper-based read-only cache is used at cluster bootstrap time.
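As an illustration of the lookup just described, the following plain-Java sketch (not HBase code) models how a table's effective rsgroup could be resolved. The class and method names are hypothetical, and descriptors are reduced to string maps. Two rules are assumptions not stated above: a group set on the table takes precedence over one set on its namespace, and a table falls back to `default` when no group is set or the named group does not exist.

[source,java]
----
import java.util.Map;
import java.util.Set;

// Toy model, not HBase code: resolves a table's effective rsgroup from the
// `hbase.rsgroup.name` property. Table-over-namespace precedence and the
// fallback to `default` for unset or unknown groups are assumptions.
class RsGroupLookup {
  static final String GROUP_PROP = "hbase.rsgroup.name";
  static final String DEFAULT_GROUP = "default";

  static String effectiveGroup(Map<String, String> tableProps,
                               Map<String, String> namespaceProps,
                               Set<String> existingGroups) {
    // 1. A group set directly on the table descriptor.
    String fromTable = tableProps.get(GROUP_PROP);
    if (fromTable != null && existingGroups.contains(fromTable)) {
      return fromTable;
    }
    // 2. A group set on the namespace covers every table under it.
    String fromNamespace = namespaceProps.get(GROUP_PROP);
    if (fromNamespace != null && existingGroups.contains(fromNamespace)) {
      return fromNamespace;
    }
    // 3. Nothing usable is configured: the table stays in `default`.
    return DEFAULT_GROUP;
  }

  public static void main(String[] args) {
    Set<String> groups = Set.of(DEFAULT_GROUP, "batch", "online");
    Map<String, String> batchNamespace = Map.of(GROUP_PROP, "batch");
    // Table inherits its namespace's group.
    System.out.println(effectiveGroup(Map.of(), batchNamespace, groups)); // batch
    // A table-level setting overrides the namespace (assumed precedence).
    System.out.println(effectiveGroup(Map.of(GROUP_PROP, "online"), batchNamespace, groups)); // online
    // Nothing configured anywhere.
    System.out.println(effectiveGroup(Map.of(), Map.of(), groups)); // default
  }
}
----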
To enable, add the following to your hbase-site.xml and restart your Master:
[source,xml]
----
<property>
  <name>hbase.balancer.rsgroup.enabled</name>
  <value>true</value>
</property>
----
Then use the _rsgroup_ admin methods or shell commands to create and manipulate
RegionServer groups, e.g. to add a rsgroup and then add a server to it.
To see the list of rsgroup commands available in the hbase shell, type:
[source, bash]
----
hbase(main):008:0> help 'rsgroup'
Took 0.5610 seconds
----
@@ -3449,7 +3447,8 @@ Master UI home page. If you click on a table, you can see what servers it is
deployed across. You should see a reflection of the grouping done with
your shell commands. If there are issues, check the master log.
Here is an example using a few of the rsgroup commands. To add a group, do as
follows:
[source, bash]
----
@@ -3461,20 +3460,10 @@ Here is example using a few of the rsgroup commands. To add a group, do as foll
.RegionServer Groups must be Enabled
[NOTE]
====
If you have not enabled the rsgroup feature and you call any of the rsgroup
admin methods or shell commands, the call will fail with a
`DoNotRetryIOException` whose detail message says that the rsgroup feature
is disabled.
====
Add a server (specified by hostname + port) to the just-made group using the
@@ -3500,23 +3489,21 @@ Servers come and go over the lifetime of a Cluster. Currently, you must
manually align the servers referenced in rsgroups with the actual state of
nodes in the running cluster. What we mean by this is that if you decommission
a server, then you must update rsgroups as part of your server decommission
process to remove references to it. Note that calling `clearDeadServers`
manually also removes the dead servers from any rsgroups, but the master loses
track of dead servers when it restarts, so you still need to update the
rsgroups yourself.
Use `Admin.removeServersFromRSGroup` or the shell command
_remove_servers_rsgroup_ to remove decommissioned servers from a rsgroup.
The `default` group is not like other rsgroups in that it is dynamic. Its server
list mirrors the current state of the cluster; i.e. if you shut down a server that
was part of the `default` rsgroup, and then do a _get_rsgroup_ `default` to list
its content in the shell, the server will no longer be listed. For non-default
groups, though a node may be offline, it will persist in the non-default group's
list of servers. But if you move the offline server from the non-default rsgroup
to default, it will not show in the `default` list. It will just be dropped.
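The following toy model in plain Java (hypothetical names, not the HBase API) illustrates this asymmetry: a non-default group keeps a stored server list, while the `default` list is derived on demand from the live servers that no other group claims. Deriving `default` this way is an assumption chosen to match the behavior described above.

[source,java]
----
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not HBase code: non-default groups store their server lists,
// while `default` is computed from live servers not in any other group.
class GroupMembershipModel {
  static final String DEFAULT_GROUP = "default";

  // Non-default group -> servers ("host:port"); kept even when a server is offline.
  private final Map<String, Set<String>> storedGroups = new HashMap<>();
  // Servers that are currently online.
  private final Set<String> liveServers = new TreeSet<>();

  void serverOnline(String server) { liveServers.add(server); }

  void serverOffline(String server) { liveServers.remove(server); }

  void addGroup(String group) { storedGroups.putIfAbsent(group, new TreeSet<>()); }

  // Moving to `default` only removes the server from its stored group.
  void moveServer(String server, String target) {
    if (!DEFAULT_GROUP.equals(target) && !storedGroups.containsKey(target)) {
      throw new IllegalArgumentException("unknown group " + target);
    }
    storedGroups.values().forEach(servers -> servers.remove(server));
    if (!DEFAULT_GROUP.equals(target)) {
      storedGroups.get(target).add(server);
    }
  }

  Set<String> listServers(String group) {
    if (!DEFAULT_GROUP.equals(group)) {
      return new TreeSet<>(storedGroups.getOrDefault(group, Set.of()));
    }
    // `default` mirrors the cluster: live servers minus those in stored groups.
    Set<String> result = new TreeSet<>(liveServers);
    storedGroups.values().forEach(result::removeAll);
    return result;
  }

  public static void main(String[] args) {
    GroupMembershipModel m = new GroupMembershipModel();
    m.serverOnline("rs1:16020");
    m.serverOnline("rs2:16020");
    m.addGroup("batch");
    m.moveServer("rs2:16020", "batch");
    m.serverOffline("rs2:16020");
    System.out.println(m.listServers("batch"));       // [rs2:16020], offline but still listed
    m.moveServer("rs2:16020", DEFAULT_GROUP);
    System.out.println(m.listServers("batch"));       // []
    System.out.println(m.listServers(DEFAULT_GROUP)); // [rs1:16020], rs2 is dropped
  }
}
----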
=== Best Practice
The authors of the rsgroup feature, the Yahoo! HBase Engineering team, have been
@@ -3526,7 +3513,7 @@ practices informed by their experience.
==== Isolate System Tables
Either have a system rsgroup that holds all the system tables, or leave the
system tables in the `default` rsgroup and put all user-space tables in
non-default rsgroups.
==== Dead Nodes
Yahoo! have found it useful at their scale to keep a special rsgroup of dead or
@@ -3541,10 +3528,23 @@ Viewing the Master log will give you insight on rsgroup operation.
If it appears stuck, restart the Master process.
=== Remove RegionServer Grouping
Simply disabling the RegionServer Grouping feature is easy: remove the
`hbase.balancer.rsgroup.enabled` property from hbase-site.xml, or explicitly set
it to false in hbase-site.xml:
[source,xml]
----
<property>
<name>hbase.balancer.rsgroup.enabled</name>
<value>false</value>
</property>
----
However, if you later set `hbase.balancer.rsgroup.enabled` back to true, the old
rsgroup configs will take effect again. To completely remove the RegionServer
Grouping feature from a cluster, so that the old metadata will not affect the
functioning of the cluster if the feature is re-enabled in the future, more
steps are needed:
- Move all tables in non-default rsgroups to the `default` regionserver group
[source,bash]
@@ -3592,6 +3592,56 @@ To enable ACL, add the following to your hbase-site.xml and restart your Master:
<value>true</value>
</property>
----
[[migrating.rsgroup]]
=== Migrating From Old Implementation
The coprocessor `org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint` is
deprecated, but for compatibility, if you want a pre-3.0.0 hbase client or shell
to communicate with the new hbase cluster, you still need to add this
coprocessor to the master.
The `hbase.rsgroup.grouploadbalancer.class` config has been deprecated, as the
top level load balancer is now always `RSGroupBasedLoadBalancer`, and the
`hbase.master.loadbalancer.class` config configures the balancer within a
group. This also means you should no longer set
`hbase.master.loadbalancer.class` to `RSGroupBasedLoadBalancer`, even if the
rsgroup feature is enabled.
Some special changes have been made for compatibility. First, if the coprocessor
`org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint` is specified, the
`hbase.balancer.rsgroup.enabled` flag is set to true automatically, enabling
the rsgroup feature. Second, `hbase.rsgroup.grouploadbalancer.class` takes
precedence over `hbase.master.loadbalancer.class`. Finally, if you do not set
`hbase.rsgroup.grouploadbalancer.class` but only set
`hbase.master.loadbalancer.class` to `RSGroupBasedLoadBalancer`, the default
load balancer is loaded instead to avoid infinite nesting. This means you do not
need to change anything when upgrading if you have already enabled the rsgroup
feature.
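The three compatibility rules above can be read as a small decision procedure. The sketch below evaluates them over a plain `Map` of configuration keys; it is a model, not the HBase implementation, and the class and method names are hypothetical. Treating `org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer` as "the default load balancer" is an assumption.

[source,java]
----
import java.util.Arrays;
import java.util.Map;

// Toy model, not HBase code, of the rsgroup configuration compatibility rules.
class RsGroupConfigCompat {
  static final String ENDPOINT = "org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint";
  static final String RSGROUP_BALANCER =
      "org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer";
  // Assumption: the class meant by "the default load balancer".
  static final String DEFAULT_BALANCER =
      "org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer";

  // Rule 1: listing the old endpoint as a master coprocessor enables the feature.
  static boolean rsGroupEnabled(Map<String, String> conf) {
    boolean endpointConfigured =
        Arrays.stream(conf.getOrDefault("hbase.coprocessor.master.classes", "").split(","))
            .map(String::trim)
            .anyMatch(ENDPOINT::equals);
    return endpointConfigured
        || Boolean.parseBoolean(conf.getOrDefault("hbase.balancer.rsgroup.enabled", "false"));
  }

  // Rules 2 and 3: pick the balancer that runs inside each group.
  static String balancerWithinGroup(Map<String, String> conf) {
    // Rule 2: the deprecated per-group key is consulted first.
    String legacy = conf.get("hbase.rsgroup.grouploadbalancer.class");
    if (legacy != null) {
      return legacy;
    }
    // Rule 3: never nest RSGroupBasedLoadBalancer inside itself.
    String configured = conf.get("hbase.master.loadbalancer.class");
    if (configured == null || configured.equals(RSGROUP_BALANCER)) {
      return DEFAULT_BALANCER;
    }
    return configured;
  }

  public static void main(String[] args) {
    // A pre-3.0 style configuration, left unchanged during the upgrade.
    Map<String, String> oldConf = Map.of(
        "hbase.coprocessor.master.classes", ENDPOINT,
        "hbase.master.loadbalancer.class", RSGROUP_BALANCER);
    System.out.println(rsGroupEnabled(oldConf));      // true
    System.out.println(balancerWithinGroup(oldConf)); // the assumed default balancer
  }
}
----

With the pre-3.0 style configuration in `main`, the model reports the feature as enabled and picks the default balancer inside each group, which matches the statement that no configuration change is needed when upgrading.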
The main difference compared to the old implementation is that the rsgroup for
a table is now stored in its `TableDescriptor` instead of in `RSGroupInfo`, so
the `getTables` method of `RSGroupInfo` has been deprecated. If you use the
`Admin` methods to get an `RSGroupInfo`, its `getTables` method will always
return an empty result. This is because the method was partly broken in the old
implementation: you could set a rsgroup on a namespace, placing all the tables
under that namespace into the group, but those tables were not returned by
`RSGroupInfo.getTables`. Use the two new `Admin` methods `listTablesInRSGroup`
and `getConfiguredNamespacesAndTablesInRSGroup` to get the tables and
namespaces in a rsgroup.
The behavior of the old `RSGroupAdminEndpoint` is unchanged: it fills in the
tables field of the `RSGroupInfo` before returning it, to stay compatible with
old hbase clients and shells.
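The gap in the old `getTables` can be shown with a toy model in plain Java. It is not the HBase API: `tablesInGroup` loosely mirrors what the text says `listTablesInRSGroup` is for, `explicitlyConfiguredTables` mimics a per-table list like the old `RSGroupInfo` tables field, and the map-based data layout is hypothetical. As in the other sketches, it assumes a table-level group overrides a namespace-level one.

[source,java]
----
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model, not HBase code: contrasts tables that are effectively in a group
// with tables explicitly configured for it. All names are hypothetical.
class GroupListingModel {
  static final String DEFAULT_GROUP = "default";

  // Assumption: a table-level group overrides its namespace's group.
  static String effectiveGroup(String table, Map<String, String> tableGroups,
                               Map<String, String> namespaceGroups) {
    // Tables written without "ns:" live in the `default` namespace.
    String namespace = table.contains(":") ? table.substring(0, table.indexOf(':')) : "default";
    String group = tableGroups.get(table);
    if (group == null) {
      group = namespaceGroups.get(namespace);
    }
    return group == null ? DEFAULT_GROUP : group;
  }

  // Every table whose effective group is `group`, including namespace-inherited ones.
  static List<String> tablesInGroup(String group, List<String> allTables,
                                    Map<String, String> tableGroups,
                                    Map<String, String> namespaceGroups) {
    return allTables.stream()
        .filter(t -> effectiveGroup(t, tableGroups, namespaceGroups).equals(group))
        .sorted()
        .collect(Collectors.toList());
  }

  // Only tables configured directly with `group`, like a per-table list would hold.
  static List<String> explicitlyConfiguredTables(String group, Map<String, String> tableGroups) {
    return tableGroups.entrySet().stream()
        .filter(e -> e.getValue().equals(group))
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> tables =
        List.of("analytics:events", "analytics:users", "web:pages", "web:sessions");
    Map<String, String> tableGroups = Map.of("web:sessions", "batch");
    Map<String, String> namespaceGroups = Map.of("analytics", "batch");
    // [analytics:events, analytics:users, web:sessions]
    System.out.println(tablesInGroup("batch", tables, tableGroups, namespaceGroups));
    // [web:sessions]: the namespace-inherited tables are missing
    System.out.println(explicitlyConfiguredTables("batch", tableGroups));
  }
}
----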
When upgrading, the migration from `RSGroupInfo` to `TableDescriptor` is done
automatically. It takes some time, but it is safe to restart the master in the
middle; the migration continues after the restart. During the migration the
rsgroup feature still works, and in most cases regions will not be misplaced.
(This is a one-time job that should not last long, so it has not been tested
thoroughly enough to guarantee that regions are never misplaced; hence "in most
cases".) The implementation is a bit tricky; see
`RSGroupInfoManagerImpl.migrate` if you are interested.


@@ -314,6 +314,10 @@ Quitting...
. Verify HBase contents: use the HBase shell to list tables and scan some known values.
== Upgrade Paths
[[upgrade3.0]]
=== Upgrade from 2.x to 3.x
The RegionServer Grouping feature has been reimplemented. See section
<<migrating.rsgroup>> in <<ops_mgt>> for more details.
[[upgrade2.2]]
=== Upgrade from 2.0 or 2.1 to 2.2+