HBASE-11285 Expand coprocessor info in ref guide (Misty Stanley-Jones)

<title>Apache HBase Coprocessors</title>
<para>The idea of HBase coprocessors was inspired by Google's BigTable coprocessors. The <link
xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction">Apache HBase Blog
post on coprocessors</link> is a very good introduction to the subject, with detailed
information about the coprocessor framework, terminology, management, and so on. </para>
<para> HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable<footnote>
<para><link
xlink:href="http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009" />, pages
66-67.</para>
</footnote>. Coprocessors function in a similar way to Linux kernel modules. They provide a way
to run server-level code against locally-stored data. The functionality they provide is very
powerful but, as with kernel modules, it also carries great risk: a misbehaving coprocessor can
have adverse effects on the entire system. The information in this chapter is primarily
sourced and heavily reused
from Mingjie Lai's blog post at <link
xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction" />. </para>
<para> Coprocessors are not designed to be used by end users of HBase, but by HBase developers who
need to add specialized functionality to HBase. One example of the use of coprocessors is
pluggable compaction and scan policies, which are provided as coprocessors in <link
xlink:href="https://issues.apache.org/jira/browse/HBASE-6427">HBASE-6427</link>. </para>
<section>
<title>Coprocessor Framework</title>
<para>The implementation of HBase coprocessors diverges from the BigTable implementation. The
HBase framework provides a library and runtime environment for executing user code within the
HBase region server and master processes. </para>
<para> The framework API is provided in the <link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html">coprocessor</link>
package.</para>
<para>Two different types of coprocessors are provided by the framework, based on their
scope.</para>
<variablelist>
<title>Types of Coprocessors</title>
<varlistentry>
<term>System Coprocessors</term>
<listitem>
<para>System coprocessors are loaded globally on all tables and regions hosted by a region
server.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Table Coprocessors</term>
<listitem>
<para>You can specify which coprocessors should be loaded on all regions for a table on a
per-table basis.</para>
</listitem>
</varlistentry>
</variablelist>
<para>The framework also provides two different types of extension points:
<firstterm>observers</firstterm> and <firstterm>endpoints</firstterm>.</para>
<variablelist>
<varlistentry>
<term>Observers</term>
<listitem>
<para>Observers are analogous to triggers in conventional databases. They allow you to
insert user code by overriding upcall methods provided by the coprocessor framework.
Callback functions are executed from core HBase code when events occur. Callbacks are
handled by the framework, and the coprocessor itself only needs to insert the extended
or alternate functionality.</para>
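<para>As an illustration, the following is a minimal sketch of an observer, written against
the 0.96-era API. It extends the convenience class BaseRegionObserver and overrides the
prePut upcall to log every Put before it is applied to the region. The class name is
hypothetical.</para>
<programlisting language="java"><![CDATA[
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

/** Logs every Put before it is applied to the region. */
public class LoggingRegionObserver extends BaseRegionObserver {
  private static final Log LOG = LogFactory.getLog(LoggingRegionObserver.class);

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> c,
      Put put, WALEdit edit, Durability durability) throws IOException {
    // Runs in the region server before the Put is applied. Throwing an
    // IOException from this hook would abort the operation instead.
    LOG.info("prePut on region "
        + c.getEnvironment().getRegion().getRegionNameAsString());
  }
}
]]></programlisting>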
<variablelist>
<title>Provided Observer Interfaces</title>
<varlistentry>
<term><link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html">RegionObserver</link></term>
<listitem>
<para>A RegionObserver provides hooks for data manipulation events, such as Get,
Put, Delete, and Scan. An instance of a RegionObserver coprocessor exists for each
table region. The scope of the observations a RegionObserver can make is
constrained to that region. </para>
</listitem>
</varlistentry>
<varlistentry>
<term><link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html">RegionServerObserver</link></term>
<listitem>
<para>A RegionServerObserver provides hooks for operations related to the RegionServer,
such as stopping the RegionServer and performing operations before or after
merges, commits, or rollbacks.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html">WALObserver</link></term>
<listitem>
<para>A WALObserver provides hooks for operations related to the write-ahead log
(WAL). You can observe or intercept WAL writing and reconstruction events. A
WALObserver runs in the context of WAL processing. A single WALObserver exists on
a single region server.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html">MasterObserver</link></term>
<listitem>
<para>A MasterObserver provides hooks for DDL-type operations, such as creating,
deleting, or modifying a table. The MasterObserver runs within the context of the
HBase master. </para>
</listitem>
</varlistentry>
</variablelist>
<para>More than one observer of a given type can be loaded at once. Multiple observers are
chained to execute sequentially by order of assigned priority. Nothing prevents a
coprocessor implementor from communicating internally among its installed
observers.</para>
<para>An observer of a higher priority can preempt lower-priority observers by throwing an
IOException or a subclass of IOException.</para>
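<para>For example, the following sketch (with a hypothetical class name) rejects all client
deletes on its region by throwing an IOException from the preDelete upcall. If this observer
is registered with a higher priority than other observers at the same hook, they never see
the rejected operation.</para>
<programlisting language="java"><![CDATA[
import java.io.IOException;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

/** Vetoes all Deletes; lower-priority observers never see them. */
public class ReadOnlyRegionObserver extends BaseRegionObserver {
  @Override
  public void preDelete(ObserverContext<RegionCoprocessorEnvironment> c,
      Delete delete, WALEdit edit, Durability durability) throws IOException {
    // Throwing here aborts the operation and preempts later observers.
    throw new IOException("Deletes are not allowed on this table");
  }
}
]]></programlisting>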
</listitem>
</varlistentry>
<varlistentry>
<term>Endpoints (HBase 0.96.x and later)</term>
<listitem>
<para>The implementation for endpoints changed significantly in HBase 0.96.x due to the
introduction of protocol buffers (protobufs) (<link
xlink:href="https://issues.apache.org/jira/browse/HBASE-5448">HBASE-5448</link>). If
you created endpoints before 0.96.x, you will need to rewrite them. Endpoints are now
defined and callable as protobuf services, rather than endpoint invocations passed
through as Writable blobs.</para>
<para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any
time from the client. When it is invoked, it is executed remotely at the target region
or regions, and results of the executions are returned to the client.</para>
<para>The endpoint implementation is installed on the server and is invoked using HBase
RPC. The client library provides convenience methods for invoking these dynamic
interfaces. </para>
<para>An endpoint, like an observer, can communicate with any installed observers. This
allows you to plug new features into HBase without modifying or recompiling HBase
itself.</para>
<itemizedlist>
<title>Steps to Implement an Endpoint</title>
<listitem><para>Define the coprocessor service and related messages in a <filename>.proto</filename> file.</para></listitem>
<listitem><para>Run the <command>protoc</command> command to generate the code.</para></listitem>
<listitem><para>Write code to implement the following:</para>
<itemizedlist>
<listitem><para>the generated protobuf Service interface</para></listitem>
<listitem>
<para>the new <link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/CoprocessorService.html">org.apache.hadoop.hbase.coprocessor.CoprocessorService</link>
interface (required for the <link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.html">RegionCoprocessorHost</link>
to register the exposed service)</para></listitem>
</itemizedlist>
</listitem>
<listitem><para>On the client, call the new HTable.coprocessorService() methods to perform the endpoint RPCs, as shown in the sketch after this list.</para></listitem>
</itemizedlist>
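<para>The following is a compact sketch of the server side of such an endpoint, modeled on
the RowCount example included in the HBase source. It assumes a hypothetical
<filename>Examples.proto</filename> file, defining a RowCountService with a getRowCount
call, has already been compiled by <command>protoc</command> into an ExampleProtos class;
the class and message names are illustrative, not the exact shipped code.</para>
<programlisting language="java"><![CDATA[
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

/** Counts the rows in the region hosting this endpoint instance. */
public class RowCountEndpoint extends ExampleProtos.RowCountService
    implements Coprocessor, CoprocessorService {
  private RegionCoprocessorEnvironment env;

  @Override
  public Service getService() {
    return this;  // lets RegionCoprocessorHost register the exposed service
  }

  @Override
  public void start(CoprocessorEnvironment env) throws IOException {
    this.env = (RegionCoprocessorEnvironment) env;
  }

  @Override
  public void stop(CoprocessorEnvironment env) throws IOException {
  }

  @Override
  public void getRowCount(RpcController controller,
      ExampleProtos.CountRequest request,
      RpcCallback<ExampleProtos.CountResponse> done) {
    long count = 0;
    try {
      InternalScanner scanner = env.getRegion().getScanner(new Scan());
      List<Cell> results = new ArrayList<Cell>();
      boolean hasMore;
      do {
        // Each call to next() returns the cells of one row.
        hasMore = scanner.next(results);
        if (!results.isEmpty()) {
          count++;
          results.clear();
        }
      } while (hasMore);
      scanner.close();
    } catch (IOException e) {
      controller.setFailed(e.getMessage());
    }
    done.run(ExampleProtos.CountResponse.newBuilder().setCount(count).build());
  }
}
]]></programlisting>
<para>On the client side, HTable.coprocessorService() is then called with the service class,
a row-key range, and a Batch.Call that invokes getRowCount(); the framework returns one
response per region, which the client can aggregate.</para>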
<para>For more information and examples, refer to the API documentation for the <link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/coprocessor/package-summary.html">coprocessor</link>
package, as well as the included RowCount example in the
<filename>/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/</filename>
directory of the HBase source.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Endpoints (HBase 0.94.x and earlier)</term>
<listitem>
<para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any
time from the client. When it is invoked, it is executed remotely at the target region
or regions, and results of the executions are returned to the client.</para>
<para>The endpoint implementation is installed on the server and is invoked using HBase
RPC. The client library provides convenience methods for invoking these dynamic
interfaces. </para>
<para>An endpoint, like an observer, can communicate with any installed observers. This
allows you to plug new features into HBase without modifying or recompiling HBase
itself.</para>
<itemizedlist>
<title>Steps to Implement an Endpoint</title>
<listitem>
<bridgehead>Server-Side</bridgehead>
<itemizedlist>
<listitem>
<para>Create a new protocol interface which extends CoprocessorProtocol.</para>
</listitem>
<listitem>
<para>Implement the Endpoint interface. The implementation will be loaded into and
executed from the region context.</para>
</listitem>
<listitem>
<para>Extend the abstract class BaseEndpointCoprocessor. This convenience class
hides some internal details that the implementer does not need to be concerned
about, such as coprocessor framework class loading.</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<bridgehead>Client-Side</bridgehead>
<para>An endpoint can be invoked by two new HBase client APIs, both shown in the sketch after this list:</para>
<itemizedlist>
<listitem>
<para><code>HTableInterface.coprocessorProxy(Class&lt;T&gt; protocol, byte[]
row)</code> for executing against a single region</para>
</listitem>
<listitem>
<para><code>HTableInterface.coprocessorExec(Class&lt;T&gt; protocol, byte[]
startKey, byte[] endKey, Batch.Call&lt;T,R&gt; callable)</code> for executing
over a range of regions</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
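<para>To make both sides concrete, here is a minimal sketch of a 0.94-era row-counting
endpoint. The RowCountProtocol and RowCountEndpoint names are hypothetical, and the scan
logic is simplified for illustration.</para>
<programlisting language="java"><![CDATA[
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

/** Server side: the protocol interface, which extends CoprocessorProtocol. */
interface RowCountProtocol extends CoprocessorProtocol {
  long getRowCount() throws IOException;
}

/** Server side: the endpoint implementation, loaded into each region. */
public class RowCountEndpoint extends BaseEndpointCoprocessor
    implements RowCountProtocol {
  @Override
  public long getRowCount() throws IOException {
    RegionCoprocessorEnvironment env =
        (RegionCoprocessorEnvironment) getEnvironment();
    InternalScanner scanner = env.getRegion().getScanner(new Scan());
    List<KeyValue> results = new ArrayList<KeyValue>();
    long count = 0;
    boolean hasMore;
    do {
      // Each call to next() returns the KeyValues of one row.
      hasMore = scanner.next(results);
      if (!results.isEmpty()) {
        count++;
        results.clear();
      }
    } while (hasMore);
    scanner.close();
    return count;
  }
}

/** Client side: invoke the endpoint over a range of regions and sum the counts. */
class RowCountClient {
  public static long totalRowCount(Configuration conf) throws Throwable {
    HTable table = new HTable(conf, "t1");
    Map<byte[], Long> counts = table.coprocessorExec(
        RowCountProtocol.class, null, null,   // null keys cover all regions
        new Batch.Call<RowCountProtocol, Long>() {
          public Long call(RowCountProtocol instance) throws IOException {
            return instance.getRowCount();
          }
        });
    long total = 0;
    for (Long regionCount : counts.values()) {
      total += regionCount;
    }
    table.close();
    return total;
  }
}
]]></programlisting>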
</listitem>
</varlistentry>
</variablelist>
</section>
<section>
<title>Examples</title>
<para>An example of an observer is included in
<filename>hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestZooKeeperScanPolicyObserver.java</filename>.
Several endpoint examples are included in the same directory.</para>
</section>
<section>
<title>Building A Coprocessor</title>
<para>Before you can use a coprocessor, it must be developed, compiled, and packaged in a JAR
file. The next step is to configure the coprocessor framework to use your coprocessor. You can
load the coprocessor from your HBase configuration, so that the coprocessor starts with HBase,
or you can configure the coprocessor from the HBase shell, as a table attribute, so that it is
loaded dynamically when the table is opened or reopened.</para>
<section>
<title>Load from Configuration</title>
<para> To configure a coprocessor to be loaded when HBase starts, modify the RegionServer's
<filename>hbase-site.xml</filename> and configure one of the following properties, based
on the type of observer you are configuring: </para>
<itemizedlist>
<listitem>
<para><code>hbase.coprocessor.region.classes</code> for RegionObservers and
Endpoints</para>
</listitem>
<listitem>
<para><code>hbase.coprocessor.wal.classes</code> for WALObservers</para>
</listitem>
<listitem>
<para><code>hbase.coprocessor.master.classes</code> for MasterObservers</para>
</listitem>
</itemizedlist>
<example>
<title>Example RegionObserver Configuration</title>
<para>In this example, one RegionObserver is configured for all the HBase tables.</para>
<screen><![CDATA[
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property> ]]>
</screen>
</example>
<para> If multiple classes are specified for loading, the class names must be comma-separated.
The framework attempts to load all the configured classes using the default class loader.
Therefore, the jar file must reside on the server-side HBase classpath.</para>
<para>Coprocessors which are loaded in this way will be active on all regions of
all tables. These are the system coprocessors introduced earlier. The first listed
coprocessor will be assigned the priority <literal>Coprocessor.Priority.SYSTEM</literal>.
Each subsequent coprocessor in the list will have its priority value incremented by one
(which reduces its priority, because priorities have the natural sort order of Integers). </para>
<para>When calling out to registered observers, the framework executes their callback methods
in the sorted order of their priority. Ties are broken arbitrarily.</para>
</section>
<section>
<title>Load from the HBase Shell</title>
<para> You can load a coprocessor on a specific table via a table attribute. The following
example will load the <systemitem>FooRegionObserver</systemitem> observer when regions of table
<systemitem>t1</systemitem> are opened or reopened. </para>
<example>
<title>Load a Coprocessor On a Table Using HBase Shell</title>
<screen>
hbase(main):005:0> <userinput>alter 't1', METHOD => 'table_att',
'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'</userinput>
<computeroutput>Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.0730 seconds</computeroutput>
hbase(main):006:0> <userinput>describe 't1'</userinput>
<computeroutput>DESCRIPTION ENABLED
{NAME => 't1', coprocessor$1 => 'hdfs:///foo.jar|com.foo.FooRegio false
nObserver|1001|arg1=1,arg2=2', FAMILIES => [{NAME => 'c1', DATA_B
LOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>
'0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZ
E => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLO
CKCACHE => 'true'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE',
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3'
, COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647'
, KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY
=> 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0190 seconds</computeroutput>
</screen>
</example>
<para>The coprocessor framework will try to read the class information from the coprocessor
table attribute value. The value contains four pieces of information which are separated by
the <literal>|</literal> character.</para>
<itemizedlist>
<listitem>
<para>File path: The jar file containing the coprocessor implementation must be in a
location where all region servers can read it. You could copy the file onto the local
disk on each region server, but it is recommended to store it in HDFS.</para>
</listitem>
<listitem>
<para>Class name: The full class name of the coprocessor.</para>
</listitem>
<listitem>
<para>Priority: An integer. The framework will determine the execution sequence of all
configured observers registered at the same hook using priorities. This field can be
left blank. In that case the framework will assign a default priority value.</para>
</listitem>
<listitem>
<para>Arguments: Optional key/value arguments which are passed to the coprocessor
implementation.</para>
</listitem>
</itemizedlist>
<example>
<title>Unload a Coprocessor From a Table Using HBase Shell</title>
<screen>
hbase(main):007:0> <userinput>alter 't1', METHOD => 'table_att_unset',</userinput>
hbase(main):008:0* <userinput>NAME => 'coprocessor$1'</userinput>
<computeroutput>Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1130 seconds</computeroutput>
hbase(main):009:0> <userinput>describe 't1'</userinput>
<computeroutput>DESCRIPTION ENABLED
{NAME => 't1', FAMILIES => [{NAME => 'c1', DATA_BLOCK_ENCODING => false
'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSION
S => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '214
7483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN
_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true
'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_C
ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCO
DE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0180 seconds </computeroutput>
</screen>
</example>
<warning>
<para>There is no guarantee that the framework will load a given coprocessor successfully.
For example, the shell command neither guarantees a jar file exists at a particular
location nor verifies whether the given class is actually contained in the jar file.
</para>
</warning>
</section>
</section>
<section>
<title>Check the Status of a Coprocessor</title>
<para>To check the status of a coprocessor after it has been configured, use the
<command>status</command> HBase Shell command.</para>
<screen>
hbase(main):020:0> <userinput>status 'detailed'</userinput>
<computeroutput>version 0.92-tm-6
0 regionsInTransition
master coprocessors: []
1 live servers
localhost:52761 1328082515520
requestsPerSecond=3, numberOfOnlineRegions=3, usedHeapMB=32, maxHeapMB=995
-ROOT-,,0
numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0, readRequestsCount=54, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
.META.,,1
numberOfStores=1, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0, readRequestsCount=97, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
t1,,1328082575190.c0491168a27620ffe653ec6c04c9b4d1.
numberOfStores=2, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN,
coprocessors=[AggregateImplementation]
0 dead servers </computeroutput>
</screen>
</section>
<section>
<title>Status of Coprocessors in HBase</title>
<para> Coprocessors and the coprocessor framework are evolving rapidly, and work is ongoing in
several different JIRA issues. </para>
</section>
</chapter>