HBASE-11285 Expand coprocessor info in ref guide (Misty Stanley-Jones)

<title>Apache HBase Coprocessors</title>
<para>The idea of HBase coprocessors was inspired by Google's BigTable coprocessors. The <link
xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction">Apache HBase Blog
post on coprocessors</link> is a very good introduction to the subject, with detailed
information about the coprocessor framework, terminology, management, and so on. </para>
<para> HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable<footnote>
<para><link
xlink:href="http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009" />, pages
66-67.</para>
</footnote>. Coprocessors function in a similar way to Linux kernel modules. They provide a way
to run server-level code against locally-stored data. The functionality they provide is very
powerful but, as with kernel modules, it also carries great risk: a misbehaving coprocessor can
have adverse effects on the entire system. The information in this chapter is primarily
sourced and heavily reused
from Mingjie Lai's blog post at <link
xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction" />. </para>
<para> Coprocessors are not designed to be used by end users of HBase, but by HBase developers who
need to add specialized functionality to HBase. One example of the use of coprocessors is
pluggable compaction and scan policies, which are provided as coprocessors in <link
xlink:href="https://issues.apache.org/jira/browse/HBASE-6427">HBASE-6427</link>. </para>
<section>
<title>Coprocessor Framework</title>
<para>The implementation of HBase coprocessors diverges from the BigTable implementation. The
HBase framework provides a library and runtime environment for executing user code within the
HBase region server and master processes. </para>
<para> The framework API is provided in the <link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html">coprocessor</link>
package.</para>
<para>Two different types of coprocessors are provided by the framework, based on their
scope.</para>
<variablelist>
<title>Types of Coprocessors</title>
<varlistentry>
<term>System Coprocessors</term>
<listitem>
<para>System coprocessors are loaded globally on all tables and regions hosted by a region
server.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Table Coprocessors</term>
<listitem>
<para>You can specify which coprocessors should be loaded on all regions for a table on a
per-table basis.</para>
</listitem>
</varlistentry>
</variablelist>
<para>The framework also provides two different types of extension points:
<firstterm>observers</firstterm> and <firstterm>endpoints</firstterm>.</para>
<variablelist>
<varlistentry>
<term>Observers</term>
<listitem>
<para>Observers are analogous to triggers in conventional databases. They allow you to
insert user code by overriding upcall methods provided by the coprocessor framework.
Callback functions are executed from core HBase code when events occur. Callbacks are
handled by the framework, and the coprocessor itself only needs to insert the extended
or alternate functionality.</para>
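<para>As an illustration, the following is a minimal sketch of an observer, written against
the 0.96-era API. It extends the convenience class BaseRegionObserver and overrides the
prePut upcall to log every Put before it is applied to the region. The class name is
hypothetical.</para>
<programlisting language="java"><![CDATA[
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

/** Logs every Put before it is applied to the region. */
public class LoggingRegionObserver extends BaseRegionObserver {
  private static final Log LOG = LogFactory.getLog(LoggingRegionObserver.class);

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> c,
      Put put, WALEdit edit, Durability durability) throws IOException {
    // Runs in the region server before the Put is applied. Throwing an
    // IOException from this hook would abort the operation instead.
    LOG.info("prePut on region "
        + c.getEnvironment().getRegion().getRegionNameAsString());
  }
}
]]></programlisting>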
<variablelist>
<title>Provided Observer Interfaces</title>
<varlistentry>
<term><link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html">RegionObserver</link></term>
<listitem>
<para>A RegionObserver provides hooks for data manipulation events, such as Get,
Put, Delete, and Scan. An instance of a RegionObserver coprocessor exists for each
table region. The scope of the observations a RegionObserver can make is
constrained to that region. </para>
</listitem>
</varlistentry>
<varlistentry>
<term><link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html">RegionServerObserver</link></term>
<listitem>
<para>A RegionServerObserver provides hooks for operations related to the RegionServer,
such as stopping the RegionServer and performing operations before or after
merges, commits, or rollbacks.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html">WALObserver</link></term>
<listitem>
<para>A WALObserver provides hooks for operations related to the write-ahead log
(WAL). You can observe or intercept WAL writing and reconstruction events. A
WALObserver runs in the context of WAL processing. A single WALObserver exists on
a single region server.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html">MasterObserver</link></term>
<listitem>
<para>A MasterObserver provides hooks for DDL-type operations, such as creating,
deleting, or modifying a table. The MasterObserver runs within the context of the
HBase master. </para>
</listitem>
</varlistentry>
</variablelist>
<para>More than one observer of a given type can be loaded at once. Multiple observers are
chained to execute sequentially by order of assigned priority. Nothing prevents a
coprocessor implementor from communicating internally among its installed
observers.</para>
<para>An observer of a higher priority can preempt lower-priority observers by throwing an
IOException or a subclass of IOException.</para>
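<para>For example, the following sketch (with a hypothetical class name) rejects all client
deletes on its region by throwing an IOException from the preDelete upcall. If this observer
is registered with a higher priority than other observers at the same hook, they never see
the rejected operation.</para>
<programlisting language="java"><![CDATA[
import java.io.IOException;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

/** Vetoes all Deletes; lower-priority observers never see them. */
public class ReadOnlyRegionObserver extends BaseRegionObserver {
  @Override
  public void preDelete(ObserverContext<RegionCoprocessorEnvironment> c,
      Delete delete, WALEdit edit, Durability durability) throws IOException {
    // Throwing here aborts the operation and preempts later observers.
    throw new IOException("Deletes are not allowed on this table");
  }
}
]]></programlisting>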
</listitem>
</varlistentry>
<varlistentry>
<term>Endpoints (HBase 0.96.x and later)</term>
<listitem>
<para>The implementation for endpoints changed significantly in HBase 0.96.x due to the
introduction of protocol buffers (protobufs) (<link
xlink:href="https://issues.apache.org/jira/browse/HBASE-5448">HBASE-5448</link>). If
you created endpoints before 0.96.x, you will need to rewrite them. Endpoints are now
defined and callable as protobuf services, rather than endpoint invocations passed
through as Writable blobs.</para>
<para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any
time from the client. When it is invoked, it is executed remotely at the target region
or regions, and results of the executions are returned to the client.</para>
<para>The endpoint implementation is installed on the server and is invoked using HBase
RPC. The client library provides convenience methods for invoking these dynamic
interfaces. </para>
<para>An endpoint, like an observer, can communicate with any installed observers. This
allows you to plug new features into HBase without modifying or recompiling HBase
itself.</para>
<itemizedlist>
<title>Steps to Implement an Endpoint</title>
<listitem><para>Define the coprocessor service and related messages in a <filename>.proto</filename> file.</para></listitem>
<listitem><para>Run the <command>protoc</command> command to generate the code.</para></listitem>
<listitem><para>Write code to implement the following:</para>
<itemizedlist>
<listitem><para>the generated protobuf Service interface</para></listitem>
<listitem>
<para>the new <link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/CoprocessorService.html">org.apache.hadoop.hbase.coprocessor.CoprocessorService</link>
interface (required for the <link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.html">RegionCoprocessorHost</link>
to register the exposed service)</para></listitem>
</itemizedlist>
</listitem>
<listitem><para>On the client, call the new HTable.coprocessorService() methods to perform the endpoint RPCs, as shown in the sketch after this list.</para></listitem>
</itemizedlist>
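<para>The following is a compact sketch of the server side of such an endpoint, modeled on
the RowCount example included in the HBase source. It assumes a hypothetical
<filename>Examples.proto</filename> file, defining a RowCountService with a getRowCount
call, has already been compiled by <command>protoc</command> into an ExampleProtos class;
the class and message names are illustrative, not the exact shipped code.</para>
<programlisting language="java"><![CDATA[
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

/** Counts the rows in the region hosting this endpoint instance. */
public class RowCountEndpoint extends ExampleProtos.RowCountService
    implements Coprocessor, CoprocessorService {
  private RegionCoprocessorEnvironment env;

  @Override
  public Service getService() {
    return this;  // lets RegionCoprocessorHost register the exposed service
  }

  @Override
  public void start(CoprocessorEnvironment env) throws IOException {
    this.env = (RegionCoprocessorEnvironment) env;
  }

  @Override
  public void stop(CoprocessorEnvironment env) throws IOException {
  }

  @Override
  public void getRowCount(RpcController controller,
      ExampleProtos.CountRequest request,
      RpcCallback<ExampleProtos.CountResponse> done) {
    long count = 0;
    try {
      InternalScanner scanner = env.getRegion().getScanner(new Scan());
      List<Cell> results = new ArrayList<Cell>();
      boolean hasMore;
      do {
        // Each call to next() returns the cells of one row.
        hasMore = scanner.next(results);
        if (!results.isEmpty()) {
          count++;
          results.clear();
        }
      } while (hasMore);
      scanner.close();
    } catch (IOException e) {
      controller.setFailed(e.getMessage());
    }
    done.run(ExampleProtos.CountResponse.newBuilder().setCount(count).build());
  }
}
]]></programlisting>
<para>On the client side, HTable.coprocessorService() is then called with the service class,
a row-key range, and a Batch.Call that invokes getRowCount(); the framework returns one
response per region, which the client can aggregate.</para>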
<para>For more information and examples, refer to the API documentation for the <link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/coprocessor/package-summary.html">coprocessor</link>
package, as well as the included RowCount example in the
<filename>/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/</filename>
directory of the HBase source.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Endpoints (HBase 0.94.x and earlier)</term>
<listitem>
<para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any
time from the client. When it is invoked, it is executed remotely at the target region
or regions, and results of the executions are returned to the client.</para>
<para>The endpoint implementation is installed on the server and is invoked using HBase
RPC. The client library provides convenience methods for invoking these dynamic
interfaces. </para>
<para>An endpoint, like an observer, can communicate with any installed observers. This
allows you to plug new features into HBase without modifying or recompiling HBase
itself.</para>
<itemizedlist>
<title>Steps to Implement an Endpoint</title>
<listitem>
<bridgehead>Server-Side</bridgehead>
<itemizedlist>
<listitem>
<para>Create a new protocol interface which extends CoprocessorProtocol.</para>
</listitem>
<listitem>
<para>Implement the Endpoint interface. The implementation will be loaded into and
executed from the region context.</para>
</listitem>
<listitem>
<para>Extend the abstract class BaseEndpointCoprocessor. This convenience class
hides some internal details that the implementer does not need to be concerned
about, such as coprocessor framework class loading.</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<bridgehead>Client-Side</bridgehead>
<para>An endpoint can be invoked by two new HBase client APIs, both shown in the sketch after this list:</para>
<itemizedlist>
<listitem>
<para><code>HTableInterface.coprocessorProxy(Class&lt;T&gt; protocol, byte[]
row)</code> for executing against a single region</para>
</listitem>
<listitem>
<para><code>HTableInterface.coprocessorExec(Class&lt;T&gt; protocol, byte[]
startKey, byte[] endKey, Batch.Call&lt;T,R&gt; callable)</code> for executing
over a range of regions</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
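<para>To make both sides concrete, here is a minimal sketch of a 0.94-era row-counting
endpoint. The RowCountProtocol and RowCountEndpoint names are hypothetical, and the scan
logic is simplified for illustration.</para>
<programlisting language="java"><![CDATA[
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

/** Server side: the protocol interface, which extends CoprocessorProtocol. */
interface RowCountProtocol extends CoprocessorProtocol {
  long getRowCount() throws IOException;
}

/** Server side: the endpoint implementation, loaded into each region. */
public class RowCountEndpoint extends BaseEndpointCoprocessor
    implements RowCountProtocol {
  @Override
  public long getRowCount() throws IOException {
    RegionCoprocessorEnvironment env =
        (RegionCoprocessorEnvironment) getEnvironment();
    InternalScanner scanner = env.getRegion().getScanner(new Scan());
    List<KeyValue> results = new ArrayList<KeyValue>();
    long count = 0;
    boolean hasMore;
    do {
      // Each call to next() returns the KeyValues of one row.
      hasMore = scanner.next(results);
      if (!results.isEmpty()) {
        count++;
        results.clear();
      }
    } while (hasMore);
    scanner.close();
    return count;
  }
}

/** Client side: invoke the endpoint over a range of regions and sum the counts. */
class RowCountClient {
  public static long totalRowCount(Configuration conf) throws Throwable {
    HTable table = new HTable(conf, "t1");
    Map<byte[], Long> counts = table.coprocessorExec(
        RowCountProtocol.class, null, null,   // null keys cover all regions
        new Batch.Call<RowCountProtocol, Long>() {
          public Long call(RowCountProtocol instance) throws IOException {
            return instance.getRowCount();
          }
        });
    long total = 0;
    for (Long regionCount : counts.values()) {
      total += regionCount;
    }
    table.close();
    return total;
  }
}
]]></programlisting>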
</listitem>
</varlistentry>
</variablelist>
</section>
<section>
<title>Examples</title>
<para>An example of an observer is included in
<filename>hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestZooKeeperScanPolicyObserver.java</filename>.
Several endpoint examples are included in the same directory.</para>
</section>
<section>
<title>Building A Coprocessor</title>
<para>Before you can use a coprocessor, it must be developed, compiled, and packaged in a JAR
file. The next step is to configure the coprocessor framework to use your coprocessor. You can
load the coprocessor from your HBase configuration, so that the coprocessor starts with HBase,
or you can configure the coprocessor from the HBase shell, as a table attribute, so that it is
loaded dynamically when the table is opened or reopened.</para>
<section>
<title>Load from Configuration</title>
<para> To configure a coprocessor to be loaded when HBase starts, modify the RegionServer's
<filename>hbase-site.xml</filename> and configure one of the following properties, based
on the type of observer you are configuring: </para>
<itemizedlist>
<listitem>
<para><code>hbase.coprocessor.region.classes</code> for RegionObservers and
Endpoints</para>
</listitem>
<listitem>
<para><code>hbase.coprocessor.wal.classes</code> for WALObservers</para>
</listitem>
<listitem>
<para><code>hbase.coprocessor.master.classes</code> for MasterObservers</para>
</listitem>
</itemizedlist>
<example>
<title>Example RegionObserver Configuration</title>
<para>In this example, one RegionObserver is configured for all the HBase tables.</para>
<screen><![CDATA[
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property> ]]>
</screen>
</example>
<para> If multiple classes are specified for loading, the class names must be comma-separated.
The framework attempts to load all the configured classes using the default class loader.
Therefore, the jar file must reside on the server-side HBase classpath.</para>
<para>Coprocessors which are loaded in this way will be active on all regions of
all tables. These are the system coprocessors introduced earlier. The first listed
coprocessor will be assigned the priority <literal>Coprocessor.Priority.SYSTEM</literal>.
Each subsequent coprocessor in the list will have its priority value incremented by one
(which reduces its priority, because priorities have the natural sort order of Integers). </para>
<para>When calling out to registered observers, the framework executes their callback methods
in the sorted order of their priority. Ties are broken arbitrarily.</para>
</section>
<section>
<title>Load from the HBase Shell</title>
<para> You can load a coprocessor on a specific table via a table attribute. The following
example will load the <systemitem>FooRegionObserver</systemitem> observer when regions of table
<systemitem>t1</systemitem> are opened or reopened. </para>
<example>
<title>Load a Coprocessor On a Table Using HBase Shell</title>
<screen>
hbase(main):005:0> <userinput>alter 't1', METHOD => 'table_att',
'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'</userinput>
<computeroutput>Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.0730 seconds</computeroutput>
hbase(main):006:0> <userinput>describe 't1'</userinput>
<computeroutput>DESCRIPTION ENABLED
{NAME => 't1', coprocessor$1 => 'hdfs:///foo.jar|com.foo.FooRegio false
nObserver|1001|arg1=1,arg2=2', FAMILIES => [{NAME => 'c1', DATA_B
LOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>
'0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZ
E => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLO
CKCACHE => 'true'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE',
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3'
, COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647'
, KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY
=> 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0190 seconds</computeroutput>
</screen>
</example>
<para>The coprocessor framework will try to read the class information from the coprocessor
table attribute value. The value contains four pieces of information which are separated by
the <literal>|</literal> character.</para>
<itemizedlist>
<listitem>
<para>File path: The jar file containing the coprocessor implementation must be in a
location where all region servers can read it. You could copy the file onto the local
disk on each region server, but it is recommended to store it in HDFS.</para>
</listitem>
<listitem>
<para>Class name: The full class name of the coprocessor.</para>
</listitem>
<listitem>
<para>Priority: An integer. The framework will determine the execution sequence of all
configured observers registered at the same hook using priorities. This field can be
left blank. In that case the framework will assign a default priority value.</para>
</listitem>
<listitem>
<para>Arguments: Optional key/value arguments which are passed to the coprocessor
implementation.</para>
</listitem>
</itemizedlist>
<example>
<title>Unload a Coprocessor From a Table Using HBase Shell</title>
<screen>
hbase(main):007:0> <userinput>alter 't1', METHOD => 'table_att_unset',</userinput>
hbase(main):008:0* <userinput>NAME => 'coprocessor$1'</userinput>
<computeroutput>Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1130 seconds</computeroutput>
hbase(main):009:0> <userinput>describe 't1'</userinput>
<computeroutput>DESCRIPTION ENABLED
{NAME => 't1', FAMILIES => [{NAME => 'c1', DATA_BLOCK_ENCODING => false
'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSION
S => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '214
7483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN
_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true
'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_C
ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCO
DE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0180 seconds </computeroutput>
</screen>
</example>
<warning>
<para>There is no guarantee that the framework will load a given coprocessor successfully.
For example, the shell command neither guarantees a jar file exists at a particular
location nor verifies whether the given class is actually contained in the jar file.
</para>
</warning>
</section>
</section>
<section>
<title>Check the Status of a Coprocessor</title>
<para>To check the status of a coprocessor after it has been configured, use the
<command>status</command> HBase Shell command.</para>
<screen>
hbase(main):020:0> <userinput>status 'detailed'</userinput>
<computeroutput>version 0.92-tm-6
0 regionsInTransition
master coprocessors: []
1 live servers
localhost:52761 1328082515520
requestsPerSecond=3, numberOfOnlineRegions=3, usedHeapMB=32, maxHeapMB=995
-ROOT-,,0
numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0, readRequestsCount=54, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
.META.,,1
numberOfStores=1, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0, readRequestsCount=97, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
t1,,1328082575190.c0491168a27620ffe653ec6c04c9b4d1.
numberOfStores=2, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN,
coprocessors=[AggregateImplementation]
0 dead servers </computeroutput>
</screen>
</section>
<section>
<title>Status of Coprocessors in HBase</title>
<para> Coprocessors and the coprocessor framework are evolving rapidly, and work is ongoing in
several different JIRA issues. </para>
</section>
</chapter>