HBASE-11285 Expand coprocessor info in ref guide (Misty Stanley-Jones)

2014-06-27 11:11:40 -07:00 · 2014-06-27 11:11:40 -07:00 · bf827a0331
parent 7c1135b3e3
commit bf827a0331
1 changed files with 376 additions and 4 deletions
--- a/src/main/docbkx/cp.xml
+++ b/src/main/docbkx/cp.xml
@ -29,9 +29,381 @@
 */
 -->
  <title>Apache HBase Coprocessors</title>
-  <para>The idea of HBase coprocessors was inspired by Google's BigTable coprocessors. The <link
+  <para> HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable<footnote>
-      xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction">Apache HBase Blog
+      <para><link
-      on Coprocessor</link> is a very good documentation on that. It has detailed information about
+          xlink:href="http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009" />, pages
-    the coprocessor framework, terminology, management, and so on. </para>
+        66-67.</para>
    </footnote>. Coprocessors function in a similar way to Linux kernel modules. They provide a way
    to run server-level code against locally-stored data. The functionality they provide is very
    powerful, but also carries great risk and can have adverse effects on the system, at the level
    of the operating system. The information in this chapter is primarily sourced and heavily reused
    from Mingjie Lai's blog post at <link
      xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction" />. </para>
  <para> Coprocessors are not designed to be used by end users of HBase, but by HBase developers who
    need to add specialized functionality to HBase. One example of the use of coprocessors is
    pluggable compaction and scan policies, which are provided as coprocessors in <link
      xlink:href="HBASE-6427">HBASE-6427</link>. </para>
  <section>
    <title>Coprocessor Framework</title>
    <para>The implementation of HBase coprocessors diverges from the BigTable implementation. The
      HBase framework provides a library and runtime environment for executing user code within the
      HBase region server and master processes. </para>
    <para> The framework API is provided in the <link
        xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html">coprocessor</link>
      package.</para>
    <para>Two different types of coprocessors are provided by the framework, based on their
      scope.</para>
    <variablelist>
      <title>Types of Coprocessors</title>
      <varlistentry>
        <term>System Coprocessors</term>
        <listitem>
          <para>System coprocessors are loaded globally on all tables and regions hosted by a region
            server.</para>
        </listitem>
      </varlistentry>
      <varlistentry>
        <term>Table Coprocessors</term>
        <listitem>
          <para>You can specify which coprocessors should be loaded on all regions for a table on a
            per-table basis.</para>
        </listitem>
      </varlistentry>
    </variablelist>
    <para>The framework provides two different aspects of extensions as well:
        <firstterm>observers</firstterm> and <firstterm>endpoints</firstterm>.</para>
    <variablelist>
      <varlistentry>
        <term>Observers</term>
        <listitem>
          <para>Observers are analogous to triggers in conventional databases. They allow you to
            insert user code by overriding upcall methods provided by the coprocessor framework.
            Callback functions are executed from core HBase code when events occur. Callbacks are
            handled by the framework, and the coprocessor itself only needs to insert the extended
            or alternate functionality.</para>
          <variablelist>
            <title>Provided Observer Interfaces</title>
            <varlistentry>
              <term><link
                  xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html">RegionObserver</link></term>
              <listitem>
                <para>A RegionObserver provides hooks for data manipulation events, such as Get,
                  Put, Delete, Scan. An instance of a RegionObserver coprocessor exists for each
                  table region. The scope of the observations a RegionObserver can make is
                  constrained to that region. </para>
              </listitem>
            </varlistentry>
            <varlistentry>
              <term><link
                  xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html">RegionServerObserver</link></term>
              <listitem>
                <para>A RegionServerObserver provides for operations related to the RegionServer,
                  such as stopping the RegionServer and performing operations before or after
                  merges, commits, or rollbacks.</para>
              </listitem>
            </varlistentry>
            <varlistentry>
              <term><link
                  xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html">WALObserver</link></term>
              <listitem>
                <para>A WALObserver provides hooks for operations related to the write-ahead log
                  (WAL). You can observe or intercept WAL writing and reconstruction events. A
                  WALObserver runs in the context of WAL processing. A single WALObserver exists on
                  a single region server.</para>
              </listitem>
            </varlistentry>
            <varlistentry>
              <term><link
                  xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html">MasterObserver</link></term>
              <listitem>
                <para>A MasterObserver provides hooks for DDL-type operation, such as create,
                  delete, modify table. The MasterObserver runs within the context of the HBase
                  master. </para>
              </listitem>
            </varlistentry>
          </variablelist>
          <para>More than one observer of a given type can be loaded at once. Multiple observers are
            chained to execute sequentially by order of assigned priority. Nothing prevents a
            coprocessor implementor from communicating internally among its installed
            observers.</para>
          <para>An observer of a higher priority can preempt lower-priority observers by throwing an
            IOException or a subclass of IOException.</para>
        </listitem>
      </varlistentry>
      <varlistentry>
        <term>Endpoints (HBase 0.96.x and later)</term>
        <listitem>
          <para>The implementation for endpoints changed significantly in HBase 0.96.x due to the
            introduction of protocol buffers (protobufs) (<link
              xlink:href="https://issues.apache.org/jira/browse/HBASE-5448">HBASE-5488</link>). If
            you created endpoints before 0.96.x, you will need to rewrite them. Endpoints are now
            defined and callable as protobuf services, rather than endpoint invocations passed
            through as Writable blobs</para>
          <para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any
            time from the client. When it is invoked, it is executed remotely at the target region
            or regions, and results of the executions are returned to the client.</para>
          <para>The endpoint implementation is installed on the server and is invoked using HBase
            RPC. The client library provides convenience methods for invoking these dynamic
            interfaces. </para>
          <para>An endpoint, like an observer, can communicate with any installed observers. This
            allows you to plug new features into HBase without modifying or recompiling HBase
            itself.</para>
          <itemizedlist>
            <title>Steps to Implement an Endpoint</title>
            <listitem><para>Define the coprocessor service and related messages in a <filename>.proto</filename> file</para></listitem>
            <listitem><para>Run the <command>protoc</command> command to generate the code.</para></listitem>
            <listitem><para>Write code to implement the following:</para>
            <itemizedlist>
              <listitem><para>the generated protobuf Service interface</para></listitem>
              <listitem>
                  <para>the new <link
                      xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorService(byte[])">org.apache.hadoop.hbase.coprocessor.CoprocessorService</link>
                    interface (required for the <link
                      xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.html">RegionCoprocessorHost</link>
                    to register the exposed service)</para></listitem>
            </itemizedlist>
            </listitem>
            <listitem><para>The client calls the new HTable.coprocessorService() methods to perform the endpoint RPCs.</para></listitem>
          </itemizedlist>
          <para>For more information and examples, refer to the API documentation for the <link
            xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/coprocessor/package-summary.html">coprocessor</link>
          package, as well as the included RowCount example in the
            <filename>/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/</filename>
            of the HBase source.</para>
        </listitem>
      </varlistentry>
      <varlistentry>
        <term>Endpoints (HBase 0.94.x and earlier)</term>
        <listitem>
          <para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any
            time from the client. When it is invoked, it is executed remotely at the target region
            or regions, and results of the executions are returned to the client.</para>
          <para>The endpoint implementation is installed on the server and is invoked using HBase
            RPC. The client library provides convenience methods for invoking these dynamic
            interfaces. </para>
          <para>An endpoint, like an observer, can communicate with any installed observers. This
            allows you to plug new features into HBase without modifying or recompiling HBase
            itself.</para>
          <itemizedlist>
            <title>Steps to Implement an Endpoint</title>
            <listitem>
              <bridgehead>Server-Side</bridgehead>
              <itemizedlist>
                <listitem>
                  <para>Create new protocol interface which extends CoprocessorProtocol.</para>
                </listitem>
                <listitem>
                  <para>Implement the Endpoint interface. The implementation will be loaded into and
                    executed from the region context.</para>
                </listitem>
                <listitem>
                  <para>Extend the abstract class BaseEndpointCoprocessor. This convenience class
                    hides some internal details that the implementer does not need to be concerned
                    about, ˆ such as coprocessor framework class loading.</para>
                </listitem>
              </itemizedlist>
            </listitem>
            <listitem>
              <bridgehead>Client-Side</bridgehead>
              <para>Endpoint can be invoked by two new HBase client APIs:</para>
              <itemizedlist>
                <listitem>
                  <para><code>HTableInterface.coprocessorProxy(Class&lt;T&gt; protocol, byte[]
                      row)</code> for executing against a single region</para>
                </listitem>
                <listitem>
                  <para><code>HTableInterface.coprocessorExec(Class&lt;T&gt; protocol, byte[]
                      startKey, byte[] endKey, Batch.Call&lt;T,R&gt; callable)</code> for executing
                    over a range of regions</para>
                </listitem>
              </itemizedlist>
            </listitem>
          </itemizedlist>
        </listitem>
      </varlistentry>
    </variablelist>
  </section>
  <section>
    <title>Examples</title>
    <para>An example of an observer is included in
      <filename>hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestZooKeeperScanPolicyObserver.java</filename>.
    Several endpoint examples are included in the same directory.</para>
  </section>
  <section>
    <title>Building A Coprocessor</title>
    <para>Before you can build a processor, it must be developed, compiled, and packaged in a JAR
      file. The next step is to configure the coprocessor framework to use your coprocessor. You can
      load the coprocessor from your HBase configuration, so that the coprocessor starts with HBase,
      or you can configure the coprocessor from the HBase shell, as a table attribute, so that it is
      loaded dynamically when the table is opened or reopened.</para>
    <section>
      <title>Load from Configuration</title>
      <para> To configure a coprocessor to be loaded when HBase starts, modify the RegionServer's
          <filename>hbase-site.xml</filename> and configure one of the following properties, based
        on the type of observer you are configuring: </para>
      <itemizedlist>
        <listitem>
          <para><code>hbase.coprocessor.region.classes</code>for RegionObservers and
            Endpoints</para>
        </listitem>
        <listitem>
          <para><code>hbase.coprocessor.wal.classes</code>for WALObservers</para>
        </listitem>
        <listitem>
          <para><code>hbase.coprocessor.master.classes</code>for MasterObservers</para>
        </listitem>
      </itemizedlist>
      <example>
        <title>Example RegionObserver Configuration</title>
        <para>In this example, one RegionObserver is configured for all the HBase tables.</para>
        <screen><![CDATA[
 <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
 </property>  ]]>        
        </screen>
      </example>
      <para> If multiple classes are specified for loading, the class names must be comma-separated.
        The framework attempts to load all the configured classes using the default class loader.
        Therefore, the jar file must reside on the server-side HBase classpath.</para>
      <para>Coprocessors which are loaded in this way will be active on all regions of
        all tables. These are the system coprocessor introduced earlier. The first listed
        coprocessors will be assigned the priority <literal>Coprocessor.Priority.SYSTEM</literal>.
        Each subsequent coprocessor in the list will have its priority value incremented by one
        (which reduces its priority, because priorities have the natural sort order of Integers). </para>
      <para>When calling out to registered observers, the framework executes their callbacks methods
        in the sorted order of their priority. Ties are broken arbitrarily.</para>
    </section>
    <section>
      <title>Load from the HBase Shell</title>
      <para> You can load a coprocessor on a specific table via a table attribute. The following
        example will load the <systemitem>FooRegionObserver</systemitem> observer when table
          <systemitem>t1</systemitem> is read or re-read. </para>
      <example>
        <title>Load a Coprocessor On a Table Using HBase Shell</title>
        <screen>
 hbase(main):005:0>  <userinput>alter 't1', METHOD => 'table_att', 
  'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'</userinput>
 <computeroutput>Updating all regions with the new schema...
 1/1 regions updated.
 Done.
 0 row(s) in 1.0730 seconds</computeroutput>
 hbase(main):006:0> <userinput>describe 't1'</userinput>
 <computeroutput>DESCRIPTION                                                        ENABLED                             
 {NAME => 't1', coprocessor$1 => 'hdfs:///foo.jar|com.foo.FooRegio false                               
 nObserver|1001|arg1=1,arg2=2', FAMILIES => [{NAME => 'c1', DATA_B                                     
 LOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE                                     
  => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>                                      
 '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZ                                     
 E => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLO                                     
 CKCACHE => 'true'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE',                                     
  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3'                                     
 , COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647'                                     
 , KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY                                      
 => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}                                         
 1 row(s) in 0.0190 seconds</computeroutput>
        </screen>
      </example>
      <para>The coprocessor framework will try to read the class information from the coprocessor
        table attribute value. The value contains four pieces of information which are separated by
        the <literal>|</literal> character.</para>
      <itemizedlist>
        <listitem>
          <para>File path: The jar file containing the coprocessor implementation must be in a
            location where all region servers can read it. You could copy the file onto the local
            disk on each region server, but it is recommended to store it in HDFS.</para>
        </listitem>
        <listitem>
          <para>Class name: The full class name of the coprocessor.</para>
        </listitem>
        <listitem>
          <para>Priority: An integer. The framework will determine the execution sequence of all
            configured observers registered at the same hook using priorities. This field can be
            left blank. In that case the framework will assign a default priority value.</para>
        </listitem>
        <listitem>
          <para>Arguments: This field is passed to the coprocessor implementation.</para>
        </listitem>
      </itemizedlist>
      <example>
        <title>Unload a Coprocessor From a Table Using HBase Shell</title>
        <screen>
 hbase(main):007:0> <userinput>alter 't1', METHOD => 'table_att_unset',</userinput> 
 hbase(main):008:0*   <userinput>NAME => 'coprocessor$1'</userinput>
 <computeroutput>Updating all regions with the new schema...
 1/1 regions updated.
 Done.
 0 row(s) in 1.1130 seconds</computeroutput>
 hbase(main):009:0> <userinput>describe 't1'</userinput>
 <computeroutput>DESCRIPTION                                                        ENABLED                             
 {NAME => 't1', FAMILIES => [{NAME => 'c1', DATA_BLOCK_ENCODING => false                               
  'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSION                                     
 S => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '214                                     
 7483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN                                     
 _MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true                                     
 '}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>                                      
 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>                                     
  'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_C                                     
 ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCO                                     
 DE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}                                                         
 1 row(s) in 0.0180 seconds          </computeroutput>
        </screen>
      </example>
      <warning>
        <para>There is no guarantee that the framework will load a given coprocessor successfully.
          For example, the shell command neither guarantees a jar file exists at a particular
          location nor verifies whether the given class is actually contained in the jar file.
        </para>
      </warning>
    </section>
  </section>
  <section>
    <title>Check the Status of a Coprocessor</title>
    <para>To check the status of a coprocessor after it has been configured, use the
        <command>status</command> HBase Shell command.</para>
    <screen>
 hbase(main):020:0> <userinput>status 'detailed'</userinput>
 <computeroutput>version 0.92-tm-6
 0 regionsInTransition
 master coprocessors: []
 1 live servers
    localhost:52761 1328082515520
        requestsPerSecond=3, numberOfOnlineRegions=3, usedHeapMB=32, maxHeapMB=995
        -ROOT-,,0
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, 
 storefileIndexSizeMB=0, readRequestsCount=54, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, 
 totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
        .META.,,1
            numberOfStores=1, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, 
 storefileIndexSizeMB=0, readRequestsCount=97, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, 
 totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
        t1,,1328082575190.c0491168a27620ffe653ec6c04c9b4d1.
            numberOfStores=2, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, 
 storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, 
 totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, 
 coprocessors=[AggregateImplementation]
 0 dead servers      </computeroutput>
    </screen>
  </section>
  <section>
    <title>Status of Coprocessors in HBase</title>
    <para> Coprocessors and the coprocessor framework are evolving rapidly and work is ongoing on
      several different JIRAs. </para>
  </section>
 </chapter>