HBASE-13907 Document how to deploy a coprocessor
This commit is contained in: parent 7a4590dfdb, commit f8eab44dcd

:icons: font
:experimental:

HBase Coprocessors are modeled after Google BigTable's coprocessor implementation
(http://research.google.com/people/jeff/SOCC2010-keynote-slides.pdf pages 41-42).

The coprocessor framework provides mechanisms for running your custom code directly on
the RegionServers managing your data. Efforts are ongoing to bridge gaps between HBase's
implementation and BigTable's architecture. For more information see
link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047].

The information in this chapter is primarily sourced and heavily reused from the following
resources:

. Mingjie Lai's blog post
link:https://blogs.apache.org/hbase/entry/coprocessor_introduction[Coprocessor Introduction].
. Gaurav Bhardwaj's blog post
link:http://www.3pillarglobal.com/insights/hbase-coprocessors[The How To Of HBase Coprocessors].

[WARNING]
.Use Coprocessors At Your Own Risk
====
Coprocessors are an advanced feature of HBase and are intended to be used by system
developers only. Because coprocessor code runs directly on the RegionServer and has
direct access to your data, they introduce the risk of data corruption, man-in-the-middle
attacks, or other malicious data access. Currently, there is no mechanism to prevent
data corruption by coprocessors, though work is underway on
link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047].

In addition, there is no resource isolation, so a well-intentioned but misbehaving
coprocessor can severely degrade cluster performance and stability.
====

== Coprocessor Overview

In HBase, you fetch data using a `Get` or `Scan`, whereas in an RDBMS you use a SQL
query. In order to fetch only the relevant data, you filter it using an HBase
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter],
whereas in an RDBMS you use a `WHERE` predicate.

After fetching the data, you perform computations on it. This paradigm works well
for "small data" with a few thousand rows and several columns. However, when you scale
to billions of rows and millions of columns, moving large amounts of data across your
network will create bottlenecks at the network layer, and the client needs to be powerful
enough and have enough memory to handle the large amounts of data and the computations.
In addition, the client code can grow large and complex.

In this scenario, coprocessors might make sense. You can put the business computation
code into a coprocessor which runs on the RegionServer, in the same location as the
data, and returns the result to the client.

This is only one scenario where using coprocessors can provide benefit. Following
are some analogies which may help to explain some of the benefits of coprocessors.

[[cp_analogies]]
=== Coprocessor Analogies

Triggers and Stored Procedure::
An Observer coprocessor is similar to a trigger in an RDBMS in that it executes
your code either before or after a specific event (such as a `Get` or `Put`)
occurs. An endpoint coprocessor is similar to a stored procedure in an RDBMS
because it allows you to perform custom computations on the data on the
RegionServer itself, rather than on the client.

MapReduce::
MapReduce operates on the principle of moving the computation to the location of
the data. Coprocessors operate on the same principle.

AOP::
If you are familiar with Aspect Oriented Programming (AOP), you can think of a coprocessor
as applying advice by intercepting a request, running some custom code,
and then passing the request on to its final destination (or even changing the destination).

=== Coprocessor Implementation Overview

. Either your class should extend one of the Coprocessor classes, such as
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver],
or it should implement the link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/Coprocessor.html[Coprocessor]
or
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/CoprocessorService.html[CoprocessorService]
interface.

. Load the coprocessor, either statically (from the configuration) or dynamically
(using HBase Shell or the Java API). For more details see <<cp_loading,Loading Coprocessors>>.

. Call the coprocessor from your client-side code. HBase handles the coprocessor
transparently.

The framework API is provided in the
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor]
package.
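
For instance, a do-nothing observer satisfying the first step can be as small as the
following sketch (the class name `ExampleObserver` is illustrative, not part of HBase):

[source,java]
----
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;

// Extending BaseRegionObserver satisfies step 1; override only the
// hooks (preGetOp(), prePut(), ...) that you actually need.
public class ExampleObserver extends BaseRegionObserver {
}
----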

== Types of Coprocessors

=== Observer Coprocessors

Observer coprocessors are triggered either before or after a specific event occurs.
Observers that run before an event override methods that start with a `pre` prefix,
such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`prePut`]. Observers that run just after an event override methods that start
with a `post` prefix, such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`postPut`].
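
For example, the `prePut` and `postPut` hooks of `RegionObserver` have the following signatures:

[source,java]
----
// Runs just before the Put is applied to the region.
public void prePut(final ObserverContext<RegionCoprocessorEnvironment> e, final Put put,
    final WALEdit edit, final Durability durability) throws IOException;

// Runs just after the Put has been applied.
public void postPut(final ObserverContext<RegionCoprocessorEnvironment> e, final Put put,
    final WALEdit edit, final Durability durability) throws IOException;
----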

==== Use Cases for Observer Coprocessors

Security::
Before performing a `Get` or `Put` operation, you can check for permission using
`preGet` or `prePut` methods.

Referential Integrity::
HBase does not directly support the RDBMS concept of referential integrity, also known
as foreign keys. You can use a coprocessor to enforce such integrity. For instance,
if you have a business rule that every insert to the `users` table must be followed
by a corresponding entry in the `user_daily_attendance` table, you could implement
a coprocessor to use the `prePut` method on `users` to insert a record into
`user_daily_attendance`, as sketched after this list.

Secondary Indexes::
You can use a coprocessor to maintain secondary indexes. For more information, see
link:http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].
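
The referential-integrity sketch mentioned above might look like the following. It is
illustrative only: the `AttendanceObserver` class name, the `attendance` column family,
and the policy of writing the stub row from within `prePut` are assumptions, not a
prescribed design.

[source,java]
----
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical observer: every Put to 'users' also writes a stub row
// into 'user_daily_attendance'.
public class AttendanceObserver extends BaseRegionObserver {
    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> e, Put put,
            WALEdit edit, Durability durability) throws IOException {
        // Obtain a handle on the companion table from the coprocessor environment.
        HTableInterface attendance = e.getEnvironment()
            .getTable(TableName.valueOf("user_daily_attendance"));
        try {
            // Re-use the user's rowkey for the attendance entry.
            Put stub = new Put(put.getRow());
            stub.add(Bytes.toBytes("attendance"), Bytes.toBytes("present"), Bytes.toBytes(true));
            attendance.put(stub);
        } finally {
            attendance.close();
        }
    }
}
----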

==== Types of Observer Coprocessor

RegionObserver::
A RegionObserver coprocessor allows you to observe events on a region, such as `Get`
and `Put` operations. See
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionObserver].
Consider overriding the convenience class
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver],
which implements the `RegionObserver` interface and will not break if new methods are added.

RegionServerObserver::
A RegionServerObserver allows you to observe events related to the RegionServer's
operation, such as starting, stopping, or performing merges, commits, or rollbacks.
See
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html[RegionServerObserver].
Consider overriding the convenience class
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseMasterRegionServerObserver.html[BaseMasterRegionServerObserver],
which implements both `MasterObserver` and `RegionServerObserver` interfaces and
will not break if new methods are added.

MasterObserver::
A MasterObserver allows you to observe events related to the HBase Master, such
as table creation, deletion, or schema modification. See
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html[MasterObserver].
Consider overriding the convenience class
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseMasterRegionServerObserver.html[BaseMasterRegionServerObserver],
which implements both `MasterObserver` and `RegionServerObserver` interfaces and
will not break if new methods are added.

WALObserver::
A WALObserver allows you to observe events related to writes to the Write-Ahead
Log (WAL). See
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
Consider overriding the convenience class
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseWALObserver.html[BaseWALObserver],
which implements the `WALObserver` interface and will not break if new methods are added.

<<cp_example,Examples>> provides working examples of observer coprocessors.

=== Endpoint Coprocessor

Endpoint coprocessors allow you to perform computation at the location of the data.
See <<cp_analogies, Coprocessor Analogies>>. An example is the need to calculate a running
average or summation for an entire table which spans hundreds of regions.

In contrast to observer coprocessors, where your code is run transparently, endpoint
coprocessors must be explicitly invoked using the
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html#coprocessorService%28java.lang.Class,%20byte%5B%5D,%20byte%5B%5D,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call%29[coprocessorService()]
method available in
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html[Table],
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTableInterface.html[HTableInterface],
or
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable].

Starting with HBase 0.96, endpoint coprocessors are implemented using Google Protocol
Buffers (protobuf). For more details on protobuf, see Google's
link:https://developers.google.com/protocol-buffers/docs/proto[Protocol Buffer Guide].
Endpoint coprocessors written for version 0.94 are not compatible with version 0.96 or later
(see link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5448]). To upgrade your
HBase cluster from 0.94 or earlier to 0.96 or later, you need to reimplement your
coprocessor.

<<cp_example,Examples>> provides working examples of endpoint coprocessors.

[[cp_loading]]
== Loading Coprocessors

To make your coprocessor available to HBase, it must be _loaded_, either statically
(through the HBase configuration) or dynamically (using HBase Shell or the Java API).

=== Static Loading

Follow these steps to statically load your coprocessor. Keep in mind that you must
restart HBase to unload a coprocessor that has been loaded statically.

. Define the Coprocessor in _hbase-site.xml_, with a <property> element with a <name>
and a <value> sub-element. The <name> should be one of the following:
+
- `hbase.coprocessor.region.classes` for RegionObservers and Endpoints.
- `hbase.coprocessor.wal.classes` for WALObservers.
- `hbase.coprocessor.master.classes` for MasterObservers.
+
<value> must contain the fully-qualified class name of your coprocessor's implementation
class.
+
For example, to load a Coprocessor (implemented in class SumEndPoint.java) you have to create
the following entry in the RegionServer's _hbase-site.xml_ file (generally located under the `conf` directory):
+
[source,xml]
----
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.myname.hbase.coprocessor.endpoint.SumEndPoint</value>
</property>
----
+
If multiple classes are specified for loading, the class names must be comma-separated.
The framework attempts to load all the configured classes using the default class loader.
Therefore, the jar file must reside on the server-side HBase classpath.

When calling out to registered observers, the framework executes their callback methods in the
sorted order of their priority. +
Ties are broken arbitrarily.

. Put your code on HBase's classpath. One easy way to do this is to drop the jar
(containing your code and all its dependencies) into the `lib/` directory in the
HBase installation.

. Restart HBase.

=== Static Unloading

. Delete the coprocessor's <property> element, including sub-elements, from `hbase-site.xml`.
. Restart HBase.
. Optionally, remove the coprocessor's JAR file from the classpath or HBase's `lib/`
directory.

=== Dynamic Loading

You can also load a coprocessor dynamically, without restarting HBase. This may seem
preferable to static loading, but dynamically loaded coprocessors are loaded on a
per-table basis, and are only available to the table for which they were loaded. For
this reason, dynamically loaded coprocessors are sometimes called *Table Coprocessors*.

In addition, dynamically loading a coprocessor acts as a schema change on the table,
and the table must be taken offline to load the coprocessor.

There are three ways to dynamically load a Coprocessor.

[NOTE]
.Assumptions
====
The instructions below make the following assumptions:

* A JAR called `coprocessor.jar` contains the Coprocessor implementation along with all of its
dependencies.
* The JAR is available in HDFS in some location like
`hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar`.
====

[[load_coprocessor_in_shell]]
==== Using HBase Shell

. Disable the table using HBase Shell:
+
[source]
----
hbase> disable 'users'
----

. Load the Coprocessor, using a command like the following:
+
[source]
----
hbase> alter 'users', METHOD => 'table_att', 'Coprocessor'=>'hdfs://<namenode>:<port>/
user/<hadoop-user>/coprocessor.jar| org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|
arg1=1,arg2=2'
----
+
* Priority (Optional): The framework determines the execution sequence of all
observers registered at the same hook using priorities. This field can be left blank, in which
case the framework will assign a default priority value.
* Arguments (Optional): This field is passed to the Coprocessor implementation. This is optional.

. Enable the table.
+
[source]
----
hbase> enable 'users'
----

. Verify that the coprocessor loaded:
+
[source]
----
hbase> describe 'users'
----
+
The coprocessor should be listed in the `TABLE_ATTRIBUTES`.

==== Using the Java API (all HBase versions)

The following Java code shows how to use the `setValue()` method of `HTableDescriptor`
to load a coprocessor on the `users` table.

[source,java]
----
TableName tableName = TableName.valueOf("users");
// ...
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);
----
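
The middle of this example builds the table descriptor. A fuller sketch, assuming the
`RegionObserverExample` class used elsewhere in this chapter and the JAR location from
the assumptions above:

[source,java]
----
String path = "hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar";
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("users");

admin.disableTable(tableName);
HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
hTableDescriptor.addFamily(new HColumnDescriptor("personalDet"));
hTableDescriptor.addFamily(new HColumnDescriptor("salaryDet"));
// The attribute value is pipe-separated: JAR path | class name | priority.
hTableDescriptor.setValue("COPROCESSOR$1", path + "|"
    + RegionObserverExample.class.getCanonicalName() + "|"
    + Coprocessor.PRIORITY_USER);
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);
----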

==== Using the Java API (HBase 0.96+ only)

In HBase 0.96 and newer, the `addCoprocessor()` method of `HTableDescriptor` provides
an easier way to load a coprocessor dynamically.

[source,java]
----
TableName tableName = TableName.valueOf("users");
// ...
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);
----
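
A sketch of the 0.96+ variant, with the same assumptions as the previous sketch:

[source,java]
----
Path path = new Path("hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar");
admin.disableTable(tableName);
HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
// addCoprocessor() takes the class name, the JAR path, a priority,
// and an optional map of arguments.
hTableDescriptor.addCoprocessor(RegionObserverExample.class.getCanonicalName(),
    path, Coprocessor.PRIORITY_USER, null);
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);
----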

WARNING: There is no guarantee that the framework will load a given Coprocessor successfully.
For example, the shell command neither guarantees a jar file exists at a particular location nor
verifies whether the given class is actually contained in the jar file.

=== Dynamic Unloading

==== Using HBase Shell

. Disable the table.
+
[source]
----
hbase> disable 'users'
----

. Alter the table to remove the coprocessor.
+
[source]
----
hbase> alter 'users', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
----

. Enable the table.
+
[source]
----
hbase> enable 'users'
----

==== Using the Java API

Reload the table definition without setting the value of the coprocessor either by
using `setValue()` or `addCoprocessor()` methods. This will remove any coprocessor
attached to the table.

|
[source,java]
|
||||||
----
|
----
|
||||||
TableName tableName = TableName.valueOf("users");
|
TableName tableName = TableName.valueOf("users");
|
||||||
|
@ -477,26 +446,23 @@ hTableDescriptor.addFamily(columnFamily2);
|
||||||
admin.modifyTable(tableName, hTableDescriptor);
|
admin.modifyTable(tableName, hTableDescriptor);
|
||||||
admin.enableTable(tableName);
|
admin.enableTable(tableName);
|
||||||
----
|
----
|
||||||
+
|
|
||||||
Optionally you can also use removeCoprocessor() method of HTableDescriptor class.
|
|
||||||
|
|
||||||
|
In HBase 0.96 and newer, you can instead use the `removeCoprocessor()` method of the
|
||||||
|
`HTableDescriptor` class.
|
||||||
|
|
||||||
|
|
||||||
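
A sketch of that approach, reusing the descriptor and admin from the loading examples:

[source,java]
----
admin.disableTable(tableName);
// Remove the coprocessor attribute by class name (HBase 0.96+).
hTableDescriptor.removeCoprocessor(RegionObserverExample.class.getCanonicalName());
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);
----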

[[cp_example]]
== Examples

HBase ships examples for Observer Coprocessor in
link:http://hbase.apache.org/xref/org/apache/hadoop/hbase/coprocessor/example/ZooKeeperScanPolicyObserver.html[ZooKeeperScanPolicyObserver]
and for Endpoint Coprocessor in
link:http://hbase.apache.org/xref/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.html[RowCountEndpoint].

A more detailed example is given below.

These examples assume a table called `users`, which has two column families `personalDet`
and `salaryDet`, containing personal and salary details. Below is the graphical representation
of the `users` table.

.Users Table
[width="100%",cols="7",options="header,footer"]
|====================

=== Observer Example

The following Observer coprocessor prevents the details of the user `admin` from being
returned in a `Get` or `Scan` of the `users` table.

. Write a class that extends the
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver]
class.

. Override the `preGetOp()` method (the `preGet()` method is deprecated) to check
whether the client has queried for the rowkey with value `admin`. If so, return an
empty result. Otherwise, process the request as normal.

. Put your code and dependencies in a JAR file.

. Place the JAR in HDFS where HBase can locate it.

. Load the Coprocessor.
|
Following are the implementation of the above steps:
|
||||||
|
|
||||||
. For Step 1 and Step 2, below is the code.
|
|
||||||
+
|
|
||||||
[source,java]
|
[source,java]
|
||||||
----
|
----
|
||||||
public class RegionObserverExample extends BaseRegionObserver {
|
public class RegionObserverExample extends BaseRegionObserver {
|
||||||
|
@ -568,10 +529,10 @@ public class RegionObserverExample extends BaseRegionObserver {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
----
|
----
|
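
The body of this class boils down to a `preGetOp()` override along the following lines.
The placeholder cell's family, qualifier, and value are illustrative:

[source,java]
----
private static final byte[] ADMIN = Bytes.toBytes("admin");

@Override
public void preGetOp(final ObserverContext<RegionCoprocessorEnvironment> e,
        final Get get, final List<Cell> results) throws IOException {
    if (Bytes.equals(get.getRow(), ADMIN)) {
        // Add a placeholder cell and bypass the actual Get.
        results.add(CellUtil.createCell(get.getRow(), Bytes.toBytes("details"),
            Bytes.toBytes("Admin_det"), System.currentTimeMillis(),
            KeyValue.Type.Put.getCode(), Bytes.toBytes("You can't see Admin details")));
        e.bypass();
    }
}
----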

Overriding the `preGetOp()` will only work for `Get` operations. You also need to override
the `preScannerOpen()` method to filter the `admin` row from scan results.

[source,java]
----
@Override
public RegionScanner preScannerOpen(final ObserverContext<RegionCoprocessorEnvironment> e,
final Scan scan, final RegionScanner s) throws IOException {
    // Add a filter that excludes the admin rowkey from all scans.
    Filter filter = new RowFilter(CompareOp.NOT_EQUAL, new BinaryComparator(ROWKEY));
    scan.setFilter(filter);
    return s;
}
----

This method works but there is a _side effect_. If the client has used a filter in
its scan, that filter will be replaced by this filter. Instead, you can explicitly
remove any `admin` results from the scan:

[source,java]
----
@Override
public boolean postScannerNext(final ObserverContext<RegionCoprocessorEnvironment> e,
final InternalScanner s, final List<Result> results, final int limit, final boolean hasMore)
throws IOException {
    Result result = null;
    Iterator<Result> iterator = results.iterator();
    while (iterator.hasNext()) {
        result = iterator.next();
        if (Bytes.equals(result.getRow(), ROWKEY)) {
            iterator.remove();
            break;
        }
    }
    return hasMore;
}
----

=== Endpoint Example

Still using the `users` table, this example implements a coprocessor to calculate
the sum of all employee salaries, using an endpoint coprocessor.

. Create a `.proto` file defining your service, for example `sum.proto`:
+
[source]
----
// ... (SumRequest and SumResponse message definitions)
service SumService {
    rpc getSum(SumRequest)
        returns (SumResponse);
}
----

. Execute the `protoc` command to generate the Java code from the `.proto` file:
+
[source]
----
$ mkdir src
$ protoc --java_out=src ./sum.proto
----
+
This will generate a class called `Sum.java`.

. Write a class that extends the generated service class, implements the `Coprocessor`
and `CoprocessorService` interfaces, and overrides the service method.
+
WARNING: If you load a coprocessor from `hbase-site.xml` and then load the same coprocessor
again using HBase Shell, it will be loaded a second time. The same class will
exist twice, and the second instance will have a higher ID (and thus a lower priority).
The effect is that the duplicate coprocessor is effectively ignored.
+
[source,java]
----
public class SumEndPoint extends SumService implements Coprocessor, CoprocessorService {
    // ... (getService(), start(), stop() and getSum() implementations)
}
----
+
[source,java]
----
Configuration conf = HBaseConfiguration.create();
// Use the below code for HBase version 1.x.x or above.
Connection connection = ConnectionFactory.createConnection(conf);
// ...
}
----

. Load the Coprocessor.

. Write client code to call the Coprocessor, as sketched below.
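
A minimal client sketch follows. It assumes the protobuf-generated `Sum` classes
(`Sum.SumService`, `Sum.SumRequest`, `Sum.SumResponse`) from the `sum.proto` step above,
and the `table` handle from the previous snippet:

[source,java]
----
final Sum.SumRequest request = Sum.SumRequest.newBuilder()
    .setFamily("salaryDet").setColumn("gross").build();
try {
    // Invoke the endpoint on every region of the table (null start/end keys)
    // and collect one partial sum per region.
    Map<byte[], Long> results = table.coprocessorService(Sum.SumService.class,
        null, null,
        new Batch.Call<Sum.SumService, Long>() {
            @Override
            public Long call(Sum.SumService service) throws IOException {
                BlockingRpcCallback<Sum.SumResponse> callback =
                    new BlockingRpcCallback<Sum.SumResponse>();
                service.getSum(null, request, callback);
                Sum.SumResponse response = callback.get();
                return response.hasSum() ? response.getSum() : 0L;
            }
        });
    long totalSum = 0;
    for (Long regionSum : results.values()) {
        totalSum += regionSum;  // add up the per-region partial sums
    }
    System.out.println("Sum = " + totalSum);
} catch (Throwable t) {
    t.printStackTrace();
}
----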

== Guidelines For Deploying A Coprocessor

Bundling Coprocessors::
You can bundle all classes for a coprocessor into a
single JAR on the RegionServer's classpath, for easy deployment. Otherwise,
place all dependencies on the RegionServer's classpath so that they can be
loaded during RegionServer start-up. The classpath for a RegionServer is set
in the RegionServer's `hbase-env.sh` file.

Automating Deployment::
You can use a tool such as Puppet, Chef, or
Ansible to ship the JAR for the coprocessor to the required location on your
RegionServers' filesystems and restart each RegionServer, to automate
coprocessor deployment. Details for such set-ups are out of scope of this
document.

Updating a Coprocessor::
Deploying a new version of a given coprocessor is not as simple as disabling it,
replacing the JAR, and re-enabling the coprocessor. This is because you cannot
reload a class in a JVM unless you delete all the current references to it.
Since the current JVM has a reference to the existing coprocessor, you must restart
the JVM, by restarting the RegionServer, in order to replace it. This behavior
is not expected to change.

Coprocessor Logging::
The Coprocessor framework does not provide an API for logging beyond standard Java
logging.

Coprocessor Configuration::
If you do not want to load coprocessors from the HBase Shell, you can add their configuration
properties to `hbase-site.xml`. In <<load_coprocessor_in_shell>>, two arguments are
set: `arg1=1,arg2=2`. These could have been added to `hbase-site.xml` as follows:
+
[source,xml]
----
<property>
  <name>arg1</name>
  <value>1</value>
</property>
<property>
  <name>arg2</name>
  <value>2</value>
</property>
----
+
Then you can read the configuration using code like the following:
+
[source,java]
----
Configuration conf = HBaseConfiguration.create();
// Use the below code for HBase version 1.x.x or above.
Connection connection = ConnectionFactory.createConnection(conf);
TableName tableName = TableName.valueOf("users");
Table table = connection.getTable(tableName);

// Use the below code for HBase version 0.98.xx or below.
//HConnection connection = HConnectionManager.createConnection(conf);
//HTableInterface table = connection.getTable("users");

Get get = new Get(Bytes.toBytes("admin"));
Result result = table.get(get);
for (Cell c : result.rawCells()) {
    System.out.println(Bytes.toString(CellUtil.cloneRow(c))
        + "==> " + Bytes.toString(CellUtil.cloneFamily(c))
        + "{" + Bytes.toString(CellUtil.cloneQualifier(c))
        + ":" + Bytes.toLong(CellUtil.cloneValue(c)) + "}");
}
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
for (Result res : scanner) {
    for (Cell c : res.rawCells()) {
        System.out.println(Bytes.toString(CellUtil.cloneRow(c))
            + " ==> " + Bytes.toString(CellUtil.cloneFamily(c))
            + " {" + Bytes.toString(CellUtil.cloneQualifier(c))
            + ":" + Bytes.toLong(CellUtil.cloneValue(c))
            + "}");
    }
}
----
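
Within the coprocessor itself, a sketch of reading these properties from the environment's
configuration (assuming the `arg1`/`arg2` keys from the example above) might look like this:

[source,java]
----
@Override
public void start(CoprocessorEnvironment env) throws IOException {
    // Read the values configured in hbase-site.xml.
    String arg1 = env.getConfiguration().get("arg1");
    String arg2 = env.getConfiguration().get("arg2");
}
----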

== Monitor Time Spent in Coprocessors