HBASE-17730 Migration to 2.0 for coprocessors. Upload AsciiDoc for coprocessor design improvements made in HBASE-17732.
This commit is contained in:
parent
40924bb4af
commit
40fbecd97c
Binary file not shown.
|
@ -0,0 +1,236 @@
|
|||
= Coprocessor Design Improvements (link:https://issues.apache.org/jira/browse/HBASE-17732[HBASE-17732])
|
||||
|
||||
Author: Apekshit Sharma
|
||||
|
||||
== Introduction
|
||||
|
||||
This doc explains current design of Coprocessor feature in brief, few issues I noticed, and
|
||||
suggestions on how to fix them & further improve overall design.
|
||||
|
||||
*TL;DR* +
|
||||
We are moving from
|
||||
|
||||
* Observer *is* Coprocessor
|
||||
* FooService *is* CoprocessorService
|
||||
|
||||
To
|
||||
|
||||
* Coprocessor *has* Observer
|
||||
* Coprocessor *has* Service
|
||||
|
||||
See code example in <<main.design.change.suggestions>>.
|
||||
|
||||
== Terminology
|
||||
|
||||
hooks = functions in observers. Named because third-party use these functions to “hook up” custom
|
||||
logic to internal code paths.
|
||||
|
||||
[[background]]
|
||||
== Background
|
||||
|
||||
Coprocessors are well link:http://hbase.apache.org/book.html#cp[documented in the refguide].
|
||||
|
||||
Here we give a little background information on involved classes, their responsibilities, and
|
||||
relationship to each other.
|
||||
|
||||
* Main classes
|
||||
** Coprocessor (interface)
|
||||
*** All *Observer* interfaces derive from Coprocessor interface.
|
||||
**** Coprocessor Interface is a _Marker _Interface. It just has start/stop methods and enums for
|
||||
stages in the Coprocessor Lifecycle.
|
||||
** http://hbase.apache.org/book.html#_observer_coprocessors[Observers] (interface)
|
||||
*** Contain hooks which third-party programs can override to inject functionality in various
|
||||
internal code paths. For e.g preCreateTable(...) will be called just before any table is created.
|
||||
*** Current set of observers: _MasterObserver, RegionObserver, RegionServerObserver, WALObserver,
|
||||
EndpointObserver, BulkLoadObserver._
|
||||
** CoprocessorEnvironment (interface)
|
||||
*** Encapsulates a coprocessor instance and other information like versions, priority, etc.
|
||||
*** Coprocessor implementations use it to get access to tables.
|
||||
*** Four main implementations: _MasterEnvironment, RegionEnvironment, RegionServerEnvironment,
|
||||
WALEnvironment._
|
||||
** CoprocessorHost (abstract class)
|
||||
*** Responsible for loading coprocessors
|
||||
*** Four concrete sub-classes: MasterCoprocessorHost, RegionCoprocessorHost,
|
||||
RegionServerCoprocessorHost, WALCoprocessorHost
|
||||
*** Each host is tied to corresponding environment type using template argument ‘E’.
|
||||
|
||||
== Problems
|
||||
|
||||
* CoprocessorEnvironment has `Coprocessor getInstance()`. Since Observer types which can be
|
||||
handled by an environment are not statically tied to it, coprocessor hosts (which are statically
|
||||
tied to Environment) don’t know which kind of coprocessors are relevant to them, i.e.
|
||||
MasterCoprocessorHost is tied to MasterEnvironment, but it doesn’t know that it can only handle
|
||||
MasterObserver(s). As a result:
|
||||
** Problem 1: All hosts load all observers i.e. MasterCoprocessorHost will also load RegionObserver
|
||||
and other observers.
|
||||
** Problem 2: Hosts use runtime checks likes `observer instanceOf ExpectedObserver` in
|
||||
execOperation and other functions to filter out incompatible observers.
|
||||
** Problem 3: Many redundant functions in every implementation of coprocessor host.
|
||||
* Observer *extends* Coprocessor (inheritance)
|
||||
** Problem 4: Any third-party coprocessor which wants to use many observers will have to extend all
|
||||
of them in same class. For eg.
|
||||
class AccessController implements MasterObserver,
|
||||
RegionObserver, RegionServerObserver, EndpointObserver,
|
||||
BulkLoadObserver, AccessControlService.Interface,
|
||||
CoprocessorService
|
||||
That results in big classes with 100+ functions.
|
||||
|
||||
== Proposed Solutions
|
||||
|
||||
* There are 6 types of observers (listed in <<background>> section above), but just 4 types of
|
||||
CoprocessorEnvironment. So some XEnvironment has to be handling multiple Observers
|
||||
(RegionEnvironment serves RegionObserver, EndpointObserver and BulkLoadObservers). Our aim is to
|
||||
statically tie environment to types of observers it can serve. There are two alternative choices
|
||||
here:
|
||||
** Option 1: Limit to 4 types of Observers. That fits nicely in our pattern-of-4
|
||||
(4 hosts, 4 environments, 4 observers) and will make the overall design simpler. Although it may
|
||||
look simple at surface, it’ll actually lead to a single large observer interface which will only
|
||||
grow and may contain 100s of hooks in future (master already has 100+)
|
||||
** Option 2: Use multiple observers to group together similar kinds of hooks.
|
||||
Like we have RegionObserver, EndpointObserver and BulkLoadObserver; we can have ScannerObserver,
|
||||
AMObserver, etc instead of single MasterObserver. Benefits being
|
||||
*** Proper encapsulation of related hooks and separation from unrelated hooks
|
||||
*** We can give different Stability guarantees for different set of hooks. Something which’ll make
|
||||
our CP compat management much easier.
|
||||
|
||||
I believe Option 2 to be better than Option 1, and the design changes suggested later are based on
|
||||
Option 2.
|
||||
|
||||
* For problem 4, we should replace inheritance with composition, so developers have choice to
|
||||
break out observer implementations into separate classes.
|
||||
|
||||
[[main.design.change.suggestions]]
|
||||
== Main Design Change suggestions
|
||||
|
||||
* Extend pattern-of-4 up to coprocessor.
|
||||
+
|
||||
CoprocessorHost → Env → Coprocessor
|
||||
* Tie each CoprocessorEnvironment to corresponding Coprocessor
|
||||
* Use composition instead of inheritance between Coprocessor and Observers.
|
||||
|
||||
=== Current design
|
||||
|
||||
Only changing parts are mentioned here. Anything not changing is represented by “...”
|
||||
|
||||
[source,java]
|
||||
----
|
||||
interface Coprocessor {
|
||||
...
|
||||
}
|
||||
|
||||
interface CoprocessorEnvironment {
|
||||
Coprocessor getInstance();
|
||||
...
|
||||
}
|
||||
|
||||
interface CoprocessorService {
|
||||
Service getService();
|
||||
}
|
||||
|
||||
abstract class CoprocessorHost<E extends CoprocessorEnvironment> {
|
||||
...
|
||||
}
|
||||
|
||||
interface RegionObserver extends Coprocessor {
|
||||
...
|
||||
}
|
||||
|
||||
interface BulkLoadObserver extends Coprocessor {
|
||||
...
|
||||
}
|
||||
|
||||
interface EndpointObserver extends Coprocessor {
|
||||
...
|
||||
}
|
||||
----
|
||||
|
||||
=== New design
|
||||
|
||||
[source,java]
|
||||
----
|
||||
interface Coprocessor {
|
||||
...
|
||||
}
|
||||
|
||||
// Extend pattern-of-4 to coprocessors.
|
||||
interface RegionCoprocessor extends Coprocessor {
|
||||
RegionObserver getRegionObserver();
|
||||
BulkLoadObserver getBulkLoadObserver();
|
||||
EndpointObserver getEndpointObserver();
|
||||
Service getService();
|
||||
}
|
||||
|
||||
// Tie coprocessor to environment
|
||||
interface CoprocessorEnvironment<C extends Coprocessor> {
|
||||
C getInstance();
|
||||
...
|
||||
}
|
||||
|
||||
abstract class CoprocessorHost<C extends Coprocessor, E extends CoprocessorEnvironment<C>> {
|
||||
...
|
||||
}
|
||||
|
||||
// Doesn’t extend coprocessor
|
||||
interface RegionObserver extends Coprocessor {
|
||||
…
|
||||
}
|
||||
|
||||
// Doesn’t extend coprocessor
|
||||
interface BulkLoadObserver extends Coprocessor {
|
||||
…
|
||||
}
|
||||
----
|
||||
|
||||
|
||||
== How does it fix our issues?
|
||||
|
||||
* Fix #1: CoprocessorHost is now tied to types of coprocessors it can serve by template argument C.
|
||||
During load time, it can ignore any coprocessors which don’t match.
|
||||
* Fix #2 and #3: Pull the duplicate functions into CoprocessorHost class. Individual host subclasses
|
||||
can use these directly. One interesting part here is ObserverGetter<C, O>. For any specific hook,
|
||||
say in observer O, subclasses specify ObserverGetter<C, O> so that the shared methods can extract
|
||||
observer O from coprocessor C.
|
||||
* Fix #4: Choosing composition over inheritance, by adding getter functions in coprocessors (e.g.
|
||||
getRegionObserver()), implementations can now break up observer implementations into separate
|
||||
classes. For e.g. our AccessController will now just be:
|
||||
`class AccessController implements AccessControlService.Interface, CoprocessorService`
|
||||
|
||||
[[migrating.existing.cps.to.new.design]]
|
||||
== Migrating existing CPs to new design
|
||||
|
||||
There’s a simple and small fix that can migrate existing coprocessors to the new design. +
|
||||
If we had the following observer in the old design: +
|
||||
----
|
||||
class FooObserver implements RegionObserver {
|
||||
...
|
||||
...
|
||||
}
|
||||
----
|
||||
|
||||
It can be “made to work” with the new design like this: +
|
||||
----
|
||||
class FooObserver implements RegionCoprocessor, RegionObserver {
|
||||
public RegionObserver getRegionObserver() { return this; }
|
||||
...
|
||||
...
|
||||
}
|
||||
----
|
||||
|
||||
However, note that this is only a workaround to quickly migrate ~100 CPs in our code base to new
|
||||
design without creating new classes and files. *New CPs should NOT follow this pattern.*
|
||||
|
||||
== Additional Benefit
|
||||
|
||||
* Cleaner solution to https://issues.apache.org/jira/browse/HBASE-17106[HBASE-17106].
|
||||
* We can have multiple observer interfaces for each environment now. For e.g We can break our single
|
||||
monolithic MasterObsever (~150 functions) to multiple observer interfaces - ScannerObserver,
|
||||
AMObserver, etc.
|
||||
* These observers can be assigned different compatibility guarantees. For instances, new hooks by
|
||||
Backup feature could have been put into separate Observer and marked IS.Unstable, while features
|
||||
which have hardened over years can be marked IS.Stable.
|
||||
* Only the coprocessors corresponding to hosts which support endpoint service will have
|
||||
“getService()” method. So WALCoprocessor which cannot support service doesn’t have one. That may
|
||||
look minor thing. But if our design can cleanly convey what is and isn’t supported, that’s beautiful
|
||||
and powerful and helpful for downstream developers.
|
||||
|
||||
|
Loading…
Reference in New Issue