diff --git a/dev-support/design-docs/Coprocessor_Design_Improvements-Use_composition_instead_of_inheritance-HBASE-17732-2017_09_27.pdf b/dev-support/design-docs/Coprocessor_Design_Improvements-Use_composition_instead_of_inheritance-HBASE-17732-2017_09_27.pdf deleted file mode 100644 index 30a6d542b36..00000000000 Binary files a/dev-support/design-docs/Coprocessor_Design_Improvements-Use_composition_instead_of_inheritance-HBASE-17732-2017_09_27.pdf and /dev/null differ diff --git a/dev-support/design-docs/Coprocessor_Design_Improvements-Use_composition_instead_of_inheritance-HBASE-17732.adoc b/dev-support/design-docs/Coprocessor_Design_Improvements-Use_composition_instead_of_inheritance-HBASE-17732.adoc new file mode 100644 index 00000000000..a61b37b2159 --- /dev/null +++ b/dev-support/design-docs/Coprocessor_Design_Improvements-Use_composition_instead_of_inheritance-HBASE-17732.adoc @@ -0,0 +1,236 @@ += Coprocessor Design Improvements (link:https://issues.apache.org/jira/browse/HBASE-17732[HBASE-17732]) + +Author: Apekshit Sharma + +== Introduction + +This doc explains current design of Coprocessor feature in brief, few issues I noticed, and +suggestions on how to fix them & further improve overall design. + +*TL;DR* + +We are moving from + +* Observer *is* Coprocessor +* FooService *is* CoprocessorService + +To + +* Coprocessor *has* Observer +* Coprocessor *has* Service + +See code example in <>. + +== Terminology + +hooks = functions in observers. Named because third-party use these functions to “hook up” custom +logic to internal code paths. + +[[background]] +== Background + +Coprocessors are well link:http://hbase.apache.org/book.html#cp[documented in the refguide]. + +Here we give a little background information on involved classes, their responsibilities, and +relationship to each other. + +* Main classes +** Coprocessor (interface) +*** All *Observer* interfaces derive from Coprocessor interface. +**** Coprocessor Interface is a _Marker _Interface. It just has start/stop methods and enums for +stages in the Coprocessor Lifecycle. +** http://hbase.apache.org/book.html#_observer_coprocessors[Observers] (interface) +*** Contain hooks which third-party programs can override to inject functionality in various +internal code paths. For e.g preCreateTable(...) will be called just before any table is created. +*** Current set of observers: _MasterObserver, RegionObserver, RegionServerObserver, WALObserver, +EndpointObserver, BulkLoadObserver._ +** CoprocessorEnvironment (interface) +*** Encapsulates a coprocessor instance and other information like versions, priority, etc. +*** Coprocessor implementations use it to get access to tables. +*** Four main implementations: _MasterEnvironment, RegionEnvironment, RegionServerEnvironment, +WALEnvironment._ +** CoprocessorHost (abstract class) +*** Responsible for loading coprocessors +*** Four concrete sub-classes: MasterCoprocessorHost, RegionCoprocessorHost, +RegionServerCoprocessorHost, WALCoprocessorHost +*** Each host is tied to corresponding environment type using template argument ‘E’. + +== Problems + +* CoprocessorEnvironment has `Coprocessor getInstance()`. Since Observer types which can be +handled by an environment are not statically tied to it, coprocessor hosts (which are statically +tied to Environment) don’t know which kind of coprocessors are relevant to them, i.e. +MasterCoprocessorHost is tied to MasterEnvironment, but it doesn’t know that it can only handle +MasterObserver(s). As a result: +** Problem 1: All hosts load all observers i.e. MasterCoprocessorHost will also load RegionObserver +and other observers. +** Problem 2: Hosts use runtime checks likes `observer instanceOf ExpectedObserver` in +execOperation and other functions to filter out incompatible observers. +** Problem 3: Many redundant functions in every implementation of coprocessor host. +* Observer *extends* Coprocessor (inheritance) +** Problem 4: Any third-party coprocessor which wants to use many observers will have to extend all +of them in same class. For eg. +class AccessController implements MasterObserver, + RegionObserver, RegionServerObserver, EndpointObserver, + BulkLoadObserver, AccessControlService.Interface, + CoprocessorService +That results in big classes with 100+ functions. + +== Proposed Solutions + +* There are 6 types of observers (listed in <> section above), but just 4 types of +CoprocessorEnvironment. So some XEnvironment has to be handling multiple Observers +(RegionEnvironment serves RegionObserver, EndpointObserver and BulkLoadObservers). Our aim is to +statically tie environment to types of observers it can serve. There are two alternative choices +here: +** Option 1: Limit to 4 types of Observers. That fits nicely in our pattern-of-4 +(4 hosts, 4 environments, 4 observers) and will make the overall design simpler. Although it may +look simple at surface, it’ll actually lead to a single large observer interface which will only +grow and may contain 100s of hooks in future (master already has 100+) +** Option 2: Use multiple observers to group together similar kinds of hooks. +Like we have RegionObserver, EndpointObserver and BulkLoadObserver; we can have ScannerObserver, +AMObserver, etc instead of single MasterObserver. Benefits being +*** Proper encapsulation of related hooks and separation from unrelated hooks +*** We can give different Stability guarantees for different set of hooks. Something which’ll make +our CP compat management much easier. + +I believe Option 2 to be better than Option 1, and the design changes suggested later are based on +Option 2. + +* For problem 4, we should replace inheritance with composition, so developers have choice to +break out observer implementations into separate classes. + +[[main.design.change.suggestions]] +== Main Design Change suggestions + +* Extend pattern-of-4 up to coprocessor. ++ +CoprocessorHost → Env → Coprocessor +* Tie each CoprocessorEnvironment to corresponding Coprocessor +* Use composition instead of inheritance between Coprocessor and Observers. + +=== Current design + +Only changing parts are mentioned here. Anything not changing is represented by “...” + +[source,java] +---- +interface Coprocessor { + ... +} + +interface CoprocessorEnvironment { + Coprocessor getInstance(); + ... +} + +interface CoprocessorService { + Service getService(); +} + +abstract class CoprocessorHost { + ... +} + +interface RegionObserver extends Coprocessor { + ... +} + +interface BulkLoadObserver extends Coprocessor { + ... +} + +interface EndpointObserver extends Coprocessor { + ... +} +---- + +=== New design + +[source,java] +---- +interface Coprocessor { + ... +} + +// Extend pattern-of-4 to coprocessors. +interface RegionCoprocessor extends Coprocessor { + RegionObserver getRegionObserver(); + BulkLoadObserver getBulkLoadObserver(); + EndpointObserver getEndpointObserver(); + Service getService(); +} + +// Tie coprocessor to environment +interface CoprocessorEnvironment { + C getInstance(); + ... +} + +abstract class CoprocessorHost> { + ... +} + +// Doesn’t extend coprocessor +interface RegionObserver extends Coprocessor { + … +} + +// Doesn’t extend coprocessor +interface BulkLoadObserver extends Coprocessor { + … +} +---- + + +== How does it fix our issues? + +* Fix #1: CoprocessorHost is now tied to types of coprocessors it can serve by template argument C. +During load time, it can ignore any coprocessors which don’t match. +* Fix #2 and #3: Pull the duplicate functions into CoprocessorHost class. Individual host subclasses +can use these directly. One interesting part here is ObserverGetter. For any specific hook, +say in observer O, subclasses specify ObserverGetter so that the shared methods can extract +observer O from coprocessor C. +* Fix #4: Choosing composition over inheritance, by adding getter functions in coprocessors (e.g. +getRegionObserver()), implementations can now break up observer implementations into separate +classes. For e.g. our AccessController will now just be: +`class AccessController implements AccessControlService.Interface, CoprocessorService` + +[[migrating.existing.cps.to.new.design]] +== Migrating existing CPs to new design + +There’s a simple and small fix that can migrate existing coprocessors to the new design. + +If we had the following observer in the old design: + +---- +class FooObserver implements RegionObserver { +... +... +} +---- + +It can be “made to work” with the new design like this: + +---- +class FooObserver implements RegionCoprocessor, RegionObserver { + public RegionObserver getRegionObserver() { return this; } + ... + ... +} +---- + +However, note that this is only a workaround to quickly migrate ~100 CPs in our code base to new +design without creating new classes and files. *New CPs should NOT follow this pattern.* + +== Additional Benefit + +* Cleaner solution to https://issues.apache.org/jira/browse/HBASE-17106[HBASE-17106]. +* We can have multiple observer interfaces for each environment now. For e.g We can break our single + monolithic MasterObsever (~150 functions) to multiple observer interfaces - ScannerObserver, + AMObserver, etc. +* These observers can be assigned different compatibility guarantees. For instances, new hooks by +Backup feature could have been put into separate Observer and marked IS.Unstable, while features +which have hardened over years can be marked IS.Stable. +* Only the coprocessors corresponding to hosts which support endpoint service will have +“getService()” method. So WALCoprocessor which cannot support service doesn’t have one. That may +look minor thing. But if our design can cleanly convey what is and isn’t supported, that’s beautiful +and powerful and helpful for downstream developers. + +