Commit Graph

61 Commits

Author SHA1 Message Date
Yi Liang 49f707fba7 HBASE-17933: [hbase-spark] Support Java api for bulkload
Signed-off-by: Sean Busbey <busbey@apache.org>
2017-04-24 11:48:29 -05:00
zhangduo 66b616d7a3 HBASE-17914 Create a new reader instead of cloning a new StoreFile when compaction 2017-04-19 09:26:33 +08:00
Yi Liang d7ddc79198 HBASE-17905 [hbase-spark] bulkload does not work when table not exist
Signed-off-by: tedyu <yuzhihong@gmail.com>
2017-04-11 17:18:49 -07:00
tedyu 02da5a6104 HBASE-17905: [hbase-spark] bulkload does not work when table not exist - revert due to misspelling 2017-04-11 17:18:37 -07:00
Yi Liang 22f602cab5 HBASE-17905: [hbase-spark] bulkload does not work when table not exist
Signed-off-by: tedyu <yuzhihong@gmail.com>
2017-04-11 17:01:07 -07:00
zhangduo a66d491892 HBASE-17857 Remove IS annotations from IA.Public classes 2017-04-05 15:34:06 +08:00
Jerry He 35d7a0cd07 HBASE-15597 Clean up configuration keys used in hbase-spark module (Yi Liang) 2017-03-13 12:02:07 -07:00
Jerry He 6bb5938226 HBASE-14375 Define public API for spark integration module 2017-03-04 14:10:34 -08:00
Jerry He a95570cfa0 Revert "Define public API for spark integration module" for missing JIRA number.
This reverts commit 58b6d9759e.
2017-03-04 14:08:38 -08:00
Jerry He 58b6d9759e Define public API for spark integration module 2017-03-04 12:53:21 -08:00
Andrew Purtell 404a2883f2 HBASE-17722 Metrics subsystem stop/start messages add a lot of useless bulk to operational logging 2017-03-03 12:40:06 -08:00
chetkhatri 92fc4c0cc8 HBASE-17549 HBase-Spark Module: Corrected - Incorrect log at println and unwanted comment code
Signed-off-by: Michael Stack <stack@apache.org>
2017-01-26 21:58:25 -08:00
dskskv 59cd8e510c HBASE-17547 Bug Resolved - TableCatalog doesn't support multiple columns from Single Column family 2017-01-26 11:58:01 -08:00
tedyu 81d3e25a75 HBASE-17547 HBase-Spark Module : TableCatalog doesn't support multiple columns from Single Column family - revert due to not using git am 2017-01-26 11:57:04 -08:00
tedyu 0cdea03460 HBASE-17547 HBase-Spark Module : TableCatalog doesn't support multiple columns from Single Column family (Chetan Khatri) 2017-01-26 10:51:43 -08:00
Yu Li 07e0a30efa HBASE-17491 Remove all setters from HTable interface and introduce a TableBuilder to build Table instance 2017-01-23 13:57:01 +08:00
Jan Hentschel 55a1aa1e73 HBASE-10699 Set capacity on ArrayList where possible and use isEmpty instead of size() == 0
Signed-off-by: Michael Stack <stack@apache.org>
2017-01-20 22:58:20 -08:00
tedyu ccb8d671d5 HBASE-17371 Enhance 'HBaseContextSuite @ distributedScan to test HBase client' with filter 2016-12-27 15:44:18 -08:00
tedyu 81623a353c HBASE-17047 Add an API to get HBase connection cache statistics (Weiqing Yang) 2016-11-11 06:50:01 -08:00
tedyu 444dc866c0 HBASE-16823 Add examples in HBase Spark module (Weiqing Yang) 2016-10-14 10:19:54 -07:00
tedyu a68c0e2a34 HBASE-16818 Avoid multiple copies of binary data during the conversion from Result to Row (Weiqing Yang) 2016-10-14 10:16:43 -07:00
tedyu 07086036a5 HBASE-16638 Reduce the number of Connection's created in classes of hbase-spark module - addendum 2 (Weiqing Yang) 2016-10-14 09:00:38 -07:00
tedyu ee6f0ddef6 HBASE-16638 Reduce the number of Connection's created in classes of hbase-spark module - Addendum (Weiqing Yang) 2016-10-11 10:22:50 -07:00
tedyu 9d304d3b2d HBASE-16638 Reduce the number of Connection's created in classes of hbase-spark module (Weiqing Yang) 2016-10-11 09:04:26 -07:00
tedyu 83fc59d5c9 HBASE-16804 JavaHBaseContext.streamBulkGet is void but should be JavaDStream (Igor Yurinok) 2016-10-10 19:34:21 -07:00
stack 95c1dc93fb HBASE-15638 Shade protobuf
Which includes

    HBASE-16742 Add chapter for devs on how we do protobufs going forward

    HBASE-16741 Amend the generate protobufs out-of-band build step
    to include shade, pulling in protobuf source and a hook for patching protobuf

    Removed ByteStringer from hbase-protocol-shaded. Use the protobuf-3.1.0
    trick directly instead. Makes stuff cleaner. All under 'shaded' dir is
    now generated.

    HBASE-16567 Upgrade to protobuf-3.1.x
    Regenerate all protos in this module with protoc3.
    Redo ByteStringer to use new pb3.1.0 unsafebytesutil
    instead of HBaseZeroCopyByteString

    HBASE-16264 Figure how to deal with endpoints and shaded pb Shade our protobufs.
    Do it in a manner that makes it so we can still have in our API references to
    com.google.protobuf (and in REST). The c.g.p in API is for Coprocessor Endpoints (CPEP)

            This patch is Tactic #4 from Shading Doc attached to the referenced issue.
            Figuring an approach took a while because we have Coprocessor Endpoints
            mixed in with the core of HBase that are tough to untangle (FIX).

            Tactic #4 (the fourth attempt at addressing this issue) is COPY all but
            the CPEP .proto files currently in hbase-protocol to a new module named
            hbase-protocol-shaded. Generate .protos again in the new location and
            then relocate/shade the generated files. Let CPEPs keep on with the
            old references at com.google.protobuf.* and
            org.apache.hadoop.hbase.protobuf.* but change the hbase core so all
            instead refer to the relocated files in their new location at
            org.apache.hadoop.hbase.shaded.com.google.protobuf.*.

            Let the new module also shade protobufs themselves and change hbase
            core to pick up this shaded protobuf rather than directly reference
            com.google.protobuf.

            This approach allows us to explicitly refer to either the shaded or
            non-shaded version of a protobuf class in any particular context (though
            usually context dictates one or the other). Core runs on shaded protobuf.
            CPEPs continue to use whatever is on the classpath with
            com.google.protobuf.* which is pb2.5.0 for the near future at least.

            See above cited doc for follow-ons and downsides. In short, IDEs will complain
            about not being able to find the shaded protobufs since shading happens at package
            time; will fix by checking in all generated classes and relocated protobuf in
            a follow-on. Also, CPEPs currently suffer an extra-copy as marshalled from
            non-shaded to shaded. To fix. Finally, our .protos are duplicated; once
            shaded, and once not. Pain, but how else to reveal our protos to CPEPs or
            C++ client that wants to talk with HBase AND shade protobuf.

            Details:

            Add a new hbase-protocol-shaded module. It is a copy of hbase-protocol
            with all relocated offset from o.a.h.h. to o.a.h.h.shaded. The new module
            also includes the relocated pb. It does not include CPEPs. They stay in
            their old location.

            Add another module hbase-endpoint which has in it all the endpoints
            that ship as part of hbase -- at least the ones that are not
            entangled with core such as AccessControl and Auth. Move all protos
            for these CPEPs here as well as their unit tests (mostly moving a
            bunch of stuff out of hbase-server module)

            Much of the change looks like this:

                 -import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
                 -import org.apache.hadoop.hbase.protobuf.generated.ClusterIdProtos;
                 +import org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil;
                 +import org.apache.hadoop.hbase.shaded.protobuf.generated.ClusterIdProtos;

            In HTable and in HBaseAdmin, regularize the way Callables are used and also hide
            protobuf usage as much as possible moving it up into Callable super classes or out
            to utility classes. Still TODO is adding in of retries, etc., but can wait on
            procedure which will redo all this.

            Also in HTable and HBaseAdmin as well as in HRegionServer and Server, be explicit
            when using non-shaded protobuf. Do the full-path so it is clear. This is around
            endpoint coprocessors registration of services and execution of CPEP methods.

            Shrunk ProtobufUtil by moving methods used by one CPEP only back to the CPEP either
            into Client class or as new Util class; e.g. AccessControlUtil.

            There are actually two versions of ProtobufUtil now; a shaded one and a subset
            that is used by CPEPs doing non-shaded work.

            Made it so hbase-common no longer depends on hbase-protocol (with Matteo's help)

            R*Converter classes got moved down under shaded package -- they are for internal
            use only. There are no non-shaded versions of these classes.

            D hbase-client/src/main/java/org/apache/hadoop/hbase/client/AbstractRegionServerCallable
            D RetryingCallableBase
             Not used anymore and we have too many tiers of Callables so removed/cleaned-up.

            A ClientServiceCallable
             Had to add this one. RegionServerCallable was made generic so it could be used
             for a few Interfaces (Client and Admin). Then added ClientServiceCallable to
             implement RegionServerCallable with the Client Interface.
2016-10-03 21:37:32 -07:00
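Commit 95c1dc93fb above notes that RegionServerCallable was made generic so it could serve both the Client and Admin interfaces, with ClientServiceCallable binding it to the Client interface. The following is a minimal Java sketch of that shape, not the HBase source: the `ClientStub` and `AdminStub` interfaces, every `*Sketch` class name, and the method names are hypothetical stand-ins chosen for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-ins for the two RPC interfaces mentioned in the commit message.
interface ClientStub { String get(String row); }
interface AdminStub { String flushRegion(String region); }

// Generic parent: T is the result type, S is the stub (interface) the call runs against.
abstract class RegionServerCallableSketch<T, S> implements Callable<T> {
    private final S stub;

    RegionServerCallableSketch(S stub) { this.stub = stub; }

    protected S getStub() { return stub; }

    // Subclasses supply only the RPC itself.
    protected abstract T rpcCall();

    @Override
    public final T call() { return rpcCall(); }
}

// Binds the generic parent to the client interface, so callers only pick T.
abstract class ClientServiceCallableSketch<T>
        extends RegionServerCallableSketch<T, ClientStub> {
    ClientServiceCallableSketch(ClientStub stub) { super(stub); }
}

class CallableSketchDemo {
    // Client-side use through the bound subclass.
    static String fetch(String row) {
        ClientStub stub = r -> "value-of-" + r;
        return new ClientServiceCallableSketch<String>(stub) {
            @Override
            protected String rpcCall() { return getStub().get(row); }
        }.call();
    }

    // Admin-side use of the same generic parent with a different stub type.
    static String flush(String region) {
        AdminStub stub = r -> "flushed-" + r;
        return new RegionServerCallableSketch<String, AdminStub>(stub) {
            @Override
            protected String rpcCall() { return getStub().flushRegion(region); }
        }.call();
    }

    public static void main(String[] args) {
        System.out.println(fetch("row1"));
        System.out.println(flush("region-a"));
    }
}
```

The point of the sketch is only that one generic parent can back both kinds of call, while a thin subclass fixes the stub type for the common client case.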
stack 45bb6180a3 REVERT of revert of "HBASE-16308 Contain protobuf references Gather up the pb references into a few locations only rather than have pb references distributed all about the code base."
This is a revert of a revert; i.e. we are adding back the change only adding
back with fixes for the broken unit test; was a real issue on a test that
went in just at same time as this commit; I was getting a new nonce on each
retry rather than getting one for the mutation.

Other changes since revert are more hiding of RpcController. Use
accessor method rather than always pass in a RpcController

Walked back retrying operations that used to be single-shot (though
code comment said need a retry) because it opens a can of worms where
we retry stuff like bad column family when we shouldn't (needs
work adding in DoNotRetryIOEs)

Changed name of class from PayloadCarryingServerCallable to
CancellableRegionServerCallable.

Fix javadoc and findbugs warnings.

Fix case of not initializing the ScannerCallable RpcController.

Below is original commit message:

 Remove mention of ServiceException and other protobuf classes from all over the codebase.
 Purge TimeLimitedRpcController. Let's just have one override of RpcController.
        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/AbstractRegionServerCallable.java
         Cleanup. Make it clear this is an odd class for async hbase intro.
        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
         Refactor of RegionServerCallable allows me to clean up a bunch of
         boilerplate in here and remove protobuf references.
        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          Purge protobuf references everywhere except a reference to a throw of a
          ServiceException in method checkHBaseAvailable. I deprecated it in favor
          of new available method (the SE is not actually needed)
        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/PayloadCarryingServerCallable.java
         Move the RetryingTimeTracker instance in here from HTable.
         Allows me to contain the tracker and remove repeated code in HTable.
        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionServerCallable.java
         Cleanup. Move setup of rpc in here rather than have it repeated in HTable.
         Allows me to remove protobuf references from a bunch of places.
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/FlushRegionCallable.java
     Make use of the push of boilerplate up into RegionServerCallable
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/MultiServerCallable.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/PayloadCarryingServerCallable.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionAdminServiceCallable.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/SecureBulkLoadClient.java
    M hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
     Move boilerplate up into superclass.
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetryingTimeTracker.java
     Cleanup
    M hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/PayloadCarryingRpcController.java
    M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
    M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEditsReplaySink.java
    M hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/RegionReplicaReplicationEndpoint.java
     Factor in TimeLimitedRpcController. Just have one RpcController override.
    D hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/TimeLimitedRpcController.java
     Removed. Let's have one override of pb rpccontroller only.
    M hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
     (handleRemoteException) added
     (toText) added
2016-08-10 10:12:06 -07:00
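Commit 45bb6180a3 above mentions a bug where a new nonce was obtained on each retry rather than once for the mutation. The Java sketch below illustrates the intended behavior under stated assumptions: the nonce is generated once, before the retry loop, and reused on every attempt. The class, the counter-based nonce source, and the simulated failure count are all hypothetical and do not reflect the actual HBase client code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class NonceRetrySketch {
    // Hypothetical nonce source; a simple counter keeps the sketch deterministic.
    private static final AtomicLong NEXT_NONCE = new AtomicLong(1);

    static long newNonce() { return NEXT_NONCE.getAndIncrement(); }

    // Simulates sending one mutation that fails failuresBeforeSuccess times.
    // Returns the nonce that was sent on each attempt.
    static List<Long> sendWithRetries(int failuresBeforeSuccess, int maxAttempts) {
        long nonce = newNonce(); // taken once per mutation, outside the retry loop
        List<Long> sentNonces = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            sentNonces.add(nonce); // every retry reuses the same nonce
            boolean succeeded = attempt >= failuresBeforeSuccess;
            if (succeeded) {
                break;
            }
        }
        return sentNonces;
    }

    public static void main(String[] args) {
        // Two simulated failures, then success: three attempts, one distinct nonce.
        System.out.println(sendWithRetries(2, 5));
    }
}
```

Moving the `newNonce()` call inside the loop would reproduce the kind of bug the commit describes, since each retry would then look like a distinct operation.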
stack 0206dc67d6 Revert "HBASE-16308 Contain protobuf references Gather up the pb references into a few locations only rather than have pb references distributed all about the code base."
This reverts commit ed87a81b4b.
2016-08-05 15:18:48 -07:00
stack ed87a81b4b HBASE-16308 Contain protobuf references Gather up the pb references into a few locations only rather than have pb references distributed all about the code base.
Purge ServiceException from Callable subclasses by pushing SE handling
up into the parent Callable class (varies by context but this is the basic
pattern). Allows us to remove a bunch of boilerplate.
Do this in the public facing classes in particular (though if
an API has SE in it -- which a few do, this patch leaves these
untouched -- for now.) Make it so HBaseAdmin and HTable have no
direct pb imports (except for endpoint processor API).

Change a few of the HBaseAdmin calls to be retrying where comments
ask that we do retry rather than one time.

Purge TimeLimitedRpcController. Let's just have one override of RpcController.

        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/AbstractRegionServerCallable.java
         Cleanup. Make it clear this is an odd class for async hbase intro.

        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
         Refactor of RegionServerCallable allows me to clean up a bunch of
         boilerplate in here and remove protobuf references.

        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          Purge protobuf references everywhere except a reference to a throw of a
          ServiceException in method checkHBaseAvailable. I deprecated it in favor
          of new available method (the SE is not actually needed)

        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/PayloadCarryingServerCallable.java
         Move the RetryingTimeTracker instance in here from HTable.
         Allows me to contain the tracker and remove repeated code in HTable.

        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionServerCallable.java
         Cleanup. Move setup of rpc in here rather than have it repeated in HTable.
         Allows me to remove protobuf references from a bunch of places.

    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/FlushRegionCallable.java
     Make use of the push of boilerplate up into RegionServerCallable

    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/MultiServerCallable.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/PayloadCarryingServerCallable.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionAdminServiceCallable.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/SecureBulkLoadClient.java
    M hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
     Move boilerplate up into superclass.

    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetryingTimeTracker.java
     Cleanup

    M hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/PayloadCarryingRpcController.java
    M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
    M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEditsReplaySink.java
    M hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/RegionReplicaReplicationEndpoint.java
     Factor in TimeLimitedRpcController. Just have one RpcController override.

    D hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/TimeLimitedRpcController.java
     Removed. Let's have one override of pb rpccontroller only.

    M hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
     (handleRemoteException) added
     (toText) added

Signed-off-by: stack <stack@apache.org>
2016-08-05 10:13:58 -07:00
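Commit ed87a81b4b above describes purging ServiceException from Callable subclasses by pushing its handling up into the parent Callable class. A minimal Java sketch of that pattern follows. It is an assumption-laden illustration rather than the HBase implementation: `ServiceExceptionSketch` stands in for the protobuf ServiceException, and the class and method names are hypothetical.

```java
import java.io.IOException;

// Hypothetical stand-in for the protobuf ServiceException.
class ServiceExceptionSketch extends Exception {
    ServiceExceptionSketch(String message, Throwable cause) { super(message, cause); }
}

abstract class ParentCallableSketch<T> {
    // Subclasses implement only the RPC and may let the RPC-layer exception escape.
    protected abstract T rpcCall() throws ServiceExceptionSketch;

    // The one place where the RPC-layer exception is translated, so subclasses
    // and their callers deal with IOException only.
    public final T call() throws IOException {
        try {
            return rpcCall();
        } catch (ServiceExceptionSketch se) {
            Throwable cause = se.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause; // unwrap the original I/O failure
            }
            throw new IOException(se);
        }
    }
}

class ExceptionSketchDemo {
    // Runs one call and reports either the result or the translated exception.
    static String run(boolean fail) {
        ParentCallableSketch<String> callable = new ParentCallableSketch<String>() {
            @Override
            protected String rpcCall() throws ServiceExceptionSketch {
                if (fail) {
                    throw new ServiceExceptionSketch("rpc failed",
                            new IOException("region offline"));
                }
                return "ok";
            }
        };
        try {
            return callable.call();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

With the translation in the parent, each subclass drops its own try/catch, which is the boilerplate removal the commit message refers to.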
stack ff7d0082b8 HBASE-16306 Add specific imports to avoid namespace clash in defaultSource.scala (Sai Teja Ranuva) 2016-07-29 13:43:16 -07:00
stack 9d740f7b8b HBASE-16263 Move all to do w/ protobuf -- *.proto files and generated classes -- under hbase-protocol
Signed-off-by: stack <stack@apache.org>
2016-07-21 10:02:05 -07:00
Apekshit 4ffea7711a HBASE-15944 Spark test flooding mvn output. Redirect test logs to file. This doesn't fix the problem fully as I still see a few logs being dumped in stdout. But it cleans up the majority of the earlier dump. (Apekshit)
Change-Id: I6893301d154078a7cfb6b9af2eedc744deafb8d7

Signed-off-by: stack <stack@apache.org>
2016-06-02 08:44:08 -07:00
Jurriaan Mous cdd532da8a HBASE-15610 Remove deprecated HConnection for 2.0 thus removing all PB references for 2.0
Signed-off-by: stack <stack@apache.org>
2016-05-29 07:50:55 -07:00
tedyu 084b036cb2 HBASE-15825 Fix the null pointer in DynamicLogicExpressionSuite (Zhan Zhang) 2016-05-16 09:04:50 -07:00
Jonathan M Hsieh b353e388bb HBASE-15333 [hbase-spark] Enhance the dataframe filters to handle naively encoded short, integer, long, float and double (Zhan Zhang) 2016-05-13 07:00:41 -07:00
tedyu ebb5d421f9 HBASE-15707 ImportTSV bulk output does not support tags with hfile.format.version=3 (huaxiang sun) 2016-04-26 11:21:29 -07:00
Apekshit 7efb9edecb HBASE-15296 Break out writer and reader from StoreFile. Done using Intellij15 Refactor > Move. (Apekshit)
Change-Id: Ie719569cc3393e0b5361e9d462c3cf125ad5144e

Signed-off-by: stack <stack@apache.org>
2016-04-13 22:43:03 -07:00
Weiqing Yang 58177c103f HBASE-15572 Adding optional timestamp semantics to HBase-Spark
4 parameters, "timestamp", "minTimestamp", "maxTimestamp" and
"maxVersions" are added to HBaseSparkConf. Users can select a single
timestamp, or a time range bounded by a minimum timestamp and a
maximum timestamp.

Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Ted Yu <tedyu@apache.org>
Signed-off-by: Jerry He <jerryjch@apache.org>
2016-04-13 22:39:14 -05:00
Sean Busbey 6905d272d3 Revert "HBASE-15572 Adding optional timestamp semantics to HBase-Spark (Weiqing Yang)"
This reverts commit eec27ad7ef.
2016-03-31 21:40:50 -05:00
tedyu eec27ad7ef HBASE-15572 Adding optional timestamp semantics to HBase-Spark (Weiqing Yang) 2016-03-31 19:08:33 -07:00
tedyu 2b8a7f8d7b HBASE-15334 Add avro support for spark hbase connector (Zhan Zhang) 2016-03-17 09:11:38 -07:00
tedyu f6945c4631 HBASE-15336 Support Dataframe writer to the spark connector (Zhan Zhang) 2016-03-10 06:44:29 -08:00
Jonathan M Hsieh 97cce850fe HBASE-14801 Enhance the Spark-HBase connector catalog with json format (Zhan Zhang) 2016-03-09 10:41:56 -08:00
Ted Malaska b29ce7f114 HBASE-15271 Spark bulk load should write to temporary location and then rename on success.
Signed-off-by: Ted Yu <tedyu@apache.org>
Signed-off-by: Jonathan Hsieh <jon@cloudera.com>
Signed-off-by: Sean Busbey <busbey@apache.org>
2016-03-08 17:28:38 -08:00
tedyu 00248656ee HBASE-15184 SparkSQL Scan operation doesn't work on kerberos cluster (Ted Malaska) 2016-02-23 16:52:13 -08:00
Jonathan M Hsieh f352f3c371 HBASE-15282 Bump hbase-spark to use Spark 1.6.0 2016-02-18 17:31:42 -08:00
tedyu 6868c63660 HBASE-14796 Enhance the Gets in the connector (Zhan Zhang) 2015-12-28 15:48:10 -08:00
tedyu e75e26e3c6 HBASE-14849 Add option to set block cache to false on SparkSQL executions (Zhan Zhang) 2015-12-19 15:14:58 -08:00
Jonathan M Hsieh bbfff0d072 HBASE-14991 Fix the '-feature' warning in scala build (Zhan Zhang) 2015-12-17 14:19:36 -08:00
tedyu 676ce01c82 HBASE-14795 Enhance the spark-hbase scan operations (Zhan Zhang) 2015-12-13 18:26:54 -08:00