Commit Graph

933 Commits

Author SHA1 Message Date
Andrew Purtell 6ad5b9e569
HBASE-25824 IntegrationTestLoadCommonCrawl (#3208)
This integration test loads successful resource retrieval records from
the Common Crawl (https://commoncrawl.org/) public dataset into an HBase
table and writes records that can be used to later verify the presence
and integrity of those records.

Run like:

  ./bin/hbase org.apache.hadoop.hbase.test.IntegrationTestLoadCommonCrawl \
    -Dfs.s3n.awsAccessKeyId=<AWS access key> \
    -Dfs.s3n.awsSecretAccessKey=<AWS secret key> \
    /path/to/test-CC-MAIN-2021-10-warc.paths.gz \
    /path/to/tmp/warc-loader-output

Access to the Common Crawl dataset in S3 is made available to anyone by
Amazon AWS, but Hadoop's S3N filesystem still requires valid access
credentials to initialize.

The input path can specify either a directory or a file. The file may
optionally be compressed with gzip. If a directory, the loader expects
the directory to contain one or more WARC files from the Common Crawl
dataset. If a file, the loader expects a list of Hadoop S3N URIs which
point to S3 locations for one or more WARC files from the Common Crawl
dataset, one URI per line. Lines should be terminated with the UNIX line
terminator.
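
As an illustration of the file input format described above, the following is a minimal sketch of how such a paths file could be read. It is not the loader's actual code: the function name `read_warc_uris` and the choice of gzip magic-byte detection are assumptions made for this example.

```python
import gzip


def read_warc_uris(path):
    """Return the URIs listed in a paths file, one per line.

    Hypothetical helper, not part of HBase. The file may be plain text
    or gzip-compressed; gzip is detected from the two-byte magic number.
    """
    with open(path, "rb") as f:
        is_gzip = f.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    # newline="\n" treats only the UNIX line terminator as a line ending.
    with opener(path, "rt", encoding="utf-8", newline="\n") as f:
        return [line.rstrip("\n") for line in f if line.strip()]
```

Blank lines are skipped in this sketch; whether the real loader tolerates them is not stated in the commit message.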

Included in hbase-it/src/test/resources/CC-MAIN-2021-10-warc.paths.gz
is a list of all WARC files comprising the Q1 2021 crawl archive. There
are 64,000 WARC files in this data set, each containing ~1GB of gzipped
data. The WARC files contain several record types, such as metadata,
request, and response, but we only load the response record types. If
the HBase table schema does not specify compression (the default), there
is roughly a 10x expansion. Loading the full crawl archive results in a
table approximately 640 TB in size.
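
The 640 TB figure follows from the numbers above: 64,000 files times roughly 1 GB each times a 10x expansion. A small sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB) and a hypothetical function name:

```python
def estimated_table_size_tb(warc_files, gb_per_file, expansion):
    """Estimate the loaded table size in TB (decimal, 1 TB = 1000 GB).

    Hypothetical helper for illustration; the inputs are the approximate
    figures quoted in the commit message, not measured values.
    """
    return warc_files * gb_per_file * expansion / 1000


# 64,000 WARC files, ~1 GB each, ~10x expansion without compression.
print(estimated_table_size_tb(64_000, 1.0, 10.0))
```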

The hadoop-aws jar will be needed at runtime to instantiate the S3N
filesystem. Use the -files ToolRunner argument to add it.

You can also split the Loader and Verify stages:

Load with:

  ./bin/hbase 'org.apache.hadoop.hbase.test.IntegrationTestLoadCommonCrawl$Loader' \
    -files /path/to/hadoop-aws.jar \
    -Dfs.s3n.awsAccessKeyId=<AWS access key> \
    -Dfs.s3n.awsSecretAccessKey=<AWS secret key> \
    /path/to/test-CC-MAIN-2021-10-warc.paths.gz \
    /path/to/tmp/warc-loader-output

Verify with:

  ./bin/hbase 'org.apache.hadoop.hbase.test.IntegrationTestLoadCommonCrawl$Verify' \
    /path/to/tmp/warc-loader-output

Signed-off-by: Michael Stack <stack@apache.org>
2021-05-03 17:59:00 -07:00
Duo Zhang a4d954e606
HBASE-25757 Move BaseLoadBalancer to hbase-balancer module (#3191)
Signed-off-by: Yulin Niu <niuyulin@apache.org>
2021-04-26 12:03:25 +08:00
Duo Zhang b714889989 HBASE-25733 Upgrade opentelemetry to 1.0.1 (#3122)
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Yulin Niu <niuyulin@apache.org>
2021-04-25 09:23:23 +08:00
Duo Zhang 8399293e21 HBASE-25616 Upgrade opentelemetry to 1.0.0 (#3034)
Signed-off-by: Yulin Niu <niuyulin@apache.org>
2021-04-25 09:23:23 +08:00
Duo Zhang f6ff519dd0 HBASE-25591 Upgrade opentelemetry to 0.17.1 (#2971)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2021-04-25 09:23:23 +08:00
Duo Zhang 805b2ae2ad HBASE-23898 Add trace support for simple apis in async client (#2813)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2021-04-25 09:23:23 +08:00
Duo Zhang 57960fa8fa HBASE-25424 Find a way to config OpenTelemetry tracing without direct… (#2808)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2021-04-25 09:23:23 +08:00
Duo Zhang 2420286715 HBASE-25401 Add trace support for async call in rpc client (#2790)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2021-04-25 09:23:23 +08:00
Duo Zhang 302d9ea8b8 HBASE-25373 Remove HTrace completely in code base and try to make use of OpenTelemetry
Signed-off-by: stack <stack@apache.org>
2021-04-25 09:23:23 +08:00
Peter Somogyi 75494108f8
HBASE-25755 Exclude tomcat-embed-core from libthrift (#3141)
Exclude tomcat-embed-core transitive dependency
Remove outdated exclude rule for slf4j

Signed-off-by: Pankaj <pankajkumar@apache.org>
Signed-off-by: Kevin Risden <krisden@apache.org>
2021-04-10 09:20:53 +02:00
Geoffrey Jacoby 6aab1341a1 Add Geoffrey Jacoby to developers list in pom.xml 2021-04-09 11:32:00 -07:00
Duo Zhang 024248994f
HBASE-25696 Need to initialize SLF4JBridgeHandler in jul-to-slf4j for redirecting jul to slf4j (#3093)
Signed-off-by: Michael Stack <stack@apache.org>
2021-03-30 15:54:18 +08:00
Duo Zhang ba3610d097
HBASE-19577 Use log4j2 instead of log4j for logging (#1708)
Signed-off-by: stack <stack@apache.org>
2021-03-20 09:21:25 +08:00
Pankaj d74ae15fa7
HBASE-25568 Upgrade Thrift jar to fix CVE-2020-13949 (#3043)
Signed-off-by: stack <stack@apache.com>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2021-03-16 17:16:20 +05:30
Duo Zhang 92fe6090c2
HBASE-25604 Upgrade spotbugs to 4.x (#3029)
Signed-off-by: Yulin Niu <niuyulin@apache.org>
2021-03-10 14:52:56 +08:00
Akshay Sudheer 5d9a6ed1fe
HBASE-25367 Sort broken after Change 'State time' in UI (#2964)
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Pankaj Kumar <pankajkumar@apache.org>
2021-03-03 13:57:42 +05:30
Duo Zhang b522d2a33e Revert "HBASE-25604 Upgrade spotbugs to 4.x (#2986)"
This reverts commit d5df99999a.
2021-03-02 21:26:28 +08:00
Duo Zhang d5df99999a
HBASE-25604 Upgrade spotbugs to 4.x (#2986)
Signed-off-by: XinSun <ddupgs@gmail.com>
2021-03-02 15:54:54 +08:00
Josh Elser a7d0445a21 HBASE-25601 Use ASF-official mailing list archives
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>

Closes #2983
2021-02-25 11:08:38 -05:00
Duo Zhang 4925a6422b
HBASE-25333 Add maven enforcer rule to ban VisibleForTesting imports (#2854)
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
2021-01-09 08:50:11 +08:00
Duo Zhang fbf00f9c28
HBASE-25451 Upgrade commons-io to 2.8.0 (#2825)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
Signed-off-by: stack <stack@apache.org>
2020-12-31 16:57:27 +08:00
Duo Zhang f098461a55 HBASE-25370 Addendum fix checkstyle issue and dependencies 2020-12-12 21:08:51 +08:00
lixiaobao c853c99b20
HBASE-25372 Fix typo in ban-jersey section of the enforcer plugin in pom.xml (#2749)
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2020-12-09 21:34:36 +08:00
XinSun 979ad0f3fc
Add Xin Sun as a developer 2020-12-08 10:49:39 +08:00
niuyulin 4b8d3624f4
Add niuyulin as committer 2020-12-08 10:41:25 +08:00
Duo Zhang 107f738049
HBASE-25342 Upgrade error prone to 2.4.0 (#2725)
Have to disable MutablePublicArray because of a bug in error prone
https://github.com/google/error-prone/issues/1645

Signed-off-by: stack <stack@apache.org>
2020-12-02 22:23:03 +08:00
Duo Zhang e40c626ae1
HBASE-25320 Upgrade hbase-thirdparty dependency to 3.4.1 (#2693)
Signed-off-by: stack <stack@apache.org>
2020-12-01 08:16:03 -08:00
Peter Somogyi 035c192eb6
HBASE-25275 Upgrade asciidoctor (#2647)
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
2020-11-12 15:37:12 +01:00
Norbert Kalmar f0c430aed2
HBASE-20598 Upgrade to JRuby 9.2
- upgrade our default jruby to 9.2.13.0
- this major JRuby version update changes the Ruby compatibility from Ruby 2.3 to Ruby 2.5
- use a custom IRB prompt to convey similar information to before
- update the joni and jcoding dependencies to match this version of jruby-complete

closes #2308

Signed-off-by: stack <stack@apache.org>
Signed-off-by: Josh Elser <elserj@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
2020-11-09 16:31:44 -06:00
Semen Komissarov 2fc79e22b5
HBASE-23959 Fix javadoc for JDK11 (#2500)
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
Signed-off-by: stack <stack@apache.org>
2020-10-07 21:42:35 -07:00
Duo Zhang 81f2cc5089
HBASE-25154 Set java.io.tmpdir to project build directory to avoid writing std*deferred files to /tmp (#2502)
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
2020-10-06 08:35:16 +08:00
bsglz 4e59014bed
Add Zheng Wang to developers list. (#2457) 2020-09-25 19:24:08 +08:00
Andrew Purtell 8bfa2cb2ee
HBASE-25079 Upgrade Bootstrap to 3.3.7 (#2442)
Signed-off-by: Viraj Jasani <virajjasani@apache.org>
2020-09-23 14:56:50 -07:00
Duo Zhang 57e49b3959
HBASE-23834 HBase fails to run on Hadoop 3.3.0/3.2.2/3.1.4 due to jetty version mismatch (#2222)
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Josh Elser <elserj@apache.org>
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
2020-08-25 12:05:52 +08:00
Duo Zhang 0b85729da4
HBASE-24762 Purge protobuf java 2.5.0 dependency (#2128)
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: stack <stack@apache.org>
2020-07-24 11:48:35 +08:00
Sean Busbey 06949ff6a6
HBASE-22033 Update to maven-javadoc-plugin 3.2.0 and switch to non-forking aggregate goals
closes #1796

Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Jan Hentschel <janh@apache.org>
2020-05-29 15:18:47 -05:00
Sean Busbey f0d66273cd
HBASE-19663 javadoc creation needs jsr305.
Some javadoc invocations require that annotations we reference can have any
classes they reference resolved. This includes annotations _they_ have,
even though annotations are normally optional.

In some cases this showed up as javax.annotation.meta.TypeQualifierNickname
not found, because some findbugs annotations use it. Other times it was
javax.annotation.concurrent.Immutable not found, because some old guava
versions use it.

(updated for master branch by doing the config in report config instead of plugin)

Signed-off-by: Peter Somogyi <psomogyi@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
2020-05-29 15:15:28 -05:00
Duo Zhang 8601416ee8
HBASE-24309 Avoid introducing log4j and slf4j-log4j dependencies for modules other than hbase-assembly (#1640)
Signed-off-by: stack <stack@apache.org>
2020-05-12 12:03:30 +08:00
Duo Zhang a9a1b9524d
HBASE-24304 Separate a hbase-asyncfs module (#1628)
Signed-off-by: stack <stack@apache.org>
2020-05-06 14:40:21 +08:00
Duo Zhang d29bdd3558
HBASE-24315 Remove hadoop-two-compat.xml in hbase-assembly (#1638)
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
2020-05-05 19:49:39 +08:00
Wei-Chiu Chuang 1878db843c
HBASE-24238 Clean up root pom after removing hadoop-2.0 profile (#1571) 2020-05-04 14:05:24 -07:00
Jan Hentschel 4f9eecbe61
HBASE-24301 Updated Apache POM to version 23
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2020-05-02 14:35:16 +02:00
Peter Somogyi e26a2b5bf3
HBASE-24285 Move to hbase-thirdparty-3.3.0 (#1605)
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
2020-04-30 10:06:54 +02:00
stack 9cff7a4ff4 HBASE-24215 [Flakey Tests] [ERROR] TestSecureRESTServer java.lang.NoClassDefFoundError: com/sun/jersey/core/spi/factory/AbstractRuntimeDelegate
Addendum: Add jersey-servlet to hadoop3 profile.

Made the hadoop3 profile in the top-level pom the same as the hadoop2
profile when it comes to exclusions, then mostly backed out the previous
attempt. Made the failing test medium-sized so it runs in its own JVM.
2020-04-29 07:39:59 -07:00
Duo Zhang 6928674eb8
HBASE-24228 Merge the code in hbase-hadoop2-compat module to hbase-hadoop-compat (#1563)
Signed-off-by: stack <stack@apache.org>
2020-04-29 10:34:53 +08:00
niuyulin bc9184ee00
HBASE-23933 Separate a hbase-balancer module (#1436)
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
2020-04-22 14:55:38 +08:00
Tamas Penzes 9f5a8bc3e4 HBASE-24148: Upgrade Thrift to 0.13.0: 0.12.0 has outstanding CVEs.
Upgrade thrift from 0.12 to 0.13.

Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
2020-04-17 08:31:36 -07:00
Duo Zhang 1f66806c96
HBASE-24170 Remove hadoop-2.0 profile (#1495)
Signed-off-by: stack <stack@apache.org>
2020-04-16 18:57:40 +08:00
Michael Stack 5f08311b23
HBASE-24134 Down forked JVM heap size from 2800m to 2200m for jdk8 and jdk11 (#1451) (#1503)
Reduce the jdk8 forked JVM heap from 2800m to 2200m and the jdk11 heap
from 3200m to 2200m. Reduce the mvn size from 4G to 3.6G.

Change the number of puts done by TestMultiRespectsLimits because it
pushed the forked heap over 2.5G in size.

Signed-off-by: Sean Busbey <busbey@apache.org>
2020-04-13 21:35:43 -07:00
Jianfei Jiang ba34a2ca30
HBASE-24132 Upgrade to Apache ZooKeeper 3.5.7 (#1453)
Co-authored-by: 姜建飞 10222269 <jiang.jianfei@zte.com.cn>

Signed-off-by: Mate Szalay-Beko <szalay.beko.mate@gmail.com>
Signed-off-by: Norbert Kalmar <nkalmar@cloudera.com>
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Duo Zhang <zhangdo@apache.org>
2020-04-13 13:46:37 -07:00