# RELEASENOTES

# HBASE 2.4.8 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-26362](https://issues.apache.org/jira/browse/HBASE-26362) | *Major* | **Upload mvn site artifacts for nightly build to nightlies**

Site artifacts are now uploaded to nightlies for the nightly build as well as the pre-commit build.

---

* [HBASE-26329](https://issues.apache.org/jira/browse/HBASE-26329) | *Major* | **Upgrade commons-io to 2.11.0**

Upgraded commons-io to 2.11.0.

---

* [HBASE-26186](https://issues.apache.org/jira/browse/HBASE-26186) | *Major* | **jenkins script for caching artifacts should verify cached file before relying on it**

Adds a '--verify-tar-gz' option to cache-apache-project-artifact.sh for verifying whether the cached file can be parsed as a gzipped tarball.

The nightly job uses this option to avoid failures on broken cached hadoop tarballs.

---

* [HBASE-26339](https://issues.apache.org/jira/browse/HBASE-26339) | *Major* | **SshPublisher will skip uploading artifacts if the build is failure**

The build is now marked as unstable instead of failed when the yetus script returns an error. This works around the SshPublisher Jenkins plugin skipping the artifact upload when the build is marked as failed. The test output matters most when there are UT failures.

# HBASE 2.4.7 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-26274](https://issues.apache.org/jira/browse/HBASE-26274) | *Major* | **Create an option to reintroduce BlockCache to mapreduce job**

Introduces \`hfile.onheap.block.cache.fixed.size\`, disabled by default. When using ClientSideRegionScanner, it will be enabled with a fixed size for caching INDEX/LEAF\_INDEX block when a client, e.g.
snapshot scanner, scans the entire HFile and does not need to seek/reseek to the index block multiple times.

---

* [HBASE-26270](https://issues.apache.org/jira/browse/HBASE-26270) | *Minor* | **Provide getConfiguration method for Region and Store interface**

Provides a 'getReadOnlyConfiguration' method on the Store and Region interfaces.

---

* [HBASE-26273](https://issues.apache.org/jira/browse/HBASE-26273) | *Major* | **TableSnapshotInputFormat/TableSnapshotInputFormatImpl should use ReadType.STREAM for scanning HFiles**

HBase's MapReduce API which can operate over HBase snapshots will now default to using ReadType.STREAM instead of ReadType.DEFAULT (which is PREAD) as a result of this change. HBase developers expect that STREAM will perform significantly better for average Snapshot-based batch jobs. Users can restore the previous functionality (using PREAD) by updating their code to explicitly set a value of \`ReadType.PREAD\` on the \`Scan\` object they provide to TableSnapshotInputFormat, or by setting the configuration property "hbase.TableSnapshotInputFormat.scanner.readtype" to "PREAD" in hbase-site.xml.

---

* [HBASE-26276](https://issues.apache.org/jira/browse/HBASE-26276) | *Major* | **Allow HashTable/SyncTable to perform rawScan when comparing cells**

Added a --rawScan option to the HashTable job, which allows HashTable/SyncTable to perform raw scans. If this property is omitted, it defaults to false. When used together with --versions set to a high value, SyncTable will fabricate delete markers for all old versions still lingering (not yet cleaned by major compaction), avoiding the inconsistencies reported in HBASE-21596.

# HBASE 2.4.6 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-26204](https://issues.apache.org/jira/browse/HBASE-26204) | *Major* | **VerifyReplication should obtain token for peerQuorumAddress too**

VerifyReplication obtains tokens even if the peer quorum parameter is used. VerifyReplication with peer quorum can be used for secure clusters also.

---

* [HBASE-24652](https://issues.apache.org/jira/browse/HBASE-24652) | *Minor* | **master-status UI make date type fields sortable**

Makes RegionServer 'Start time' sortable in the Master UI.

---

* [HBASE-26200](https://issues.apache.org/jira/browse/HBASE-26200) | *Major* | **Undo 'HBASE-25165 Change 'State time' in UI so sorts (#2508)' in favor of HBASE-24652**

Undid showing RegionServer 'Start time' in ISO-8601 format. Revert.

---

* [HBASE-6908](https://issues.apache.org/jira/browse/HBASE-6908) | *Major* | **Pluggable Call BlockingQueue for HBaseServer**

Can pass in a FQCN to load as the call queue implementation.

Standardized arguments to the constructor are the max queue length, the PriorityFunction, and the Configuration.

A PluggableBlockingQueue abstract class is provided to help guide the correct constructor signature.

Hard fails with PluggableRpcQueueNotFound if the class fails to load as a BlockingQueue.

Upstreaming on behalf of Hubspot, we are interested in defining our own custom RPC queue and don't want to get involved in necessarily upstreaming internal requirements/iterations.

---

* [HBASE-26196](https://issues.apache.org/jira/browse/HBASE-26196) | *Major* | **Support configuration override for remote cluster of HFileOutputFormat locality sensitive**

Allows any configuration to be overridden for the remote cluster in HFileOutputFormat2. This is useful when a configuration different from the job's configuration is necessary to connect to the remote cluster, for instance non-secure vs. secure.

---

* [HBASE-26160](https://issues.apache.org/jira/browse/HBASE-26160) | *Minor* | **Configurable disallowlist for live editing of loglevels**

Adds a new hbase.ui.logLevels.readonly.loggers config which takes a comma-separated list of logger names. Similar to log4j configurations, the logger names can be prefixes or a full logger name. The log level of read-only loggers cannot be changed via the logLevel UI or setlevel CLI. This is useful for securing sensitive loggers, such as the SecurityLogger used for audit logs.

---

* [HBASE-26154](https://issues.apache.org/jira/browse/HBASE-26154) | *Minor* | **Provide exception metric for quota exceeded and throttling**

Adds "exceptions.quotaExceeded" and "exceptions.rpcThrottling" to HBase server and Thrift server metrics.

---

* [HBASE-26146](https://issues.apache.org/jira/browse/HBASE-26146) | *Minor* | **Allow custom opts for hbck in hbase bin**

Adds the HBASE\_HBCK\_OPTS environment variable to bin/hbase for passing extra options to hbck/hbck2. Defaults to HBASE\_SERVER\_JAAS\_OPTS if specified, or HBASE\_REGIONSERVER\_OPTS.

# HBASE 2.4.5 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-26088](https://issues.apache.org/jira/browse/HBASE-26088) | *Critical* | **conn.getBufferedMutator(tableName) leaks thread executors and other problems**

The API doc for Connection#getBufferedMutator(TableName) and Connection#getBufferedMutator(BufferedMutatorParams) stated that when the user does not pass a ThreadPool, the ThreadPool in the Connection is used. In reality, a new ThreadPool was created in such cases.

The behaviour of the code is kept as is, but the Javadoc is corrected, and a bug where this new pool was not closed when closing the BufferedMutator is fixed.
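The pool-ownership rule behind the HBASE-26088 fix can be illustrated with plain JDK classes. The following is a minimal sketch only, not the HBase implementation: `OwnedPoolSketch` and its methods are hypothetical names, and the single-thread pool size is an arbitrary assumption. It shows a resource that shuts down a thread pool on close only when it created that pool itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a mutator-like resource; not an HBase class. */
final class OwnedPoolSketch implements AutoCloseable {
    private final ExecutorService pool;
    private final boolean ownsPool;

    /** Pass null to let the sketch create (and therefore own) its pool. */
    OwnedPoolSketch(ExecutorService callerPool) {
        if (callerPool != null) {
            this.pool = callerPool;
            this.ownsPool = false;
        } else {
            // Assumption: a single-thread pool is enough for illustration.
            this.pool = Executors.newFixedThreadPool(1);
            this.ownsPool = true;
        }
    }

    ExecutorService pool() {
        return pool;
    }

    @Override
    public void close() {
        // Shut down only the pool created here; a caller-supplied pool
        // stays under the caller's control.
        if (ownsPool) {
            pool.shutdown();
        }
    }
}
```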

---

* [HBASE-25986](https://issues.apache.org/jira/browse/HBASE-25986) | *Minor* | **Expose the NORMALIZARION\_ENABLED table descriptor through a property in hbase-site**

New config: hbase.table.normalization.enabled

Default value: false

Description: This config sets the default behaviour of the normalizer at table level. To override it for a specific table, set NORMALIZATION\_ENABLED in the table descriptor and that property will be honored. This table-level property only takes effect if the normalizer is enabled at cluster level using the "normalizer\_switch true" command.

---

* [HBASE-22923](https://issues.apache.org/jira/browse/HBASE-22923) | *Major* | **hbase:meta is assigned to localhost when we downgrade the hbase version**

Introduced new config: hbase.min.version.move.system.tables

When the operator uses this configuration option, any version between the current cluster version and the value of "hbase.min.version.move.system.tables" does not trigger any auto-region movement. Auto-region movement here refers to auto-migration of system table regions to newer server versions. It is assumed that the configured range of versions does not require special handling of moving system table regions to a higher versioned RegionServer. This auto-migration is done by AssignmentManager#checkIfShouldMoveSystemRegionAsync().

Example: Let's assume the cluster is on version 1.4.0 and we have set "hbase.min.version.move.system.tables" to "2.0.0". Now if we upgrade one RegionServer on the 1.4.0 cluster to 1.6.0 (\< 2.0.0), then AssignmentManager will not move hbase:meta, hbase:namespace and other system table regions to the newly brought up RegionServer 1.6.0 as part of auto-migration. However, if we upgrade one RegionServer on the 1.4.0 cluster to 2.2.0 (\> 2.0.0), then AssignmentManager will move all system table regions to the newly brought up RegionServer 2.2.0 as part of the auto-migration done by AssignmentManager#checkIfShouldMoveSystemRegionAsync().
Overall, assuming we have a system RSGroup where we keep HBase system tables, if we use the config "hbase.min.version.move.system.tables" with value x.y.z, then while upgrading the cluster to a version greater than or equal to x.y.z, the first RegionServer that we upgrade must belong to the system RSGroup only.

---

* [HBASE-25902](https://issues.apache.org/jira/browse/HBASE-25902) | *Critical* | **Add missing CFs in meta during HBase 1 to 2.3+ Upgrade**

While upgrading a cluster from 1.x to 2.3+ versions, after the active master is done setting its status as 'Initialized', it attempts to add the 'table' and 'repl\_barrier' CFs in meta. Once the CFs are added successfully, the master is aborted with PleaseRestartMasterException because it has missed certain initialization events (e.g. ClusterSchemaService is not initialized and tableStateManager fails to migrate table states from ZK to meta due to missing CFs). Subsequent active master initialization is expected to be smooth.

In the presence of multiple masters, when one of them becomes active for the first time after upgrading to HBase 2.3+, it is aborted after fixing CFs in meta, and one of the other backup masters will take over and become active soon. Hence, overall this is expected to be a smooth upgrade if backup masters are configured. If not, the operator is expected to restart the same master again manually.

---

* [HBASE-25877](https://issues.apache.org/jira/browse/HBASE-25877) | *Major* | **Add access check for compactionSwitch**

Calling RSRpcService.compactionSwitch, i.e., Admin.compactionSwitch at client side, now requires ADMIN permission.

This is an incompatible change but it is also a bug fix, as we should not allow arbitrary users to disable compaction on a regionserver, so we apply this to all active branches.

---

* [HBASE-25984](https://issues.apache.org/jira/browse/HBASE-25984) | *Critical* | **FSHLog WAL lockup with sync future reuse [RS deadlock]**

Fixes a WAL lockup issue due to premature reuse of the sync futures by the WAL consumers.
The lockup causes the WAL system to hang, resulting in blocked appends and syncs and thus holding up the RPC handlers from progressing. The only workaround without this fix is to force abort the region server.

---

* [HBASE-25993](https://issues.apache.org/jira/browse/HBASE-25993) | *Major* | **Make excluded SSL cipher suites configurable for all Web UIs**

Adds the "ssl.server.exclude.cipher.list" configuration to exclude cipher suites for the http server started by the InfoServer.

---

* [HBASE-25969](https://issues.apache.org/jira/browse/HBASE-25969) | *Major* | **Cleanup netty-all transitive includes**

We have an (old) netty-all in our produced artifacts. It is transitively included from hadoop. It is needed by MiniMRCluster referenced from a few MR tests in hbase. This commit adds netty-all excludes everywhere else but where tests will fail unless the transitive is allowed through. TODO: move MR and/or MR tests out of hbase core.

# HBASE 2.4.4 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-25963](https://issues.apache.org/jira/browse/HBASE-25963) | *Major* | **HBaseCluster should be marked as IA.Public**

Changes HBaseCluster to IA.Public as its subclass MiniHBaseCluster is IA.Public.

# HBASE 2.4.3 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-25766](https://issues.apache.org/jira/browse/HBASE-25766) | *Major* | **Introduce RegionSplitRestriction that restricts the pattern of the split point**

After HBASE-25766, we can specify a split restriction, "KeyPrefix" or "DelimitedKeyPrefix", for a table with the "hbase.regionserver.region.split\_restriction.type" property. The "KeyPrefix" split restriction groups rows by a prefix of the row-key. The "DelimitedKeyPrefix" split restriction groups rows by a prefix of the row-key with a delimiter.

For example:
\`\`\`
# Create a table with a "KeyPrefix" split restriction, where the prefix length is 2 bytes
hbase\> create 'tbl1', 'fam', {CONFIGURATION =\> {'hbase.regionserver.region.split\_restriction.type' =\> 'KeyPrefix', 'hbase.regionserver.region.split\_restriction.prefix\_length' =\> '2'}}

# Create a table with a "DelimitedKeyPrefix" split restriction, where the delimiter is a comma (,)
hbase\> create 'tbl2', 'fam', {CONFIGURATION =\> {'hbase.regionserver.region.split\_restriction.type' =\> 'DelimitedKeyPrefix', 'hbase.regionserver.region.split\_restriction.delimiter' =\> ','}}
\`\`\`

Instead of specifying a split restriction for a table directly, we can also set the properties in hbase-site.xml. In this case, the specified split restriction is applied to all tables.

Note that the split restriction is also applied to a user-specified split point so that users cannot break the restriction, which is different behavior from the existing KeyPrefixRegionSplitPolicy and DelimitedKeyPrefixRegionSplitPolicy.

---

* [HBASE-25775](https://issues.apache.org/jira/browse/HBASE-25775) | *Major* | **Use a special balancer to deal with maintenance mode**

Introduced a MaintenanceLoadBalancer to be used only under maintenance mode. Typically you should not use it as your balancer implementation.

---

* [HBASE-25767](https://issues.apache.org/jira/browse/HBASE-25767) | *Major* | **CandidateGenerator.getRandomIterationOrder is too slow on large cluster**

In the actual implementation classes of CandidateGenerator, we now randomly select a start point and then iterate sequentially, instead of the old way, where we created a big array to hold all the integers in [0, num\_regions\_in\_cluster), shuffled the array, and then iterated over it.

The new implementation is 'random' enough, as every time we select just one candidate.
The problem with the old implementation is that it creates an array every time we want to get a candidate. With tens of thousands of regions, an array of tens of thousands of elements is created every time, which causes heavy GC pressure and slows down the balancer execution.

---

* [HBASE-25734](https://issues.apache.org/jira/browse/HBASE-25734) | *Minor* | **Backport HBASE-24305 to branch-2.4**

The following method was added to ServerName

- #valueOf(Address, long)

---

* [HBASE-25199](https://issues.apache.org/jira/browse/HBASE-25199) | *Minor* | **Remove HStore#getStoreHomedir**

Moved the following methods from HStore to HRegionFileSystem

- #getStoreHomedir(Path, RegionInfo, byte[])
- #getStoreHomedir(Path, String, byte[])

---

* [HBASE-25685](https://issues.apache.org/jira/browse/HBASE-25685) | *Major* | **asyncprofiler2.0 no longer supports svg; wants html**

With asyncprofiler 1.x, all is good. With asyncprofiler 2.x on hbase-2.3.x or hbase-2.4.x, add '?output=html' to get flamegraphs from the profiler.

Otherwise, with hbase-2.5+ and asyncprofiler2, all works. With asyncprofiler1 and hbase-2.5+, you may have to add '?output=svg' to the query.

---

* [HBASE-25518](https://issues.apache.org/jira/browse/HBASE-25518) | *Major* | **Support separate child regions to different region servers**

Adds a config key to enable or disable automatically separating child regions onto different region servers during the region split procedure. One child is kept on the server where the parent region is, and the other child is assigned to a random server.

hbase.master.auto.separate.child.regions.after.split.enabled

Default setting is false/off.

---

* [HBASE-25374](https://issues.apache.org/jira/browse/HBASE-25374) | *Minor* | **Make REST Client connection and socket time out configurable**

Configuration parameters to set the REST client connection and socket timeouts:

"hbase.rest.client.conn.timeout" Default is 2 \* 1000

"hbase.rest.client.socket.timeout" Default is 30 \* 1000

---

* [HBASE-25587](https://issues.apache.org/jira/browse/HBASE-25587) | *Major* | **[hbck2] Schedule SCP for all unknown servers**

Adds scheduleSCPsForUnknownServers to Hbck Service.

---

* [HBASE-25636](https://issues.apache.org/jira/browse/HBASE-25636) | *Minor* | **Expose HBCK report as metrics**

Exposes HBCK report results in metrics, including: "orphanRegionsOnRS", "orphanRegionsOnFS", "inconsistentRegions", "holes", "overlaps", "unknownServerRegions" and "emptyRegionInfoRegions".

---

* [HBASE-24305](https://issues.apache.org/jira/browse/HBASE-24305) | *Minor* | **Handle deprecations in ServerName**

The following methods were removed or made private from ServerName (due to HBASE-17624):

- getHostNameMinusDomain(String): Was made private without a replacement.
- parseHostname(String): Use #valueOf(String) instead.
- parsePort(String): Use #valueOf(String) instead.
- parseStartcode(String): Use #valueOf(String) instead.
- getServerName(String, int, long): Was made private. Use #valueOf(String, int, long) instead.
- getServerName(String, long): Use #valueOf(String, long) instead.
- getHostAndPort(): Use #getAddress() instead.
- getServerStartcodeFromServerName(String): Use an instance of ServerName to pull out the start code.
- getServerNameLessStartCode(String): Use #getAddress() instead.

# HBASE 2.4.2 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-25492](https://issues.apache.org/jira/browse/HBASE-25492) | *Major* | **Create table with rsgroup info in branch-2**

HBASE-25492 added a new interface in TableDescriptor which allows the user to define the RSGroup name while creating or modifying a table.

---

* [HBASE-25460](https://issues.apache.org/jira/browse/HBASE-25460) | *Major* | **Expose drainingServers as cluster metric**

Exposed new jmx metrics: "draininigRegionServers" and "numDrainingRegionServers" to provide "comma separated names for regionservers that are put in draining mode" and "num of such regionservers" respectively.

---

* [HBASE-25615](https://issues.apache.org/jira/browse/HBASE-25615) | *Major* | **Upgrade java version in pre commit docker file**

jdk8u232-b09 -\> jdk8u282-b08

jdk-11.0.6\_10 -\> jdk-11.0.10\_9

---

* [HBASE-23887](https://issues.apache.org/jira/browse/HBASE-23887) | *Major* | **New L1 cache : AdaptiveLRU**

Introduced a new L1 cache: AdaptiveLRU. This is supposed to provide better performance than the default LRU cache.

Set the config key "hfile.block.cache.policy" to "AdaptiveLRU" in hbase-site in order to start using this new cache.

# HBASE 2.4.1 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-25449](https://issues.apache.org/jira/browse/HBASE-25449) | *Major* | **'dfs.client.read.shortcircuit' should not be set in hbase-default.xml**

The presence of HDFS short-circuit read configuration properties in hbase-default.xml inadvertently causes short-circuit reads to not happen inside of RegionServers, despite short-circuit reads being enabled in hdfs-site.xml.

---

* [HBASE-25333](https://issues.apache.org/jira/browse/HBASE-25333) | *Major* | **Add maven enforcer rule to ban VisibleForTesting imports**

Bans imports of guava VisibleForTesting, which means you should not use this annotation in HBase any more.
For IA.Public and IA.LimitedPrivate classes, typically you should not expose any test related fields/methods there, and if you want to hide something, use IA.Private on the specific fields/methods.

For IA.Private classes, if you want to expose something only for tests, use the RestrictedApi annotation from error prone, which causes a compilation error if someone breaks the rule in the future.

---

* [HBASE-25441](https://issues.apache.org/jira/browse/HBASE-25441) | *Critical* | **add security check for some APIs in RSRpcServices**

RsRpcServices APIs that can be accessed only with Admin rights:

- stopServer
- updateFavoredNodes
- updateConfiguration
- clearRegionBlockCache
- clearSlowLogsResponses

---

* [HBASE-25432](https://issues.apache.org/jira/browse/HBASE-25432) | *Blocker* | **we should add security checks for setTableStateInMeta and fixMeta**

setTableStateInMeta and fixMeta can be accessed only with Admin rights.

---

* [HBASE-25318](https://issues.apache.org/jira/browse/HBASE-25318) | *Minor* | **Configure where IntegrationTestImportTsv generates HFiles**

Added the IntegrationTestImportTsv.generatedHFileFolder configuration property to override the default location in IntegrationTestImportTsv. Useful for running the integration test when HDFS Transparent Encryption is enabled.

---

* [HBASE-25456](https://issues.apache.org/jira/browse/HBASE-25456) | *Critical* | **setRegionStateInMeta need security check**

setRegionStateInMeta can be accessed only with Admin rights.

# HBASE 2.4.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-25127](https://issues.apache.org/jira/browse/HBASE-25127) | *Major* | **Enhance PerformanceEvaluation to profile meta replica performance.**

Three new commands are added to PE: metaWrite, metaRandomRead and cleanMeta.

Usage example:

hbase pe --rows=100000 metaWrite 1

hbase pe --nomapreduce --rows=100000 metaRandomRead 32

hbase pe --rows=100000 cleanMeta 1

metaWrite and cleanMeta should be run with only 1 thread and the same number of rows so all the rows inserted will be cleaned up properly.

metaRandomRead can be run with multiple threads. The rows option should be set to within the range of rows inserted by metaWrite.

---

* [HBASE-25237](https://issues.apache.org/jira/browse/HBASE-25237) | *Major* | **'hbase master stop' shuts down the cluster, not the master only**

\`hbase master stop\` should shut down only the master by default.

1. Help added to \`hbase master stop\`: To stop the cluster, use \`stop-hbase.sh\` or \`hbase master stop --shutDownCluster\`
2. Help added to \`stop-hbase.sh\`: stop-hbase.sh can only be used for shutting down the entire cluster. To shut down (HMaster\|HRegionServer) use hbase-daemon.sh stop (master\|regionserver)

---

* [HBASE-25242](https://issues.apache.org/jira/browse/HBASE-25242) | *Critical* | **Add Increment/Append support to RowMutations**

After HBASE-25242, we can add Increment/Append operations to RowMutations and perform those operations atomically in a single row.

HBASE-25242 includes an API change where the mutateRow() API returns a Result object to get the result of the Increment/Append operations.

---

* [HBASE-25263](https://issues.apache.org/jira/browse/HBASE-25263) | *Major* | **Change encryption key generation algorithm used in the HBase shell**

Since the backward-compatible change introduced in HBASE-25263, we use the more secure PBKDF2WithHmacSHA384 key generation algorithm (instead of PBKDF2WithHmacSHA1) to generate a secret key for HFile / WalFile encryption when the user defines a string encryption key in the hbase shell.
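To make the algorithm names in HBASE-25263 concrete, here is a minimal sketch of deriving key bytes from a passphrase with the JDK's `SecretKeyFactory`. It is not the HBase shell's code: the class name, iteration count, key length, and salt handling are assumptions chosen for illustration, and only the two algorithm names come from the note above.

```java
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical helper; parameters below are illustrative, not HBase's. */
final class ShellKeyDerivationSketch {
    // Assumptions: HBase's real iteration count and key size are not stated in the note.
    static final int ITERATIONS = 10_000;
    static final int KEY_BITS = 128;

    /** Derives KEY_BITS / 8 bytes using the named PBKDF2 variant. */
    static byte[] derive(String algorithm, String passphrase, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(passphrase.toCharArray(), salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance(algorithm).generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            // Surface provider problems as unchecked to keep call sites short.
            throw new IllegalStateException("Key derivation failed for " + algorithm, e);
        } finally {
            spec.clearPassword();
        }
    }
}
```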

---

* [HBASE-24268](https://issues.apache.org/jira/browse/HBASE-24268) | *Minor* | **REST and Thrift server do not handle the "doAs" parameter case insensitively**

This change allows the REST and Thrift servers to handle the "doAs" parameter case-insensitively, which is deemed correct per the "specification" provided by the Hadoop community.

---

* [HBASE-25278](https://issues.apache.org/jira/browse/HBASE-25278) | *Minor* | **Add option to toggle CACHE\_BLOCKS in count.rb**

A new option, CACHE\_BLOCKS, was added to the \`count\` shell command which will force the data for a table to be loaded into the block cache. By default, the \`count\` command will not cache any blocks. This option can serve as a means for a table's data to be loaded into the block cache on demand. See the help message on the count shell command for usage details.

---

* [HBASE-18070](https://issues.apache.org/jira/browse/HBASE-18070) | *Critical* | **Enable memstore replication for meta replica**

"Async WAL Replication" [1] was added by HBASE-11183 "Timeline Consistent region replicas - Phase 2 design" but only for user-space tables. This feature adds "Async WAL Replication" for the hbase:meta table. It also adds a client 'LoadBalance' mode that has reads go to replicas first and to the primary only on failure, so as to shed read load from the primary and alleviate \*hotspotting\* on the hbase:meta Region.

Configuration is as it was for the user-space 'Async WAL Replication'. See [2] and [3] for details on how to enable.

1. http://hbase.apache.org/book.html#async.wal.replication
2. http://hbase.apache.org/book.html#async.wal.replication.meta
3. http://hbase.apache.org/book.html#\_async\_wal\_replication\_for\_meta\_table\_as\_of\_hbase\_2\_4\_0

---

* [HBASE-25126](https://issues.apache.org/jira/browse/HBASE-25126) | *Major* | **Add load balance logic in hbase-client to distribute read load over meta replica regions.**

See parent issue, HBASE-18070, release notes for how to enable.

---

* [HBASE-25026](https://issues.apache.org/jira/browse/HBASE-25026) | *Minor* | **Create a metric to track full region scans RPCs**

Adds a new metric that counts the number of full region scan requests at the RPC layer. It is collected under "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server".

---

* [HBASE-25253](https://issues.apache.org/jira/browse/HBASE-25253) | *Major* | **Deprecated master carrys regions related methods and configs**

Since 2.4.0, all methods (LoadBalancer, BaseLoadBalancer, ZNodeClearer) and configs (hbase.balancer.tablesOnMaster, hbase.balancer.tablesOnMaster.systemTablesOnly) related to the master carrying regions are deprecated. They will be removed in 3.0.0.

---

* [HBASE-20598](https://issues.apache.org/jira/browse/HBASE-20598) | *Major* | **Upgrade to JRuby 9.2**

The HBase shell now relies on JRuby 9.2. This is a new major version change for JRuby. The most significant change is that Ruby compatibility changed from Ruby 2.3 to Ruby 2.5. For more detailed changes please see [the JRuby release announcement for the start of the 9.2 series](https://www.jruby.org/2018/05/24/jruby-9-2-0-0.html) as well as the [general release announcement page for updates since that version](https://www.jruby.org/news).

The runtime dependency versions present on the server side classpath for the Joni (now 2.1.31) and JCodings (now 1.0.55) libraries have also been updated to match those found in the JRuby version shipped with HBase. These version changes are maintenance releases and should be backwards compatible when updated in tandem.

---

* [HBASE-25181](https://issues.apache.org/jira/browse/HBASE-25181) | *Major* | **Add options for disabling column family encryption and choosing hash algorithm for wrapped encryption keys.**

This change adds options for disabling column family encryption and choosing the hash algorithm for wrapped encryption keys. Changes are done such that the defaults keep the same behavior as prior to this issue.

Prior to this change HBase always used the MD5 hash algorithm to store a hash for encryption keys. This hash is needed to verify the secret key of the subject (e.g. making sure that the same secret key is used during encrypted HFile read and write). The MD5 algorithm is considered weak, and cannot be used in some (e.g. FIPS compliant) clusters. Having a configurable hash enables us to use newer and more secure hash algorithms like SHA-384 or SHA-512 (which are FIPS compliant).

The hash is set via the configuration option `hbase.crypto.key.hash.algorithm`. It should be set to a JDK `MessageDigest` algorithm like "MD5", "SHA-256" or "SHA-384". The default is "MD5" for backward compatibility.

Alternatively, clusters which rely on an encryption at rest mechanism outside of HBase (e.g. those offered by HDFS) and wish to ensure HBase's encryption at rest system is inactive can set `hbase.crypto.enabled` to `false`.

---

* [HBASE-25238](https://issues.apache.org/jira/browse/HBASE-25238) | *Critical* | **Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”**

Fixes master procedure store migration issues going from 2.0.x to 2.2.x and/or 2.3.x. Also fixes failed heartbeat parse during rolling upgrade from 2.0.x to 2.3.x.

---

* [HBASE-25234](https://issues.apache.org/jira/browse/HBASE-25234) | *Major* | **[Upgrade]Incompatibility in reading RS report from 2.1 RS when Master is upgraded to a version containing HBASE-21406**

Fixes auto-migration of the master procedure store so that it works again going from 2.0.x =\> 2.2+. Also makes heartbeats work when doing a rolling upgrade from 2.0.x =\> 2.3+.

---

* [HBASE-25212](https://issues.apache.org/jira/browse/HBASE-25212) | *Major* | **Optionally abort requests in progress after deciding a region should close**

If hbase.regionserver.close.wait.abort is set to true, interrupt RPC handler threads holding the region close lock.

Until requests in progress can be aborted, wait on the region close lock for a configurable interval (specified by hbase.regionserver.close.wait.time.ms, default 60000 (1 minute)). If we have failed to acquire the close lock after this interval elapses, if allowed (also specified by hbase.regionserver.close.wait.abort), abort the regionserver.

We will attempt to interrupt any running handlers every hbase.regionserver.close.wait.interval.ms (default 10000 (10 seconds)) until either the close lock is acquired or we reach the maximum wait time.

---

* [HBASE-25167](https://issues.apache.org/jira/browse/HBASE-25167) | *Major* | **Normalizer support for hot config reloading**

This patch adds [dynamic configuration](https://hbase.apache.org/book.html#dyn_config) support for the following configuration keys related to the normalizer:

* hbase.normalizer.throughput.max_bytes_per_sec
* hbase.normalizer.split.enabled
* hbase.normalizer.merge.enabled
* hbase.normalizer.min.region.count
* hbase.normalizer.merge.min_region_age.days
* hbase.normalizer.merge.min_region_size.mb

---

* [HBASE-25224](https://issues.apache.org/jira/browse/HBASE-25224) | *Major* | **Maximize sleep for checking meta and namespace regions availability**

Changed the max sleep time during the meta and namespace regions availability check to 60 sec. Previously there was no such cap.

---

* [HBASE-24628](https://issues.apache.org/jira/browse/HBASE-24628) | *Major* | **Region normalizer now respects a rate limit**

Introduces a new configuration, `hbase.normalizer.throughput.max_bytes_per_sec`, for specifying a limit on the throughput of actions executed by the normalizer. Note that while this configuration value is in bytes, the minimum honored value is `1,000,000`, or `1m`.
Supports values configured using the human-readable suffixes honored by [`Configuration.getLongBytes`](https://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html#getLongBytes-java.lang.String-long-)

---

* [HBASE-14067](https://issues.apache.org/jira/browse/HBASE-14067) | *Major* | **bundle ruby files for hbase shell into a jar.**

The `hbase-shell` artifact now contains the ruby files that implement the hbase shell. There should be no downstream impact for users of the shell that rely on the `hbase shell` command.

Folks that wish to include the HBase ruby classes defined for the shell in their own JRuby scripts should add the `hbase-shell.jar` file to their classpath rather than add `${HBASE_HOME}/lib/ruby` to their load paths.

---

* [HBASE-24875](https://issues.apache.org/jira/browse/HBASE-24875) | *Major* | **Remove the force param for unassign since it dose not take effect any more**

The "force" flag to various unassign commands (java api, shell, etc) has been ignored since HBase 2. As of this change the methods that take it are now deprecated. Downstream users should stop passing/using this flag.

The Admin and AsyncAdmin Java APIs will have the deprecated version of the unassign method with a force flag removed in HBase 4. Callers can safely continue to use the deprecated API until then; the internal implementation just calls the new method.

The MasterObserver coprocessor API deprecates the `preUnassign` and `postUnassign` methods that include the force parameter and replaces them with versions that omit this parameter. The deprecated methods will be removed from the API in HBase 3. Until then downstream coprocessor implementations can safely continue to *just* implement the deprecated method if they wish; the replacement methods provide a default implementation that calls the deprecated method with force set to `false`.
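The compatibility pattern described for the MasterObserver methods in HBASE-24875 can be sketched with Java default methods. This is an illustration under simplifying assumptions: `UnassignObserverSketch` and `LegacyObserverSketch` are hypothetical types, and the region is reduced to a `String`, whereas the real coprocessor methods take HBase-specific context and region arguments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified observer; not the real MasterObserver API. */
interface UnassignObserverSketch {
    /** Old hook that still carries the ignored force flag. */
    @Deprecated
    default void preUnassign(String regionName, boolean force) {
        // No-op unless an implementation overrides it.
    }

    /** Replacement hook: by default it delegates to the old one with force=false. */
    default void preUnassign(String regionName) {
        preUnassign(regionName, false);
    }
}

/** A downstream implementation that only knows about the deprecated hook. */
final class LegacyObserverSketch implements UnassignObserverSketch {
    final List<String> seen = new ArrayList<>();

    @Override
    @Deprecated
    public void preUnassign(String regionName, boolean force) {
        seen.add(regionName + ":" + force);
    }
}
```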
--- * [HBASE-25099](https://issues.apache.org/jira/browse/HBASE-25099) | *Major* | **Change meta replica count by altering meta table descriptor** Now you can change the region replication config for the meta table by altering the meta table. The old "hbase.meta.replica.count" is deprecated and will be removed in 4.0.0. If it is set, we will still honor it: when the master restarts and finds that the value of 'hbase.meta.replica.count' differs from the region replication config of the meta table, it will schedule an alter table operation to change the region replication config to the value you configured for 'hbase.meta.replica.count'. --- * [HBASE-23834](https://issues.apache.org/jira/browse/HBASE-23834) | *Major* | **HBase fails to run on Hadoop 3.3.0/3.2.2/3.1.4 due to jetty version mismatch** Use shaded json and jersey in HBase. Ban the imports of unshaded json and jersey in code. --- * [HBASE-25163](https://issues.apache.org/jira/browse/HBASE-25163) | *Major* | **Increase the timeout value for nightly jobs** Increase the timeout value for nightly jobs to 16 hours. The new build machines are dedicated to the hbase project, so we are allowed to use them all the time. --- * [HBASE-22976](https://issues.apache.org/jira/browse/HBASE-22976) | *Major* | **[HBCK2] Add RecoveredEditsPlayer** WALPlayer can replay the content of recovered.edits directories. A side effect is that the WAL filename timestamp is now taken into account when applying the start/end times for WALInputFormat, i.e. the wal.start.time and wal.end.time values on a job context. Previously we looked at wal.end.time only. Now we consider wal.start.time too. If a file has a name outside of wal.start.time\<-\>wal.end.time, it will be bypassed. This change in behavior makes it easier for operators to craft timestamp filters when processing WALs.
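As a rough illustration of the wal.start.time/wal.end.time window applied to file names, here is a minimal sketch. It is not WALInputFormat's actual code: the class `WalTimeWindowSketch` is hypothetical, it assumes the timestamp is the run of digits after the last '.' in the file name, it treats both bounds as inclusive, and it keeps files whose names carry no parseable timestamp. All of these are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical helper; it does not reproduce WALInputFormat's actual logic. */
class WalTimeWindowSketch {

  /** Parses the number after the last '.' in the name, if there is one. */
  static OptionalLong timestampOf(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0 || dot == fileName.length() - 1) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(fileName.substring(dot + 1)));
    } catch (NumberFormatException e) {
      return OptionalLong.empty();
    }
  }

  /**
   * Keeps files whose timestamp lies in [startTime, endTime].
   * Assumption: names without a parseable timestamp are kept rather than skipped.
   */
  static List<String> select(List<String> fileNames, long startTime, long endTime) {
    List<String> kept = new ArrayList<>();
    for (String name : fileNames) {
      OptionalLong ts = timestampOf(name);
      if (!ts.isPresent() || (ts.getAsLong() >= startTime && ts.getAsLong() <= endTime)) {
        kept.add(name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> names = List.of("wal.1000", "wal.2000", "wal.3000", "no-timestamp");
    // Only wal.2000 falls inside the window; no-timestamp is kept by assumption.
    System.out.println(select(names, 1500L, 2500L));
  }
}
```

With the sample names above, the window 1500 to 2500 keeps `wal.2000` and `no-timestamp` and bypasses the other two files.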
--- * [HBASE-25165](https://issues.apache.org/jira/browse/HBASE-25165) | *Minor* | **Change 'State time' in UI so sorts** Start time on the Master UI is now displayed using ISO8601 format instead of java Date#toString(). --- * [HBASE-25124](https://issues.apache.org/jira/browse/HBASE-25124) | *Major* | **Support changing region replica count without disabling table** Now you do not need to disable a table before changing its 'region replication' property. If you are decreasing the replica count, the excess region replicas will be closed before reopening other replicas. If you are increasing the replica count, the new region replicas will be opened after reopening the existing replicas. --- * [HBASE-25154](https://issues.apache.org/jira/browse/HBASE-25154) | *Major* | **Set java.io.tmpdir to project build directory to avoid writing std\*deferred files to /tmp** Change java.io.tmpdir to project.build.directory in maven-surefire-plugin, to avoid writing std\*deferred files to /tmp, which may fill up the /tmp disk on our jenkins build node. --- * [HBASE-25055](https://issues.apache.org/jira/browse/HBASE-25055) | *Major* | **Add ReplicationSource for meta WALs; add enable/disable when hbase:meta assigned to RS** Set hbase.region.replica.replication.catalog.enabled to enable async WAL Replication for hbase:meta region replicas. It is off by default. By default, edits are shipped by RegionReadReplicaEndpoint.class -- set hbase.region.replica.catalog.replication to target a different endpoint implementation.
--- * [HBASE-25109](https://issues.apache.org/jira/browse/HBASE-25109) | *Major* | **Add MR Counters to WALPlayer; currently hard to tell if it is doing anything** Adds a WALPlayer to MR Counter output: org.apache.hadoop.hbase.mapreduce.WALPlayer$Counter CELLS\_READ=89574 CELLS\_WRITTEN=89572 DELETES=64 PUTS=5305 WALEDITS=4375 --- * [HBASE-24896](https://issues.apache.org/jira/browse/HBASE-24896) | *Major* | **'Stuck' in static initialization creating RegionInfo instance** 1. Untangle RegionInfo, RegionInfoBuilder, and MutableRegionInfo static initializations. 2. Undo static initializing references from RegionInfo to RegionInfoBuilder. 3. Mark RegionInfo#UNDEFINED IA.Private and deprecated; it is for internal use only and likely to be removed in HBase4. (sub-task HBASE-24918) 4. Move MutableRegionInfo from inner-class of RegionInfoBuilder to be (package private) standalone. (sub-task HBASE-24918) --- * [HBASE-24956](https://issues.apache.org/jira/browse/HBASE-24956) | *Major* | **ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.** Without this fix there are situations in which locateRegionInMeta() on a client is not bound by a timeout. This happens because of a global lock whose acquisition was not under any lock scope. This affects client facing API calls that rely on this method to locate a table region in meta. This fix brings the lock acquisition under the scope of "hbase.client.meta.operation.timeout" and that guarantees a bounded wait time. --- * [HBASE-24764](https://issues.apache.org/jira/browse/HBASE-24764) | *Minor* | **Add support of adding base peer configs via hbase-site.xml for all replication peers.** Adds a new configuration parameter "hbase.replication.peer.base.config" which accepts a semi-colon separated key=CSV pairs (example: k1=v1;k2=v2_1,v3...). When this configuration is set on the server side, these kv pairs are added to every peer configuration if not already set. 
Peer specific configuration overrides have precedence over the above default configuration. This is useful in cases when some configuration has to be set for all the peers by default and one does not want to add to every peer definition. --- * [HBASE-24994](https://issues.apache.org/jira/browse/HBASE-24994) | *Minor* | **Add hedgedReadOpsInCurThread metric** Expose Hadoop hedgedReadOpsInCurThread metric to HBase. This metric counts the number of times the hedged reads service executor rejected a read task, falling back to the current thread. This will help determine the proper size of the thread pool (dfs.client.hedged.read.threadpool.size). --- * [HBASE-24776](https://issues.apache.org/jira/browse/HBASE-24776) | *Major* | **[hbtop] Support Batch mode** HBASE-24776 added the following command line parameters to hbtop: \| Argument \| Description \| \|---\|---\| \| -n,--numberOfIterations \ \| The number of iterations \| \| -O,--outputFieldNames \| Print each of the available field names on a separate line, then quit \| \| -f,--fields \ \| Show only the given fields. Specify comma separated fields to show multiple fields \| \| -s,--sortField \ \| The initial sort field. You can prepend a \`+' or \`-' to the field name to also override the sort direction. A leading \`+' will force sorting high to low, whereas a \`-' will ensure a low to high ordering \| \| -i,--filters \ \| The initial filters. Specify comma separated filters to set multiple filters \| \| -b,--batchMode \| Starts hbtop in Batch mode, which could be useful for sending output from hbtop to other programs or to a file. 
In this mode, hbtop will not accept input and runs until the iterations limit you've set with the \`-n' command-line option or until killed \| --- * [HBASE-24602](https://issues.apache.org/jira/browse/HBASE-24602) | *Major* | **Add Increment and Append support to CheckAndMutate** Summary of the change of HBASE-24602: - Add \`build(Increment)\` and \`build(Append)\` methods to the \`Builder\` class of the \`CheckAndMutate\` class. After this change, we can perform checkAndIncrement/Append operations as follows: \`\`\` // Build a CheckAndMutate object with a Increment object CheckAndMutate checkAndMutate = CheckAndMutate.newBuilder(row) .ifEquals(family, qualifier, value) .build(increment); // Perform a CheckAndIncrement operation CheckAndMutateResult checkAndMutateResult = table.checkAndMutate(checkAndMutate); // Get whether or not the CheckAndIncrement operation is successful boolean success = checkAndMutateResult.isSuccess(); // Get the result of the increment operation Result result = checkAndMutateResult.getResult(); \`\`\` - After this change, \`HRegion.batchMutate()\` is used for increment/append operations. - As the side effect of the above change, the following coprocessor methods of RegionObserver are called when increment/append operations are performed: - preBatchMutate() - postBatchMutate() - postBatchMutateIndispensably() --- * [HBASE-24694](https://issues.apache.org/jira/browse/HBASE-24694) | *Major* | **Support flush a single column family of table** Adds option for the flush command to flush all stores from the specified column family only, among all regions of the given table (stores from other column families on this table would not get flushed). 
--- * [HBASE-24625](https://issues.apache.org/jira/browse/HBASE-24625) | *Critical* | **AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length.** We add a method getSyncedLength to the WALProvider.WriterBase interface, for use by the WALFileLengthProvider in replication. With AsyncFSWAL we write to 3 DNs concurrently and, according to the visibility guarantee of HDFS, the data becomes available immediately on arriving at a DN, since every DN is considered the last one in the pipeline. This means replication may read uncommitted data and replicate it to the remote cluster, causing data inconsistency. The method WriterBase#getLength may return a length that includes data still sitting in the hdfs client buffer and not yet successfully synced to HDFS, so we use WriterBase#getSyncedLength to return the length successfully synced to HDFS, and the replication thread may read a WAL file that is being written only up to this length. See also HBASE-14004 and this document for more details: https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit# Before this patch, replication may read uncommitted data and replicate it to the slave cluster, causing data inconsistency between the master and slave clusters. Without this patch applied, FSHLog can be used instead of AsyncFSWAL to reduce the probability of inconsistency. --- * [HBASE-24779](https://issues.apache.org/jira/browse/HBASE-24779) | *Minor* | **Improve insight into replication WAL readers hung on checkQuota** New metrics are exposed, on the global source, for replication which indicate the usage of the "WAL entry buffer" that was introduced in HBASE-15995. When this usage reaches the limit, that RegionServer will cease to read more data for the sake of trying to replicate it. This usage (and limit) is local to each RegionServer and is shared across all peers being handled by that RegionServer.
--- * [HBASE-24404](https://issues.apache.org/jira/browse/HBASE-24404) | *Major* | **Support flush a single column family of region** This adds an extra "flush" command option that allows for specifying an individual family to have its store flushed. Usage: flush 'REGIONNAME','FAMILYNAME' flush 'ENCODED\_REGIONNAME','FAMILYNAME' --- * [HBASE-24805](https://issues.apache.org/jira/browse/HBASE-24805) | *Major* | **HBaseTestingUtility.getConnection should be threadsafe** Users of `HBaseTestingUtility` can now safely call the `getConnection` method from multiple threads. As a consequence of refactoring to improve the thread safety of the HBase testing classes, the protected `conf` member of the `HBaseCommonTestingUtility` class has been marked final. Downstream users who extend from the class hierarchy rooted at this class will need to pass the Configuration instance they want used to their super constructor rather than overwriting the instance variable. --- * [HBASE-24767](https://issues.apache.org/jira/browse/HBASE-24767) | *Major* | **Change default to false for HBASE-15519 per-user metrics** Disables per-user metrics. They were enabled by default for the first time in hbase-2.3.0 but they need some work before they can be on all the time (See HBASE-15519) --- * [HBASE-24704](https://issues.apache.org/jira/browse/HBASE-24704) | *Major* | **Make the Table Schema easier to view even there are multiple families** Improve the layout of column family from vertical to horizontal in table UI. --- * [HBASE-11686](https://issues.apache.org/jira/browse/HBASE-11686) | *Minor* | **Shell code should create a binding / irb workspace instead of polluting the root namespace** In shell, all HBase constants and commands have been moved out of the top-level and into an IRB Workspace. Piped stdin and scripts passed by name to the shell will be evaluated within this workspace. If you absolutely need the top-level definitions, use the new compatibility flag, ie. 
hbase shell --top-level-defs or hbase shell --top-level-defs script2run.rb. --- * [HBASE-24632](https://issues.apache.org/jira/browse/HBASE-24632) | *Major* | **Enable procedure-based log splitting as default in hbase3** Enables procedure-based distributed WAL splitting as default (HBASE-20610). To use 'classic' zk-coordinated splitting instead, set 'hbase.split.wal.zk.coordinated' to 'true'. --- * [HBASE-24698](https://issues.apache.org/jira/browse/HBASE-24698) | *Major* | **Turn OFF Canary WebUI as default** Flips default for 'HBASE-23994 Add WebUI to Canary' The UI defaulted to on at port 16050. This JIRA changes it so new UI is off by default. To enable the UI, set property 'hbase.canary.info.port' to the port you want the UI to use. --- * [HBASE-24650](https://issues.apache.org/jira/browse/HBASE-24650) | *Major* | **Change the return types of the new checkAndMutate methods introduced in HBASE-8458** HBASE-24650 introduced CheckAndMutateResult class and changed the return type of checkAndMutate methods to this class in order to support CheckAndMutate with Increment/Append. CheckAndMutateResult class has two fields, one is \*success\* that indicates whether the operation is successful or not, and the other one is \*result\* that's the result of the operation and is used for CheckAndMutate with Increment/Append. The new APIs for the Table interface: \`\`\` /\*\* \* checkAndMutate that atomically checks if a row matches the specified condition. If it does, \* it performs the specified action. \* \* @param checkAndMutate The CheckAndMutate object. \* @return A CheckAndMutateResult object that represents the result for the CheckAndMutate. \* @throws IOException if a remote or network exception occurs. \*/ default CheckAndMutateResult checkAndMutate(CheckAndMutate checkAndMutate) throws IOException { return checkAndMutate(Collections.singletonList(checkAndMutate)).get(0); } /\*\* \* Batch version of checkAndMutate. 
The specified CheckAndMutates are batched only in the sense \* that they are sent to a RS in one RPC, but each CheckAndMutate operation is still executed \* atomically (and thus, each may fail independently of others). \* \* @param checkAndMutates The list of CheckAndMutate. \* @return A list of CheckAndMutateResult objects that represents the result for each \* CheckAndMutate. \* @throws IOException if a remote or network exception occurs. \*/ default List\<CheckAndMutateResult\> checkAndMutate(List\<CheckAndMutate\> checkAndMutates) throws IOException { throw new NotImplementedException("Add an implementation!"); } \`\`\` The new APIs for the AsyncTable interface: \`\`\` /\*\* \* checkAndMutate that atomically checks if a row matches the specified condition. If it does, \* it performs the specified action. \* \* @param checkAndMutate The CheckAndMutate object. \* @return A {@link CompletableFuture}s that represent the result for the CheckAndMutate. \*/ CompletableFuture\<CheckAndMutateResult\> checkAndMutate(CheckAndMutate checkAndMutate); /\*\* \* Batch version of checkAndMutate. The specified CheckAndMutates are batched only in the sense \* that they are sent to a RS in one RPC, but each CheckAndMutate operation is still executed \* atomically (and thus, each may fail independently of others). \* \* @param checkAndMutates The list of CheckAndMutate. \* @return A list of {@link CompletableFuture}s that represent the result for each \* CheckAndMutate. \*/ List\<CompletableFuture\<CheckAndMutateResult\>\> checkAndMutate( List\<CheckAndMutate\> checkAndMutates); /\*\* \* A simple version of batch checkAndMutate. It will fail if there are any failures. \* \* @param checkAndMutates The list of rows to apply. \* @return A {@link CompletableFuture} that wrapper the result list.
\*/ default CompletableFuture\\> checkAndMutateAll( List\ checkAndMutates) { return allOf(checkAndMutate(checkAndMutates)); } \`\`\` --- * [HBASE-24671](https://issues.apache.org/jira/browse/HBASE-24671) | *Major* | **Add excludefile and designatedfile options to graceful\_stop.sh** Add excludefile and designatedfile options to graceful\_stop.sh. Designated file with \ per line as unload targets. Exclude file should have \ per line. We do not unload regions to hostnames given in exclude file. Here is a simple example using graceful\_stop.sh with designatedfile option: ./bin/graceful\_stop.sh --maxthreads 4 --designatedfile /path/designatedfile hostname The usage of the excludefile option is the same as the above. --- * [HBASE-24560](https://issues.apache.org/jira/browse/HBASE-24560) | *Major* | **Add a new option of designatedfile in RegionMover** Add a new option "designatedfile" in RegionMover. If designated file is present with some contents, we will unload regions to hostnames provided in designated file. Designated file should have 'host:port' per line. --- * [HBASE-24289](https://issues.apache.org/jira/browse/HBASE-24289) | *Major* | **Heterogeneous Storage for Date Tiered Compaction** Enhance DateTieredCompaction to support HDFS storage policy within one class family. # First you need enable DTCP. To turn on Date Tiered Compaction (It is not recommended to turn on for the whole cluster because that will put meta table on it too and random get on meta table will be impacted): hbase.hstore.compaction.compaction.policy=org.apache.hadoop.hbase.regionserver.compactions.DateTieredCompactionPolicy ## Parameters for Date Tiered Compaction: hbase.hstore.compaction.date.tiered.max.storefile.age.millis: Files with max-timestamp smaller than this will no longer be compacted.Default at Long.MAX\_VALUE. hbase.hstore.compaction.date.tiered.base.window.millis: base window size in milliseconds. Default at 6 hours. 
hbase.hstore.compaction.date.tiered.windows.per.tier: number of windows per tier. Default at 4. hbase.hstore.compaction.date.tiered.incoming.window.min: minimal number of files to compact in the incoming window. Set it to the expected number of files in the window to avoid wasteful compaction. Default at 6. # Then enable HDTCP (Heterogeneous Date Tiered Compaction) with configurations such as the following: hbase.hstore.compaction.date.tiered.storage.policy.enable=true hbase.hstore.compaction.date.tiered.hot.window.age.millis=3600000 hbase.hstore.compaction.date.tiered.hot.window.storage.policy=ALL\_SSD hbase.hstore.compaction.date.tiered.warm.window.age.millis=20600000 hbase.hstore.compaction.date.tiered.warm.window.storage.policy=ONE\_SSD hbase.hstore.compaction.date.tiered.cold.window.storage.policy=HOT ## It is better to enable the WAL and flushing HFile storage policies together with HDTCP. You can tune the following settings as well: hbase.wal.storage.policy=ALL\_SSD create 'table',{NAME=\>'f1',CONFIGURATION=\>{'hbase.hstore.block.storage.policy'=\>'ALL\_SSD'}} # Disable HDTCP as follows: hbase.hstore.compaction.date.tiered.storage.policy.enable=false --- * [HBASE-24648](https://issues.apache.org/jira/browse/HBASE-24648) | *Major* | **Remove the legacy 'forceSplit' related code at region server side** Add a canSplit method to RegionSplitPolicy to determine whether we can split a region. Usually this is not related to the RegionSplitPolicy, so the default implementation tests whether the region is available and has no reference files, but DisabledRegionSplitPolicy always returns false. --- * [HBASE-24382](https://issues.apache.org/jira/browse/HBASE-24382) | *Major* | **Flush partial stores of region filtered by seqId when archive wal due to too many wals** Change the flush level from region to store when there are too many wals. This reduces unnecessary flush tasks and small hfiles.
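The idea of flushing only some stores, filtered by sequence id, can be pictured with a small sketch. It assumes each store (column family) tracks the lowest sequence id it has not yet flushed, and that only stores holding edits at or below the highest sequence id of the WAL to be archived need flushing. The class `StoreFlushSelectionSketch` and the sample numbers are hypothetical and do not mirror the region server's actual data structures.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of store-level flush selection by sequence id. */
class StoreFlushSelectionSketch {

  /**
   * Returns the stores whose lowest unflushed sequence id is at or below the
   * highest sequence id contained in the WAL we would like to archive.
   */
  static List<String> storesToFlush(Map<String, Long> lowestUnflushedSeqId, long walMaxSeqId) {
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Long> entry : lowestUnflushedSeqId.entrySet()) {
      if (entry.getValue() <= walMaxSeqId) {
        selected.add(entry.getKey());
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order so the output is deterministic.
    Map<String, Long> stores = new LinkedHashMap<>();
    stores.put("cf1", 90L);
    stores.put("cf2", 250L);
    stores.put("cf3", 120L);
    // The oldest WAL holds edits up to sequence id 150.
    System.out.println(storesToFlush(stores, 150L));
  }
}
```

In this example only `cf1` and `cf3` are selected; `cf2` has nothing in the old WAL, so a region-level flush would have written a small, unnecessary hfile for it.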
--- * [HBASE-24038](https://issues.apache.org/jira/browse/HBASE-24038) | *Major* | **Add a metric to show the locality of ssd in table.jsp** Add a metric to show the locality of ssd in table.jsp, and move the locality related metrics to a new tab named localities. --- * [HBASE-8458](https://issues.apache.org/jira/browse/HBASE-8458) | *Major* | **Support for batch version of checkAndMutate()** HBASE-8458 introduced CheckAndMutate class that's used to perform CheckAndMutate operations. Use the builder class to instantiate a CheckAndMutate object. This builder class is fluent style APIs, the code are like: \`\`\` // A CheckAndMutate operation where do the specified action if the column (specified by the family and the qualifier) of the row equals to the specified value CheckAndMutate checkAndMutate = CheckAndMutate.newBuilder(row) .ifEquals(family, qualifier, value) .build(put); // A CheckAndMutate operation where do the specified action if the column (specified by the // family and the qualifier) of the row doesn't exist CheckAndMutate checkAndMutate = CheckAndMutate.newBuilder(row) .ifNotExists(family, qualifier) .build(put); // A CheckAndMutate operation where do the specified action if the row matches the filter CheckAndMutate checkAndMutate = CheckAndMutate.newBuilder(row) .ifMatches(filter) .build(delete); \`\`\` And This added new checkAndMutate APIs to the Table and AsyncTable interfaces, and deprecated the old checkAndMutate APIs. The example code for the new APIs are as follows: \`\`\` Table table = ...; CheckAndMutate checkAndMutate = ...; // Perform the checkAndMutate operation boolean success = table.checkAndMutate(checkAndMutate); CheckAndMutate checkAndMutate1 = ...; CheckAndMutate checkAndMutate2 = ...; // Batch version List\ successList = table.checkAndMutate(Arrays.asList(checkAndMutate1, checkAndMutate2)); \`\`\` This also has Protocol Buffers level changes. Old clients without this patch will work against new servers with this patch. 
However, new clients will break against old servers without this patch for checkAndMutate with RM and mutateRow. So, for a rolling upgrade, we will need to upgrade servers first, and then roll out the new clients. --- * [HBASE-24471](https://issues.apache.org/jira/browse/HBASE-24471) | *Major* | **The way we bootstrap meta table is confusing** Move all the meta initialization code in MasterFileSystem and HRegionServer to InitMetaProcedure. Add a new step for InitMetaProcedure called INIT\_META\_WRITE\_FS\_LAYOUT to place the moved code. This is an incompatible change, but should not have much impact. InitMetaProcedure will only be executed once when bootstrapping a fresh new cluster, so typically this will not affect rolling upgrades. Even if you hit this problem, as long as InitMetaProcedure has not finished, we can be sure that there is no user data in the cluster, so you can just clean up the cluster and try again. There will be no data loss. --- * [HBASE-24017](https://issues.apache.org/jira/browse/HBASE-24017) | *Major* | **Turn down flakey rerun rate on all but hot branches** Changed master, branch-2, and branch-2.1 to twice a day. Left branch-2.3, branch-2.2, and branch-1 at every 4 hours. Changed branch-1.4 and branch-1.3 to @daily (1.3 was running every hour). # HBASE 2.3.0 Release Notes These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements. --- * [HBASE-24603](https://issues.apache.org/jira/browse/HBASE-24603) | *Critical* | **Zookeeper sync() call is async** Fixes a couple of bugs in ZooKeeper interaction. First, the zk sync() call that is used to sync the lagging followers with the leader, so that the client sees a consistent snapshot state, was actually asynchronous under the hood. We make it synchronous for correctness. Second, zookeeper events are now processed in a separate thread rather than in the thread context of the zookeeper client connection.
This decoupling frees up the client connection quickly and avoids deadlocks. --- * [HBASE-24631](https://issues.apache.org/jira/browse/HBASE-24631) | *Major* | **Loosen Dockerfile pinned package versions of the "debian-revision"** Update our package version numbers throughout the Dockerfiles to be pinned to their epoch:upstream-version components only. Previously we'd specify the full debian package version number, including the debian-revision. This led to instability as debian packaging details changed. See also [man deb-version](http://manpages.ubuntu.com/manpages/xenial/en/man5/deb-version.5.html) --- * [HBASE-24205](https://issues.apache.org/jira/browse/HBASE-24205) | *Major* | **Create metric to know the number of reads that happens from memstore** Adds new metrics that count read requests (tracked per row) according to whether the row was fetched completely from the memstore or was pulled from both files and the memstore. The metrics are collected under the mbean for Tables and under the mbean for Regions. Under the table mbean, i.e. "name": "Hadoop:service=HBase,name=RegionServer,sub=Tables", the new metrics will be listed as {code} "Namespace\_default\_table\_t3\_columnfamily\_f1\_metric\_memstoreOnlyRowReadsCount": 5, "Namespace\_default\_table\_t3\_columnfamily\_f1\_metric\_mixedRowReadsCount": 1, {code} Where the format is Namespace\_\\_table\_\\_columnfamily\_\\_metric\_memstoreOnlyRowReadsCount Namespace\_\\_table\_\\_columnfamily\_\\_metric\_mixedRowReadsCount {code} The same one under the region ie.
"name": "Hadoop:service=HBase,name=RegionServer,sub=Regions", comes as {code} "Namespace\_default\_table\_t3\_region\_75a7846f4ac4a2805071a855f7d0dbdc\_store\_f1\_metric\_memstoreOnlyRowReadsCount": 5, "Namespace\_default\_table\_t3\_region\_75a7846f4ac4a2805071a855f7d0dbdc\_store\_f1\_metric\_mixedRowReadsCount": 1, {code} where Namespace\_\\_region\_\\_store\_\\_metric\_memstoreOnlyRowReadsCount Namespace\_\\_region\_\\_store\_\\_metric\_mixedRowReadsCount This is also an aggregate against every store the number of reads that happened purely from the memstore or it was a mixed read that happened from memstore and file. --- * [HBASE-21773](https://issues.apache.org/jira/browse/HBASE-21773) | *Critical* | **rowcounter utility should respond to pleas for help** This adds [-h\|-help] options to rowcounter. Passing either -h or -help will print rowcounter guide as below: $hbase rowcounter -h usage: hbase rowcounter \ [options] [\ \...] Options: --starttime=\ starting time filter to start counting rows from. --endtime=\ end time filter limit, to only count rows up to this timestamp. --range=\ [startKey],[endKey][;[startKey],[endKey]...]] --expectedCount=\ expected number of rows to be count. For performance, consider the following configuration properties: -Dhbase.client.scanner.caching=100 -Dmapreduce.map.speculative=false --- * [HBASE-24217](https://issues.apache.org/jira/browse/HBASE-24217) | *Major* | **Add hadoop 3.2.x support** CI coverage has been extended to include Hadoop 3.2.x for HBase 2.2+. --- * [HBASE-23055](https://issues.apache.org/jira/browse/HBASE-23055) | *Major* | **Alter hbase:meta** Adds being able to edit hbase:meta table schema. For example, hbase(main):006:0\> alter 'hbase:meta', {NAME =\> 'info', DATA\_BLOCK\_ENCODING =\> 'ROW\_INDEX\_V1'} Updating all regions with the new schema... All regions updated. Done. Took 1.2138 seconds You can even add columnfamilies. 
However, you cannot delete any of the core hbase:meta column families such as 'info' and 'table'. --- * [HBASE-15161](https://issues.apache.org/jira/browse/HBASE-15161) | *Major* | **Umbrella: Miscellaneous improvements from production usage** This ticket summarizes significant improvements and expansion to the metrics surface area. Interested users should review the individual sub-tasks. --- * [HBASE-24545](https://issues.apache.org/jira/browse/HBASE-24545) | *Major* | **Add backoff to SCP check on WAL split completion** Adds backoff to the ServerCrashProcedure wait for WAL splitting to complete when there is a large backlog of files to split. (It is possible to avoid SCP blocking while waiting on WALs to split if you use procedure-based splitting -- set 'hbase.split.wal.zk.coordinated' to false to enable procedure-based WAL splitting.) --- * [HBASE-24524](https://issues.apache.org/jira/browse/HBASE-24524) | *Minor* | **SyncTable logging improvements** Note that this changes the log level for mismatching row keys: they were originally logged at INFO level and are now logged at DEBUG level. This is consistent with the logging of mismatching cells. Also, for missing row keys, it now logs row key values in human readable format, making it more meaningful for operators troubleshooting mismatches. --- * [HBASE-24359](https://issues.apache.org/jira/browse/HBASE-24359) | *Major* | **Optionally ignore edits for deleted CFs for replication.** Introduce a new config hbase.replication.drop.on.deleted.columnfamily, default is false. When set to true, replication will drop the edits for a column family that has been deleted from the replication source and target. --- * [HBASE-24418](https://issues.apache.org/jira/browse/HBASE-24418) | *Major* | **Consolidate Normalizer implementations** This change extends the Normalizer with a handful of new configurations. The configuration points supported are: * `hbase.normalizer.split.enabled` Whether to split a region as part of normalization. Default: `true`.
* `hbase.normalizer.merge.enabled` Whether to merge a region as part of normalization. Default `true`. * `hbase.normalizer.min.region.count` The minimum number of regions in a table to consider it for merge normalization. Default: 3. * `hbase.normalizer.merge.min_region_age.days` The minimum age for a region to be considered for a merge, in days. Default: 3. * `hbase.normalizer.merge.min_region_size.mb` The minimum size for a region to be considered for a merge, in whole MBs. Default: 1. --- * [HBASE-24309](https://issues.apache.org/jira/browse/HBASE-24309) | *Major* | **Avoid introducing log4j and slf4j-log4j dependencies for modules other than hbase-assembly** Add a hbase-logging module, put the log4j related code in this module only so other modules do not need to depend on log4j at compile scope. See the comments of Log4jUtils and InternalLog4jUtils for more details. Add a log4j.properties to the test jar of hbase-logging module, so for other sub modules we just need to depend on the test jar of hbase-logging module at test scope to output the log to console, without placing a log4j.properties in the test resources as they all (almost) have the same content. And this test module will not be included in the assembly tarball so it will not mess up the binary distribution. Ban direct commons-logging dependency, and ban commons-logging and log4j imports in non-test code, to avoid mess up the downstream users logging framework. In hbase-logging module we do need to use log4j classes and the trick is to use full class name. Add jcl-over-slf4j and jul-to-slf4j dependencies, as some of our dependencies use jcl or jul as logging framework, we should also redirect their log message to slf4j. --- * [HBASE-21406](https://issues.apache.org/jira/browse/HBASE-21406) | *Minor* | **"status 'replication'" should not show SINK if the cluster does not act as sink** Added new metric to differentiate sink startup time from last OP applied time. 
The original behaviour was to always set the startup time to TimestampsOfLastAppliedOp, and always show it in the "status 'replication'" command output, regardless of whether the sink ever applied any OP. This was confusing, especially for scenarios where the cluster was acting only as a source: the output could lead to wrong interpretations about the sink not applying edits or replication being stuck. With the new metric, we now compare the two metric values, assuming that if both are the same, no OP has ever been shipped to the given sink, so the output reflects this more clearly, for example: SINK: TimeStampStarted=Thu Dec 06 23:59:47 GMT 2018, Waiting for OPs... --- * [HBASE-24132](https://issues.apache.org/jira/browse/HBASE-24132) | *Major* | **Upgrade to Apache ZooKeeper 3.5.7** HBase now ships ZooKeeper 3.5.x; previously it shipped the EOL'd 3.4.x. A 3.5.x client can talk to a 3.4.x ensemble. The ZooKeeper project has built a [FAQ](https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ) that documents known issues and work-arounds when upgrading existing deployments. --- * [HBASE-22287](https://issues.apache.org/jira/browse/HBASE-22287) | *Major* | **inifinite retries on failed server in RSProcedureDispatcher** Add backoff. Avoid retrying every 100ms. --- * [HBASE-24425](https://issues.apache.org/jira/browse/HBASE-24425) | *Major* | **Run hbck\_chore\_run and catalogjanitor\_run on draw of 'HBCK Report' page** Runs 'catalogjanitor\_run' and 'hbck\_chore\_run' inline with the loading of the 'HBCK Report' page. Pass '?cache=true' to skip the inline invocation of 'catalogjanitor\_run' and 'hbck\_chore\_run' when drawing the page. --- * [HBASE-24408](https://issues.apache.org/jira/browse/HBASE-24408) | *Blocker* | **Introduce a general 'local region' to store data on master** Introduced a general 'local region' at master side to store the procedure data, etc. The hfile of this region will be stored on the root fs while the wal will be stored on the wal fs.
This issue supersedes part of the code for HBASE-23326, as now we store the data in the 'MasterData' directory instead of 'MasterProcs'. The old hfiles will be moved to the global hfile archived directory with the suffix $-masterlocalhfile-$. The wal files will be moved to the global old wal directory with the suffix $masterlocalwal$. The TimeToLiveMasterLocalStoreHFileCleaner and TimeToLiveMasterLocalStoreWALCleaner are configured by default for cleaning the old hfiles and wal files, and the default TTLs are both 7 days. --- * [HBASE-24115](https://issues.apache.org/jira/browse/HBASE-24115) | *Major* | **Relocate test-only REST "client" from src/ to test/ and mark Private** Relocate the test-only REST RemoteHTable and RemoteAdmin from src/ to test/, and mark them as InterfaceAudience.Private. --- * [HBASE-23938](https://issues.apache.org/jira/browse/HBASE-23938) | *Major* | **Replicate slow/large RPC calls to HDFS** Config key: hbase.regionserver.slowlog.systable.enabled Default value: false This config can be enabled only if hbase.regionserver.slowlog.buffer.enabled is already enabled. While hbase.regionserver.slowlog.buffer.enabled ensures that any slow/large RPC logs with complete details are written to the ring buffer available at each RegionServer, hbase.regionserver.slowlog.systable.enabled ensures that all such logs are also persisted in the new system table hbase:slowlog. Operators can scan hbase:slowlog with filters to retrieve records matching specific attributes, and this table is useful for capturing the historical performance of slow RPC calls for detailed analysis. hbase:slowlog consists of a single ColumnFamily, info. info consists of multiple qualifiers similar to the attributes available to query as part of Admin API: get\_slowlog\_responses.
One example of a row from hbase:slowlog scan result (Attached a sample screenshot in the Jira) : \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:call\_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest) \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:client\_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:method\_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION\_NAME value: "cluster\_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { attribute { name: "\_isolationlevel\_" value: "\\x5C000" } start\_row: "cccccccc" time\_range { from: 0 to: 9223372036854775807 } max\_versions: 1 cache\_blocks: true max\_result\_size: 2097152 caching: 2147483647 include\_stop\_row: false } number\_of\_rows: 2147483647 close\_scanner: false client\_handles\_partials: true client\_handles\_heartbeats: true track\_scan\_metrics: false \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:processing\_time, timestamp=2020-05-16T14:59:58.764Z, value=24 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:queue\_time, timestamp=2020-05-16T14:59:58.764Z, value=0 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:region\_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster\_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.
\\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:response\_size, timestamp=2020-05-16T14:59:58.764Z, value=211227 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:server\_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:start\_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=vjasani --- * [HBASE-24271](https://issues.apache.org/jira/browse/HBASE-24271) | *Major* | **Set values in \`conf/hbase-site.xml\` that enable running on \`LocalFileSystem\` out of the box** HBASE-24271 changes the default `conf/hbase-site.xml` such that `bin/hbase` will run directly out of the binary tarball or a compiled source tree without any configuration modifications vs. Hadoop 2.8+. This changes our long-standing history of shipping no configured values in `conf/hbase-site.xml`, so existing processes that assume this file is empty of configuration properties may require attention. --- * [HBASE-24310](https://issues.apache.org/jira/browse/HBASE-24310) | *Major* | **Use Slf4jRequestLog for hbase-http** Use Slf4jRequestLog instead of the log4j HttpRequestLogAppender in HttpServer. The request log is disabled by default in conf/log4j.properties by the following lines: # Disable request log by default, you can enable this by changing the appender log4j.category.http.requests=INFO,NullAppender log4j.additivity.http.requests=false Change 'NullAppender' to whatever appender you want in order to enable the request log.
Notice that the logger name for the master status http server is 'http.requests.master', and for the region server it is 'http.requests.regionserver'. --- * [HBASE-24335](https://issues.apache.org/jira/browse/HBASE-24335) | *Major* | **Support deleteall with ts but without column in shell mode** Use an empty string to represent no column specified for deleteall in shell mode. Usage: deleteall 'test','r1','',12345 deleteall 'test', {ROWPREFIXFILTER =\> 'prefix'}, '', 12345 --- * [HBASE-24304](https://issues.apache.org/jira/browse/HBASE-24304) | *Major* | **Separate a hbase-asyncfs module** Added a new hbase-asyncfs module to hold the asynchronous dfs output stream implementation used for implementing WAL. --- * [HBASE-22710](https://issues.apache.org/jira/browse/HBASE-22710) | *Major* | **Wrong result in one case of scan that use raw and versions and filter together** Make the logic for choosing versions more reasonable for raw scans, to avoid losing results when using a filter. --- * [HBASE-24285](https://issues.apache.org/jira/browse/HBASE-24285) | *Major* | **Move to hbase-thirdparty-3.3.0** Moved to hbase-thirdparty 3.3.0. --- * [HBASE-24252](https://issues.apache.org/jira/browse/HBASE-24252) | *Major* | **Implement proxyuser/doAs mechanism for hbase-http** This feature enables the HBase Web UIs to accept a 'proxyuser' via the HTTP Request's query string. When the parameter \`hbase.security.authentication.spnego.kerberos.proxyuser.enable\` is set to \`true\` in hbase-site.xml (default is \`false\`), the HBase UI will attempt to impersonate the user specified by the query parameter "doAs". This query parameter is checked case-insensitively. When this option is not provided, the user who executed the request is the "real" user and there is no ability to execute impersonation against the WebUI.
For example, if the user "bob" with Kerberos credentials executes a request against the WebUI with this feature enabled and a query string which includes \`doAs=alice\`, the HBase UI will treat this request as executed as \`alice\`, not \`bob\`. The standard Hadoop proxyuser configuration properties to limit users who may impersonate others apply to this change (e.g. to enable \`bob\` to impersonate \`alice\`). See the Hadoop documentation for more information on how to configure these proxyuser rules. --- * [HBASE-24143](https://issues.apache.org/jira/browse/HBASE-24143) | *Major* | **[JDK11] Switch default garbage collector from CMS** `bin/hbase` will now dynamically select a Garbage Collector implementation based on the detected JVM version. JDKs 8,9,10 use `-XX:+UseConcMarkSweepGC`, while JDK11+ use `-XX:+UseG1GC`. Notice a slight compatibility change. Previously, the garbage collector choice would always be appended to a user-provided value for `HBASE_OPTS`. As of this change, this setting will only be applied when `HBASE_OPTS` is unset. That means that operators who provide a value for this variable will now need to also specify the collector. This is especially important for those on JDK8, where the vm default GC is not the recommended ConcMarkSweep. --- * [HBASE-24024](https://issues.apache.org/jira/browse/HBASE-24024) | *Major* | **Optionally reject multi() requests with very high no of rows** New Config: hbase.rpc.rows.size.threshold.reject ----------------------------------------------------------------------- Default value: false Description: If value is true, RegionServer will abort batch requests of Put/Delete with number of rows in a batch operation exceeding threshold defined by value of config: hbase.rpc.rows.warning.threshold. 
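The reject behaviour described for HBASE-24024 can be pictured with a small stand-alone sketch. This is not HBase code: the class, enum, and method names are hypothetical, the threshold value of 5000 is an assumed setting for `hbase.rpc.rows.warning.threshold`, and the WARN outcome (logging only when rejection is off) is inferred from the config name rather than stated in the note.

```java
// Hypothetical illustration only; none of these names are HBase APIs.
final class BatchRowLimitSketch {
    enum Decision { ACCEPT, WARN, REJECT }

    // rowsInBatch: number of rows in one multi() Put/Delete batch
    // warningThreshold: value of hbase.rpc.rows.warning.threshold
    // rejectOverThreshold: value of hbase.rpc.rows.size.threshold.reject
    static Decision check(int rowsInBatch, int warningThreshold, boolean rejectOverThreshold) {
        if (rowsInBatch <= warningThreshold) {
            return Decision.ACCEPT; // not exceeding the threshold
        }
        // Over the threshold: reject only when the new flag is enabled.
        return rejectOverThreshold ? Decision.REJECT : Decision.WARN;
    }

    public static void main(String[] args) {
        int threshold = 5000; // assumed threshold value for this example
        System.out.println(check(5000, threshold, true));  // ACCEPT
        System.out.println(check(5001, threshold, false)); // WARN
        System.out.println(check(5001, threshold, true));  // REJECT
    }
}
```

With the default of `false` for the reject flag, an oversized batch is still served in this sketch; only setting it to `true` turns the threshold into a hard limit.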
--- * [HBASE-24139](https://issues.apache.org/jira/browse/HBASE-24139) | *Critical* | **Balancer should avoid leaving idle region servers** StochasticLoadBalancer functional improvement: StochasticLoadBalancer would rebalance the cluster if there are any idle RegionServers in the cluster (RegionServer having no region), while other RegionServers have at least 1 region available. --- * [HBASE-24196](https://issues.apache.org/jira/browse/HBASE-24196) | *Major* | **[Shell] Add rename rsgroup command in hbase shell** user or admin can now use hbase shell \> rename\_rsgroup 'oldname', 'newname' to rename rsgroup. --- * [HBASE-24218](https://issues.apache.org/jira/browse/HBASE-24218) | *Major* | **Add hadoop 3.2.x in hadoop check** Add hadoop-3.2.0 and hadoop-3.2.1 in hadoop check and when '--quick-hadoopcheck' we will only check hadoop-3.2.1. Notice that, for aligning the personality scripts across all the active branches, we will commit the patch to all active branches, but the hadoop-3.2.x support in hadoopcheck is only applied to branch-2.2+. --- * [HBASE-23829](https://issues.apache.org/jira/browse/HBASE-23829) | *Major* | **Get \`-PrunSmallTests\` passing on JDK11** \`-PrunSmallTests\` now pass on JDK11 when using \`-Phadoop.profile=3.0\`. --- * [HBASE-24185](https://issues.apache.org/jira/browse/HBASE-24185) | *Major* | **Junit tests do not behave well with System.exit or Runtime.halt or JVM exits in general.** Tests that fail because a process -- RegionServer or Master -- called System.exit, will now instead throw an exception. --- * [HBASE-24072](https://issues.apache.org/jira/browse/HBASE-24072) | *Major* | **Nightlies reporting OutOfMemoryError: unable to create new native thread** Hadoop hosts have had their ulimit -u raised from 10000 to 30000 (per user, by INFRA). The Docker build container has had its limit raised from 10000 to 12500. 
--- * [HBASE-24112](https://issues.apache.org/jira/browse/HBASE-24112) | *Major* | **[RSGroup] Support renaming rsgroup** Support RSGroup renaming in core codebase. New API Admin#renameRSGroup(String, String) is introduced in 3.0.0. --- * [HBASE-23994](https://issues.apache.org/jira/browse/HBASE-23994) | *Trivial* | **Add WebUI to Canary** The Canary tool now offers a WebUI when run in `region` mode (the default mode). It is enabled by default, and by default, it binds to `0.0.0.0:16050`. This can be overridden by setting `hbase.canary.info.bindAddress` and `hbase.canary.info.port`. To disable entirely, set the port to `-1`. --- * [HBASE-23779](https://issues.apache.org/jira/browse/HBASE-23779) | *Major* | **Up the default fork count to make builds complete faster; make count relative to CPU count** Pass --threads=2 building on jenkins. It shortens nightly build times by about 25%. It works by running module build/test in parallel when dependencies allow. Upping the forkcount beyond the pom default of 0.25C would have us exceed our CPU budget on jenkins when two modules are running in parallel (2 modules at 0.25C each makes 0.5C and on jenkins, hadoop nodes run two jenkins executors per host). Higher forkcounts also seem to threaten build stability. For running tests locally, to go faster, up fork count. $ x="0.5C" ; mvn --threads=2 -Dsurefire.firstPartForkCount=$x -Dsurefire.secondPartForkCount=$x test -PrunAllTests You could up the x from 0.5C to 1.0C but YMMV (On overcommitted hardware, tests start bombing out pretty soon after startup). You could try upping thread count but on occasion are likely to overcommit hardware. --- * [HBASE-24126](https://issues.apache.org/jira/browse/HBASE-24126) | *Major* | **Up the container nproc uplimit from 10000 to 12500** Start docker with upped ulimit for nproc passing '--ulimit nproc=12500'. It was 10000, the default; it is now 12500.
Then, set PROC\_LIMIT in hbase-personality so that when yetus runs, it uses the new 12500 value. --- * [HBASE-24150](https://issues.apache.org/jira/browse/HBASE-24150) | *Major* | **Allow module tests run in parallel** Pass -T2 to mvn. Makes it so we build two modules at a time, dependencies willing. Helps speed build and testing. Doubles the resource usage when running modules in parallel. --- * [HBASE-24121](https://issues.apache.org/jira/browse/HBASE-24121) | *Major* | **[Authorization] ServiceAuthorizationManager isn't dynamically updatable. And it should be.** Master and RegionServer now support refreshing the policy authorization defined in hbase-policy.xml without restarting the service. To refresh the policy, execute the hbase shell command update\_config or update\_config\_all after the policy file has been updated and synced on all nodes. --- * [HBASE-24099](https://issues.apache.org/jira/browse/HBASE-24099) | *Major* | **Use a fair ReentrantReadWriteLock for the region close lock** This change modifies the default acquisition policy for the region's close lock in order to prevent observed starvation of close requests. The new boolean configuration parameter 'hbase.regionserver.fair.region.close.lock' controls the lock acquisition policy: if true, the lock is created in fair mode (default); if false, the lock is created in nonfair mode (the old default). --- * [HBASE-23153](https://issues.apache.org/jira/browse/HBASE-23153) | *Major* | **PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded** The `PrimaryRegionCountSkewCostFunction` for the `StochasticLoadBalancer` is only needed when the read replicas feature is enabled. With this change, that function now properly indicates that it is not needed when the read replica feature is off.
If this improvement is not available, operators with clusters that are not using the read replica feature should manually disable it by setting `hbase.master.balancer.stochastic.primaryRegionCountCost` to `0.0` in hbase-site.xml for all HBase Masters. --- * [HBASE-24055](https://issues.apache.org/jira/browse/HBASE-24055) | *Major* | **Make AsyncFSWAL can run on EC cluster** AsyncFSWAL can now also be used against a directory which has EC enabled. Make sure you also use the hadoop 3.x client, as the option is only available in hadoop 3.x. --- * [HBASE-24113](https://issues.apache.org/jira/browse/HBASE-24113) | *Major* | **Upgrade the maven we use from 3.5.4 to 3.6.3 in nightlies** Branches-2.3+ use maven 3.6.3 building. Older branches use 3.5.4 still. --- * [HBASE-24122](https://issues.apache.org/jira/browse/HBASE-24122) | *Major* | **Change machine ulimit-l to ulimit-a so dumps full ulimit rather than just 'max locked memory'** Our 'Build Artifacts' have a machine directory under which we emit vitals on the host the build was run on. We used to emit the result of 'ulimit -l' as a file named 'ulimit-l'. This has been hijacked to instead emit the result of running 'ulimit -a', which includes the stat for ulimit -l. --- * [HBASE-23678](https://issues.apache.org/jira/browse/HBASE-23678) | *Major* | **Literate builder API for version management in schema** ColumnFamilyDescriptor new builder API: /\*\* \* Retain all versions for a given TTL(retentionInterval), and then only a specific number \* of versions(versionAfterInterval) after that interval elapses.
\* \* @param retentionInterval Retain all versions for this interval \* @param versionAfterInterval Number of versions to retain after retentionInterval \*/ public ModifyableColumnFamilyDescriptor setVersionsWithTimeToLive( final int retentionInterval, final int versionAfterInterval) --- * [HBASE-24050](https://issues.apache.org/jira/browse/HBASE-24050) | *Major* | **Deprecated PBType on all 2.x branches** org.apache.hadoop.hbase.types.PBType is marked as deprecated without any replacement. It will be moved to the hbase-example module and marked as IA.Private in 3.0.0. Exposing it was a mistake, as it should not be part of our public API. Users who depend on this class should copy the code into their own code base. --- * [HBASE-8868](https://issues.apache.org/jira/browse/HBASE-8868) | *Minor* | **add metric to report client shortcircuit reads** Expose file system level read metrics for RegionServer. If the HBase RS runs on top of HDFS, calculate the aggregation of ReadStatistics of each HdfsFileInputStream. These metrics include: (1) total number of bytes read from HDFS. (2) total number of bytes read from local DataNode. (3) total number of bytes read locally through short-circuit read. (4) total number of bytes read locally through zero-copy read. Because HDFS ReadStatistics is calculated per input stream, it is not feasible to update the aggregated number in real time. Instead, the metrics are updated when an input stream is closed. --- * [HBASE-24032](https://issues.apache.org/jira/browse/HBASE-24032) | *Major* | **[RSGroup] Assign created tables to respective rsgroup automatically instead of manual operations** Admins can determine which tables go to which rsgroup with a script (by setting hbase.rsgroup.table.mapping.script to a local filesystem path) on the Master side, which aims to lighten the burden of admin operations. Note that since HBase 3+, the rsgroup can be specified in TableDescriptor as well; if clients specify this, the master will skip the determination from the script.
Here is a simple example of such a script: {code} #!/bin/bash # Input consists of two strings: the 1st is the namespace of the table, the 2nd is the table name namespace=$1 tablename=$2 if [[ $namespace == test ]]; then echo test elif [[ $tablename == \*foo\* ]]; then echo other else echo default fi {code} --- * [HBASE-23993](https://issues.apache.org/jira/browse/HBASE-23993) | *Major* | **Use loopback for zk standalone server in minizkcluster** MiniZKCluster now puts up its standalone node listening on loopback/127.0.0.1 rather than "localhost". --- * [HBASE-23986](https://issues.apache.org/jira/browse/HBASE-23986) | *Major* | **Bump hadoop-two.version to 2.10.0 on master and branch-2** Bumped hadoop-two.version to 2.10.0, which means we will drop the support for hadoop-2.8.x and hadoop-2.9.x. --- * [HBASE-23930](https://issues.apache.org/jira/browse/HBASE-23930) | *Minor* | **Shell should attempt to format \`timestamp\` attributes as ISO-8601** Change timestamp display to be ISO8601 when calling toString on a Cell and when outputting in the shell. Users used to see: column=table:state, timestamp=1583967620343 ... but now see: column=table:state, timestamp=2020-03-11T23:00:20.343Z --- * [HBASE-22827](https://issues.apache.org/jira/browse/HBASE-22827) | *Major* | **Expose multi-region merge in shell and Admin API** The merge\_region shell command can now be used to merge more than 2 regions as well. It takes a list of regions as comma separated values or as an array of regions, and not just 2 regions. Both full region names and encoded region names continue to be accepted. --- * [HBASE-23767](https://issues.apache.org/jira/browse/HBASE-23767) | *Major* | **Add JDK11 compilation and unit test support to Github precommit** Rebuild our Dockerfile with support for multiple JDK versions. Use multiple stages in the Jenkinsfile instead of yetus's multijdk because of YETUS-953. Run those multiple stages in parallel to speed up results.
Note that multiple stages means multiple Yetus invocations, which means multiple comments on the PreCommit. This should become more obvious to users once we can make use of the GitHub Checks API, HBASE-23902. --- * [HBASE-22978](https://issues.apache.org/jira/browse/HBASE-22978) | *Minor* | **Online slow response log** get\_slowlog\_responses and clear\_slowlog\_responses are used to retrieve and clear slow RPC logs from the RingBuffer maintained by RegionServers. New Admin APIs: 1. List\<SlowLogRecord\> getSlowLogResponses(final Set\<ServerName\> serverNames, final SlowLogQueryFilter slowLogQueryFilter) throws IOException; 2. List\<Boolean\> clearSlowLogResponses(final Set\<ServerName\> serverNames) throws IOException; Configs: 1. hbase.regionserver.slowlog.ringbuffer.size: Default size of the ring buffer maintained by each RegionServer in order to store online slowlog responses. This is an in-memory ring buffer of requests that were judged to be too slow, in addition to the responseTooSlow logging. The in-memory representation is complete. For more details, see the doc section "Get Slow Response Log from shell". Default: 256. 2. hbase.regionserver.slowlog.buffer.enabled: Indicates whether RegionServers run a ring buffer for storing online slow logs in FIFO manner with limited entries. The size of the ring buffer is set by the config hbase.regionserver.slowlog.ringbuffer.size. Turn this on to get the latest slowlog responses with complete data. Default: false. For more details, see the "Get Slow Response Log from shell" section of the HBase book. --- * [HBASE-23926](https://issues.apache.org/jira/browse/HBASE-23926) | *Major* | **[Flakey Tests] Down the flakies re-run ferocity; it makes for too many fails.** Down the flakey re-run fork count from 1.0C -- i.e. a fork per CPU -- to 0.25C. On a recent run, the machine had 16 cores, so 0.25C is 4 forks. We'd hardcoded fork count at 3 previous to changes made by parent.
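The fork-count arithmetic in HBASE-23926 (a value such as `0.25C` scaled by the number of cores) can be worked through with a short stand-alone sketch. The class and method names are hypothetical, and the rounding rule used here (truncate, then use at least one fork when the product is positive) is an assumption about how the build tool interprets the `C` suffix, not something the note states.

```java
// Hypothetical illustration of "<multiplier>C" fork counts; not Surefire or HBase code.
final class ForkCountSketch {
    // forkCount: either a plain integer ("3") or a core multiplier ("0.25C")
    // cores: number of CPU cores on the build host
    static int forks(String forkCount, int cores) {
        String value = forkCount.trim();
        if (value.endsWith("C")) {
            double multiplier = Double.parseDouble(value.substring(0, value.length() - 1));
            double scaled = multiplier * cores;
            // Assumed rounding: truncate, but keep at least one fork if scaled > 0.
            return scaled > 0 ? Math.max((int) scaled, 1) : 0;
        }
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        System.out.println(forks("1.0C", 16));  // 16: a fork per CPU, the old flakey setting
        System.out.println(forks("0.25C", 16)); // 4: the new flakey setting on a 16-core host
        System.out.println(forks("3", 16));     // 3: the earlier hardcoded value
    }
}
```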
--- * [HBASE-23146](https://issues.apache.org/jira/browse/HBASE-23146) | *Major* | **Support CheckAndMutate with multiple conditions** Add a checkAndMutate(row, filter) method in the AsyncTable interface and the Table interface. This method atomically checks if the row matches the specified filter. If it does, it applies the Put/Delete/RowMutations. This is a fluent style API; the code looks like this: For the Table interface: {code} table.checkAndMutate(row, filter).thenPut(put); {code} For the AsyncTable interface: {code} table.checkAndMutate(row, filter).thenPut(put) .thenAccept(succ -\> { if (succ) { System.out.println("Check and put succeeded"); } else { System.out.println("Check and put failed"); } }); {code} --- * [HBASE-23874](https://issues.apache.org/jira/browse/HBASE-23874) | *Minor* | **Move Jira-attached file precommit definition from script in Jenkins config to dev-support** The Jira Precommit job (https://builds.apache.org/job/PreCommit-HBASE-Build/) will now look for a file within the source tree (dev-support/jenkins\_precommit\_jira\_yetus.sh) instead of depending on a script section embedded in the job. --- * [HBASE-23865](https://issues.apache.org/jira/browse/HBASE-23865) | *Major* | **Up flakey history from 5 to 10** Changed flakey list reporting to show 10 rather than 5 items. Also changed the second and first part fork counts to be 1C rather than hardcoded 3.
--- * [HBASE-23554](https://issues.apache.org/jira/browse/HBASE-23554) | *Major* | **Encoded regionname to regionname utility** Adds shell command regioninfo: hbase(main):001:0\> regioninfo '0e6aa5c19ae2b2627649dc7708ce27d0' {ENCODED =\> 0e6aa5c19ae2b2627649dc7708ce27d0, NAME =\> 'TestTable,,1575941375972.0e6aa5c19ae2b2627649dc7708ce27d0.', STARTKEY =\> '', ENDKEY =\> '00000000000000000000299441'} Took 0.4737 seconds --- * [HBASE-23350](https://issues.apache.org/jira/browse/HBASE-23350) | *Major* | **Make compaction files cacheonWrite configurable based on threshold** This JIRA adds a new configuration - \`hbase.rs.cachecompactedblocksonwrite.threshold\`. This configuration is the maximum total size (in bytes) of the compacted files below which the configuration \`hbase.rs.cachecompactedblocksonwrite\` is honoured. If the total size of the compacted files exceeds this threshold, even when \`hbase.rs.cachecompactedblocksonwrite\` is enabled, the data blocks are not cached. Caching index and bloom blocks is not affected by this configuration (user configuration is always honoured). The default value of this configuration is Long.MAX\_VALUE. This means that whatever the total size of the compacted files, they will be cached. --- * [HBASE-17115](https://issues.apache.org/jira/browse/HBASE-17115) | *Major* | **HMaster/HRegion Info Server does not honour admin.acl** Implements authorization for the HBase Web UI by limiting access to certain endpoints which could be used to extract sensitive information from HBase. Access to these restricted endpoints can be limited to a group of administrators, identified either by a list of users (hbase.security.authentication.spnego.admin.users) or by a list of groups (hbase.security.authentication.spnego.admin.groups). By default, neither of these values is set, which preserves backwards compatibility (allowing all authenticated users to access all endpoints).
Further, users who have sensitive information in the HBase service configuration can set hbase.security.authentication.ui.config.protected to true which will treat the configuration endpoint as a protected, admin-only resource. By default, all authenticated users may access the configuration endpoint. --- * [HBASE-23647](https://issues.apache.org/jira/browse/HBASE-23647) | *Major* | **Make MasterRegistry the default registry impl** Enables master based registry as the default registry used by clients to fetch connection metadata. Refer to the section "Master Registry" in the client documentation for more details and advantages of this implementation over the default Zookeeper based registry. Configuration parameter that controls the registry in use: `hbase.client.registry.impl` Where to set this: HBase client configuration (hbase-site.xml) Possible values: - `org.apache.hadoop.hbase.client.ZKConnectionRegistry` (For ZK based registry implementation) - `org.apache.hadoop.hbase.client.MasterRegistry` (New, for master based registry implementation) Notes on defaults: - For v3.0.0 and later, MasterRegistry is the default registry - For all releases in 2.x line, ZK based registry is the default. This feature has been back ported to 2.3.0 and later releases. MasterRegistry can be enabled by setting the following client configuration. 
```
<property>
  <name>hbase.client.registry.impl</name>
  <value>org.apache.hadoop.hbase.client.MasterRegistry</value>
</property>
```
--- * [HBASE-23069](https://issues.apache.org/jira/browse/HBASE-23069) | *Critical* | **periodic dependency bump for Sep 2019** caffeine: 2.6.2 =\> 2.8.1 commons-codec: 1.10 =\> 1.13 commons-io: 2.5 =\> 2.6 disruptor: 3.3.6 =\> 3.4.2 httpcore: 4.4.6 =\> 4.4.13 jackson: 2.9.10 =\> 2.10.1 jackson.databind: 2.9.10.1 =\> 2.10.1 jetty: 9.3.27.v20190418 =\> 9.3.28.v20191105 protobuf.plugin: 0.5.0 =\> 0.6.1 zookeeper: 3.4.10 =\> 3.4.14 slf4j: 1.7.25 =\> 1.7.30 rat: 0.12 =\> 0.13 asciidoctor: 1.5.5 =\> 1.5.8 asciidoctor.pdf: 1.5.0-alpha.15 =\> 1.5.0-rc.2 error-prone: 2.3.3 =\> 2.3.4 --- * [HBASE-23686](https://issues.apache.org/jira/browse/HBASE-23686) | *Major* | **Revert binary incompatible change and remove reflection** - Reverts a binary incompatible change for ByteRangeUtils - Usage of reflection inside CommonFSUtils removed --- * [HBASE-23347](https://issues.apache.org/jira/browse/HBASE-23347) | *Major* | **Pluggable RPC authentication** This change introduces an internal abstraction layer which allows for new SASL-based authentication mechanisms to be used inside HBase services. All existing SASL-based authentication mechanisms were ported to the new abstraction, making no external change in runtime semantics, client API, or RPC serialization format. Developers familiar with extending HBase can implement authentication mechanisms beyond simple Kerberos and DelegationTokens which authenticate HBase users against some other user database. HBase service authentication (Master to/from RegionServer) continues to operate solely over Kerberos. --- * [HBASE-23156](https://issues.apache.org/jira/browse/HBASE-23156) | *Major* | **start-hbase.sh failed with ClassNotFoundException when build with hadoop3** Introduce a new hbase-assembly/src/main/assembly/hadoop-three-compat.xml for building with hadoop 3.x.
--- * [HBASE-23680](https://issues.apache.org/jira/browse/HBASE-23680) | *Major* | **RegionProcedureStore missing cleaning of hfile archive** Add a new config to hbase-default.xml \<property\> \<name\>hbase.procedure.store.region.hfilecleaner.plugins\</name\> \<value\>org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner\</value\> \<description\>A comma-separated list of BaseHFileCleanerDelegate invoked by the RegionProcedureStore HFileCleaner service. These HFiles cleaners are called in order, so put the cleaner that prunes the most files in front. To implement your own BaseHFileCleanerDelegate, just put it in HBase's classpath and add the fully qualified class name here. Always add the above default hfile cleaners in the list as they will be overwritten in hbase-site.xml.\</description\> \</property\> It will share the same TTL with other HFileCleaners. You can also implement your own cleaner and change this property to enable it. --- * [HBASE-23675](https://issues.apache.org/jira/browse/HBASE-23675) | *Minor* | **Move to Apache parent POM version 22** Updated parent pom to Apache version 22. --- * [HBASE-23679](https://issues.apache.org/jira/browse/HBASE-23679) | *Critical* | **FileSystem instance leaks due to bulk loads with Kerberos enabled** This change fixes an issue with Bulk Loading on installations with Kerberos enabled and more than a single RegionServer. When multiple RegionServers are involved in hosting a table's regions which are being bulk-loaded into, all but the RegionServer hosting the table's first Region will "leak" one DistributedFileSystem object onto the heap, never freeing that memory. Eventually, with enough bulk loads, this will create a situation for RegionServers where they have no free heap space and will either spend all time in JVM GC, lose their ZK session, or crash with an OutOfMemoryError. The only mitigation for this issue is to periodically restart RegionServers.
All earlier versions of HBase 2.x are subject to this issue (2.0.x, \<=2.1.8, \<=2.2.3) --- * [HBASE-23286](https://issues.apache.org/jira/browse/HBASE-23286) | *Major* | **Improve MTTR: Split WAL to HFile** Add a new feature to improve MTTR. Failover has 3 steps: 1. Read WAL and write HFile to region’s column family’s recovered.hfiles directory. 2. Open region. 3. Bulkload the recovered.hfiles for every column family. Compared to DLS (distributed log split), this feature will reduce region open time significantly. Set hbase.wal.split.to.hfile to true to enable this feature. --- * [HBASE-23619](https://issues.apache.org/jira/browse/HBASE-23619) | *Trivial* | **Use built-in formatting for logging in hbase-zookeeper** Changed the logging in hbase-zookeeper to use built-in formatting. --- * [HBASE-23628](https://issues.apache.org/jira/browse/HBASE-23628) | *Minor* | **Replace Apache Commons Digest Base64 with JDK8 Base64** From the PR: "Yes. The two create the same output... I just wrote a small test suite to increase my confidence on that. I generated many tens of millions of random byte patterns and compared the output of the two algorithms. They came back identical every time. "Just in case any inquiring minds would like to know, there is no longer an encoding required when generating the strings. The JDK implementation specifically specifies that strings returned are StandardCharsets.ISO\_8859\_1. This does not change anything because UTF8 and ISO\_8859 overlap for the limited character set (64 characters) the encoding uses." --- * [HBASE-23651](https://issues.apache.org/jira/browse/HBASE-23651) | *Major* | **Region balance throttling can be disabled** Setting hbase.balancer.max.balancing to an int value \<=0 will disable region balance throttling.
--- * [HBASE-23588](https://issues.apache.org/jira/browse/HBASE-23588) | *Major* | **Cache index blocks and bloom blocks on write if CacheCompactedBlocksOnWrite is enabled** If cacheOnWrite is enabled during flush or compaction, index and bloom blocks(with data blocks) would be automatically cached during write. --- * [HBASE-23369](https://issues.apache.org/jira/browse/HBASE-23369) | *Major* | **Auto-close 'unknown' Regions reported as OPEN on RegionServers** If a RegionServer reports a Region as OPEN in disagreement with Master's status on the Region, the Master now tells the RegionServer to silently close the Region. --- * [HBASE-23596](https://issues.apache.org/jira/browse/HBASE-23596) | *Major* | **HBCKServerCrashProcedure can double assign** Makes it so the recently added HBCKServerCrashProcedure -- the SCP that gets invoked when an operator schedules an SCP via hbck2 scheduleRecoveries command -- now works the same as SCP EXCEPT if master knows nothing of the scheduled servername. In this latter case, HBCKSCP will do a full scan of hbase:meta looking for instances of the passed servername. If any found it will attempt cleanup of hbase:meta references by reassigning any found OPEN or OPENING and by closing any in CLOSING state. Used to fix instances of what the 'HBCK Report' page shows as 'Unknown Servers'. --- * [HBASE-23624](https://issues.apache.org/jira/browse/HBASE-23624) | *Major* | **Add a tool to dump the procedure info in HFile** Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.HFileProcedurePrettyPrinter to run the tool. --- * [HBASE-23590](https://issues.apache.org/jira/browse/HBASE-23590) | *Major* | **Update maxStoreFileRefCount to maxCompactedStoreFileRefCount** RegionsRecoveryChore introduced as part of HBASE-22460 tries to reopen regions based on config: hbase.regions.recovery.store.file.ref.count. 
Region reopen needs to take into consideration all compacted away store files that belong to the region and not store files(non-compacted). Fixed this bug as part of this Jira. Updated description for corresponding configs: 1. hbase.master.regions.recovery.check.interval : Regions Recovery Chore interval in milliseconds. This chore keeps running at this interval to find all regions with configurable max store file ref count and reopens them. Defaults to 20 mins 2. hbase.regions.recovery.store.file.ref.count : Very large number of ref count on a compacted store file indicates that it is a ref leak on that object(compacted store file). Such files can not be removed after it is invalidated via compaction. Only way to recover in such scenario is to reopen the region which can release all resources, like the refcount, leases, etc. This config represents Store files Ref Count threshold value considered for reopening regions. Any region with compacted store files ref count \> this value would be eligible for reopening by master. Here, we get the max refCount among all refCounts on all compacted away store files that belong to a particular region. Default value -1 indicates this feature is turned off. Only positive integer value should be provided to enable this feature. --- * [HBASE-23618](https://issues.apache.org/jira/browse/HBASE-23618) | *Major* | **Add a tool to dump procedure info in the WAL file** Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.WALProcedurePrettyPrinter to run the tool. --- * [HBASE-23617](https://issues.apache.org/jira/browse/HBASE-23617) | *Major* | **Add a stress test tool for region based procedure store** Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStorePerformanceEvaluation to run the tool. 
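The threshold rule described for hbase.regions.recovery.store.file.ref.count in HBASE-23590 above can be sketched as follows. This is a minimal illustration of the stated rule (take the max ref count over the compacted-away store files, compare with the threshold, treat non-positive thresholds as disabled). The class `CompactedRefCountCheck` and its method names are hypothetical and are not taken from the RegionsRecoveryChore code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the documented rule, not HBase code.
class CompactedRefCountCheck {
    /** Max ref count over all compacted-away store files of one region. */
    static int maxRefCount(List<Integer> compactedStoreFileRefCounts) {
        int max = 0;
        for (int count : compactedStoreFileRefCounts) {
            max = Math.max(max, count);
        }
        return max;
    }

    /** A region is eligible only if the feature is enabled and max ref count > threshold. */
    static boolean eligibleForReopen(List<Integer> refCounts, int threshold) {
        if (threshold <= 0) {
            return false; // default -1 means the feature is turned off
        }
        return maxRefCount(refCounts) > threshold;
    }

    public static void main(String[] args) {
        List<Integer> refCounts = Arrays.asList(3, 250, 7);
        System.out.println(eligibleForReopen(refCounts, 100));
    }
}
```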
--- * [HBASE-23326](https://issues.apache.org/jira/browse/HBASE-23326) | *Critical* | **Implement a ProcedureStore which stores procedures in a HRegion** Use a region based procedure store to replace the old customized WAL based procedure store. The procedure data migration is done automatically during upgrading. After upgrading, the MasterProcWALs directory will be deleted and a new MasterProc directory will be created. Note that a region still writes a WAL, so WAL files still exist and will be moved to the oldWALs directory. The file name looks like a normal WAL file name; the only difference is that it ends with "$masterproc$". --- * [HBASE-23320](https://issues.apache.org/jira/browse/HBASE-23320) | *Major* | **Upgrade surefire plugin to 3.0.0-M4** Bumped surefire plugin to 3.0.0-M4. --- * [HBASE-20461](https://issues.apache.org/jira/browse/HBASE-20461) | *Major* | **Implement fsync for AsyncFSWAL** Now AsyncFSWAL also supports Durability.FSYNC\_WAL. --- * [HBASE-23066](https://issues.apache.org/jira/browse/HBASE-23066) | *Minor* | **Create a config that forces to cache blocks on compaction** The configuration 'hbase.rs.cacheblocksonwrite' enables caching blocks on write. Blocks written by compaction were deliberately not cached, because caching happens as soon as the writer completes a block and doing so during compaction can be very aggressive. Cloud environments tend to have larger caches; enabling 'hbase.rs.prefetchblocksonopen' (a non-aggressive way of proactively caching blocks on reader creation) does not help them because it takes time to cache the compacted blocks. This feature adds a new configuration, 'hbase.rs.cachecompactedblocksonwrite', which when set to 'true' enables caching of the blocks created by compaction. Because this is aggressive caching, make sure there is enough cache space; otherwise other active blocks may be evicted.
The option can also be enabled per Column Family from the shell, using the format below {code} create 't1', 'f1', {NUMREGIONS =\> 15, SPLITALGO =\> 'HexStringSplit', CONFIGURATION =\> {'hbase.rs.cachecompactedblocksonwrite' =\> 'true'}} {code} --- * [HBASE-23239](https://issues.apache.org/jira/browse/HBASE-23239) | *Major* | **Reporting on status of backing MOB files from client-facing cells** Users of the MOB feature can now use the `mobrefs` utility to get statistics about data in the MOB system and verify the health of backing files on HDFS. ``` HADOOP_CLASSPATH=/etc/hbase/conf:$(hbase mapredcp) yarn jar \ /some/path/to/hbase-shaded-mapreduce.jar mobrefs mobrefs-report-output some_table foo ``` See javadocs of the class `MobRefReporter` for more details. The reference guide now includes some information about MOB internals and troubleshooting. --- * [HBASE-23549](https://issues.apache.org/jira/browse/HBASE-23549) | *Minor* | **Document steps to disable MOB for a column family** The reference guide now includes a walk through of disabling the MOB feature if needed while maintaining availability. --- * [HBASE-23582](https://issues.apache.org/jira/browse/HBASE-23582) | *Minor* | **Unbalanced braces in string representation of table descriptor** Fixed unbalanced braces in string representation within HBase shell. --- * [HBASE-23293](https://issues.apache.org/jira/browse/HBASE-23293) | *Minor* | **[REPLICATION] make ship edits timeout configurable** The default RPC timeout for ReplicationSourceShipper#shipEdits is 60s, so a timeout exception may occur when bulkload replication is enabled. The timeout can now be configured through replication.source.shipedits.timeout, and it is adaptive. --- * [HBASE-23312](https://issues.apache.org/jira/browse/HBASE-23312) | *Major* | **HBase Thrift SPNEGO configs (HBASE-19852) should be backwards compatible** The newer HBase Thrift SPNEGO configs should not be required.
The hbase.thrift.spnego.keytab.file and hbase.thrift.spnego.principal configs will fall back to the hbase.thrift.keytab.file and hbase.thrift.kerberos.principal original configs. Using the older configs logs a deprecation warning. Prefer the newer SPNEGO configurations. --- * [HBASE-22969](https://issues.apache.org/jira/browse/HBASE-22969) | *Minor* | **A new binary component comparator(BinaryComponentComparator) to perform comparison of arbitrary length and position** With BinaryComponentComparator, applications can design a diverse and powerful set of filters for rows and columns. See https://issues.apache.org/jira/browse/HBASE-22969 for an example. In general, the comparator can be used with any filter taking ByteArrayComparable. As of now, the following filters take ByteArrayComparable: 1. RowFilter 2. ValueFilter 3. QualifierFilter 4. FamilyFilter 5. ColumnValueFilter --- * [HBASE-23234](https://issues.apache.org/jira/browse/HBASE-23234) | *Major* | **Provide .editorconfig based on checkstyle configuration** Adds a .editorconfig file with configurations populated by IntelliJ, based on our checkstyle configuration. There's lots of IntelliJ-specific configs in here that I assume are not replicated to Eclipse or Netbeans users. Any devs using those tools should push whatever updates they see fit, but please start with the checkstyle configs as the origin of truth. --- * [HBASE-23322](https://issues.apache.org/jira/browse/HBASE-23322) | *Minor* | **[hbck2] Simplification on HBCKSCP scheduling** An hbck2 scheduleRecoveries will run a subclass of ServerCrashProcedure which asks Master what Regions were on the dead Server but it will also do a hbase:meta table scan to see if any vestiges of the old Server remain (for the case where an SCP failed mid-point leaving references in place or where Master and hbase:meta deviated in accounting).
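The fallback behaviour described in the HBASE-23312 note above (newer SPNEGO key first, older key second, with a deprecation warning) can be sketched with plain `java.util.Properties`. This is an assumption-laden illustration: HBase itself reads these values through its own configuration classes, and the class `SpnegoConfigFallback`, the `resolve` method and the warning text are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch of "new key, else old key with a warning"; not HBase code.
class SpnegoConfigFallback {
    /** Returns the value of newKey, or the value of oldKey (with a warning), or null. */
    static String resolve(Properties conf, String newKey, String oldKey) {
        String value = conf.getProperty(newKey);
        if (value != null) {
            return value;
        }
        value = conf.getProperty(oldKey);
        if (value != null) {
            // Made-up message; the real log line may differ.
            System.err.println("WARN: " + oldKey + " is deprecated, use " + newKey);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hbase.thrift.keytab.file", "/etc/security/old.keytab");
        System.out.println(resolve(conf,
            "hbase.thrift.spnego.keytab.file", "hbase.thrift.keytab.file"));
    }
}
```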
--- * [HBASE-23321](https://issues.apache.org/jira/browse/HBASE-23321) | *Minor* | **[hbck2] fixHoles of fixMeta doesn't update in-memory state** If holes in hbase:meta, hbck2 fixMeta now will update Master in-memory state so you do not need to restart master just so you can assign the new hole-bridging regions. --- * [HBASE-23282](https://issues.apache.org/jira/browse/HBASE-23282) | *Major* | **HBCKServerCrashProcedure for 'Unknown Servers'** hbck2 scheduleRecoveries will now run a SCP that also looks in hbase:meta for any references to the scheduled server -- not just consult Master in-memory state -- just in case vestiges of the server are leftover in hbase:meta --- * [HBASE-19450](https://issues.apache.org/jira/browse/HBASE-19450) | *Minor* | **Add log about average execution time for ScheduledChore** HBase internal chores now log a moving average of how long execution of each chore takes at `INFO` level for the logger `org.apache.hadoop.hbase.ScheduledChore`. Such messages will happen at most once per five minutes. --- * [HBASE-23250](https://issues.apache.org/jira/browse/HBASE-23250) | *Minor* | **Log message about CleanerChore delegate initialization should be at INFO** CleanerChore delegate initialization is now logged at INFO level instead of DEBUG --- * [HBASE-23243](https://issues.apache.org/jira/browse/HBASE-23243) | *Major* | **[pv2] Filter out SUCCESS procedures; on decent-sized cluster, plethora overwhelms problems** The 'Procedures & Locks' tab in Master UI only displays problematic Procedures now (RUNNABLE, WAITING-TIMEOUT, etc.). It no longer notes procedures whose state is SUCCESS. 
--- * [HBASE-23227](https://issues.apache.org/jira/browse/HBASE-23227) | *Blocker* | **Upgrade jackson-databind to 2.9.10.1 to avoid recent CVEs** the Apache HBase REST Proxy now uses Jackson Databind version 2.9.10.1 to address the following CVEs - CVE-2019-16942 - CVE-2019-16943 Users of prior releases with Jackson Databind 2.9.10 are advised to either upgrade to this release or to upgrade their local Jackson Databind jar directly. --- * [HBASE-23222](https://issues.apache.org/jira/browse/HBASE-23222) | *Critical* | **Better logging and mitigation for MOB compaction failures** The MOB compaction process in the HBase Master now logs more about its activity. In the event that you run into the problems described in HBASE-22075, there is a new HFileCleanerDelegate that will stop all removal of MOB hfiles from the archive area. It can be configured by adding `org.apache.hadoop.hbase.mob.ManualMobMaintHFileCleaner` to the list configured for `hbase.master.hfilecleaner.plugins`. This new cleaner delegate will cause your archive area to grow unbounded; you will have to manually prune files which may be prohibitively complex. Consider if your use case will allow you to mitigate by disabling mob compactions instead. Caveats: * Be sure the list of cleaner delegates still includes the default cleaners you will likely need: ttl, snapshot, and hlink. * Be mindful that if you enable this cleaner delegate then there will be *no* automated process for removing these mob hfiles. You should see a single region per table in `%hbase_root%/archive` that accumulates files over time. You will have to determine which of these files are safe or not to remove. * You should list this cleaner delegate after the snapshot and hlink delegates so that you can enable sufficient logging to determine when an archived mob hfile is needed by those subsystems. When set to `TRACE` logging, the CleanerChore logger will include archive retention decision justifications. 
* If your use case creates a large number of uniquely named tables, this new delegate will cause memory pressure on the master. --- * [HBASE-15519](https://issues.apache.org/jira/browse/HBASE-15519) | *Major* | **Add per-user metrics** Adds per-user metrics for reads/writes to each RegionServer. These metrics are exported by default. hbase.regionserver.user.metrics.enabled can be used to disable the feature if desired for any reason. --- * [HBASE-22460](https://issues.apache.org/jira/browse/HBASE-22460) | *Minor* | **Reopen a region if store reader references may have leaked** Leaked store files can not be removed even after it is invalidated via compaction. A reasonable mitigation for a reader reference leak would be a fast reopen of the region on the same server. Configs: 1. hbase.master.regions.recovery.check.interval : Regions Recovery Chore interval in milliseconds. This chore keeps running at this interval to find all regions with configurable max store file ref count and reopens them. Defaults to 20 mins 2. hbase.regions.recovery.store.file.ref.count : This config represents Store files Ref Count threshold value considered for reopening regions. Any region with store files ref count \> this value would be eligible for reopening by master. Default value -1 indicates this feature is turned off. Only positive integer value should be provided to enable this feature. --- * [HBASE-23172](https://issues.apache.org/jira/browse/HBASE-23172) | *Minor* | **HBase Canary region success count metrics reflect column family successes, not region successes** Added a comment to make clear that read/write success counts are tallying column family success counts, not region success counts. Additionally, the region read and write latencies previously only stored the latencies of the last column family of the region reads/writes. This has been fixed by using a map of each region to a list of read and write latency values. 
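The fix described in HBASE-23172 above, keeping a list of latencies per region instead of only the last column family's value, can be sketched like this. The class `RegionLatencies` and its methods are hypothetical names used for illustration; the Canary's actual fields and types may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "region -> list of latencies" map; not the Canary code.
class RegionLatencies {
    private final Map<String, List<Long>> readLatencies = new HashMap<>();

    /** Appends one latency sample, so earlier column families are not overwritten. */
    void recordRead(String regionName, long latencyMs) {
        readLatencies.computeIfAbsent(regionName, k -> new ArrayList<>()).add(latencyMs);
    }

    /** Returns all recorded read latencies for a region, in insertion order. */
    List<Long> readsFor(String regionName) {
        return readLatencies.getOrDefault(regionName, Collections.emptyList());
    }

    public static void main(String[] args) {
        RegionLatencies latencies = new RegionLatencies();
        latencies.recordRead("region-a", 12L); // first column family
        latencies.recordRead("region-a", 30L); // second column family
        System.out.println(latencies.readsFor("region-a"));
    }
}
```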
--- * [HBASE-23177](https://issues.apache.org/jira/browse/HBASE-23177) | *Major* | **If fail to open reference because FNFE, make it plain it is a Reference** Changes the message on the FNFE exception thrown when the file a Reference points to is missing; the message now includes detail on the Reference as well as the pointed-to file, so the reader can see how the FNFE relates to region open. --- * [HBASE-20626](https://issues.apache.org/jira/browse/HBASE-20626) | *Major* | **Change the value of "Requests Per Second" on WEBUI** Use 'totalRowActionRequestCount' to calculate QPS on web UI. --- * [HBASE-22874](https://issues.apache.org/jira/browse/HBASE-22874) | *Critical* | **Define a public interface for Canary and move existing implementation to LimitedPrivate** Downstream users who wish to programmatically check the health of their HBase cluster may now rely on a public interface derived from the previously private implementation of the canary cli tool. The interface is named `Canary` and can be found in the user facing javadocs. Downstream users who previously relied on invoking the canary via the Java classname (either on the command line or programmatically) will need to change how they do so because the non-public implementation has moved. --- * [HBASE-23035](https://issues.apache.org/jira/browse/HBASE-23035) | *Major* | **Retain region to the last RegionServer make the failover slower** Since 2.0.0, when a regionserver crashed and came back online, AssignmentManager retained the region locations and tried to assign the regions to that regionserver (same host:port as the crashed one) again. In 1.x.x, the regions that belonged to the crashed regionserver are assigned round-robin. This change replaces the "retain" assignment with round-robin assignment, matching the 1.x.x behavior. It makes failover faster and improves availability.
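To make the round-robin assignment mentioned in HBASE-23035 above concrete, here is a minimal sketch that spreads the regions of a crashed server over the live servers in turn. It is not the AssignmentManager implementation; the class `RoundRobinAssign`, the method `assign` and the string-based region and server names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of round-robin assignment; not HBase code.
class RoundRobinAssign {
    /** Assigns region i to server (i mod number of servers). */
    static Map<String, List<String>> assign(List<String> regions, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no live servers");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            plan.get(servers.get(i % servers.size())).add(regions.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("r0", "r1", "r2", "r3", "r4");
        List<String> servers = Arrays.asList("rs1", "rs2");
        System.out.println(assign(regions, servers));
    }
}
```

The point of the sketch is that recovery no longer waits for one specific host to return: every live server can start opening its share of regions immediately.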
--- * [HBASE-23046](https://issues.apache.org/jira/browse/HBASE-23046) | *Minor* | **Remove compatibility case from truncate command** Remove backward compatibility from \`truncate\` and \`truncate\_preserve\` shell commands. This means that these commands from HBase Clients are not compatible with pre-0.99 HBase clusters. --- * [HBASE-23040](https://issues.apache.org/jira/browse/HBASE-23040) | *Minor* | **region mover gives NullPointerException instead of saying a host isn't in the cluster** Giving the region mover "unload" command a region server name that isn't recognized by the cluster results in an "I don't know about that host" message instead of an NPE. Set the log level to DEBUG if you'd like the region mover to log the set of region server names it got back from the cluster. --- * [HBASE-21874](https://issues.apache.org/jira/browse/HBASE-21874) | *Major* | **Bucket cache on Persistent memory** Added a new IOEngine type for Bucket cache, i.e. Persistent memory. In order to use BC over pmem configure IOEngine as \<property\> \<name\>hbase.bucketcache.ioengine\</name\> \<value\> pmem:///path in persistent memory \</value\> \</property\> --- * [HBASE-22760](https://issues.apache.org/jira/browse/HBASE-22760) | *Major* | **Stop/Resume Snapshot Auto-Cleanup activity with shell command** By default, snapshot auto cleanup based on TTL is enabled for any new cluster. At any point in time, if snapshot cleanup needs to be stopped due to some snapshot restore activity or any other reason, it is advisable to disable it using the shell command: hbase\> snapshot\_cleanup\_switch false We can re-enable it using: hbase\> snapshot\_cleanup\_switch true We can query whether snapshot auto cleanup is enabled for the cluster using: hbase\> snapshot\_cleanup\_enabled --- * [HBASE-22796](https://issues.apache.org/jira/browse/HBASE-22796) | *Major* | **[HBCK2] Add fix of overlaps to fixMeta hbck Service** Adds fix of overlaps to the fixMeta hbck service method. Uses the bulk-merge facility. Merges a max of 10 at a time.
Set hbase.master.metafixer.max.merge.count to higher if you want to do more than 10 in the one go. --- * [HBASE-21745](https://issues.apache.org/jira/browse/HBASE-21745) | *Critical* | **Make HBCK2 be able to fix issues other than region assignment** This issue adds via its subtasks: \* An 'HBCK Report' page to the Master UI added by HBASE-22527+HBASE-22709+HBASE-22723+ (since 2.1.6, 2.2.1, 2.3.0). Lists consistency or anomalies found via new hbase:meta consistency checking extensions added to CatalogJanitor (holes, overlaps, bad servers) and by a new 'HBCK chore' that runs at a lesser periodicity that will note filesystem orphans and overlaps as well as the following conditions: \*\* Master thought this region opened, but no regionserver reported it. \*\* Master thought this region opened on Server1, but regionserver reported Server2 \*\* More than one regionservers reported opened this region Both chores can be triggered from the shell to regenerate ‘new’ reports. \* Means of scheduling a ServerCrashProcedure (HBASE-21393). \* An ‘offline’ hbase:meta rebuild (HBASE-22680). \* Offline replace of hbase.version and hbase.id \* Documentation on how to use completebulkload tool to ‘adopt’ orphaned data found by new HBCK2 ‘filesystem’ check (see below) and ‘HBCK chore’ (HBASE-22859) \* A ‘holes’ and ‘overlaps’ fix that runs in the master that uses new bulk-merge facility to collapse many overlaps in the one go. \* hbase-operator-tools HBCK2 client tool got a bunch of additions: \*\* A specialized 'fix' for the case where operators ran old hbck 'offlinemeta' repair and destroyed their hbase:meta; it ties together holes in meta with orphaned data in the fs (HBASE-22567) \*\* A ‘filesystem’ command that reports on orphan data as well as bad references and hlinks with a ‘fix’ for the latter two options (based on hbck1 facility updated). 
\*\* Adds back the ‘replication’ fix facility from hbck1 (HBASE-22717) The compound result is that hbck2 is now in excess of hbck1 abilities. The provided functionality is disaggregated as per the hbck2 philosophy of providing 'plumbing' rather than 'porcelain' so there is work to do still adding fix-it playbooks, scripting across outages, and automation. --- * [HBASE-22802](https://issues.apache.org/jira/browse/HBASE-22802) | *Major* | **Avoid temp ByteBuffer allocation in FileIOEngine#read** HBASE-21879 introduces a utility class (org.apache.hadoop.hbase.io.ByteBuffAllocator) used for allocating/freeing ByteBuffers from/to NIO ByteBuffer pool, when BucketCache enabled with file or mmap engine, we will use this ByteBuffer pool to avoid temp ByteBuffer allocation a lot. --- * [HBASE-11062](https://issues.apache.org/jira/browse/HBASE-11062) | *Major* | **hbtop** Introduces hbtop that's a real-time monitoring tool for HBase like Unix's top command. See the ref guide for the details: https://hbase.apache.org/book.html#hbtop --- * [HBASE-21879](https://issues.apache.org/jira/browse/HBASE-21879) | *Major* | **Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose** Before this issue, read path was 100% offheap when block is in the BucketCache. But if a cache miss, then the RS needs to read the block via an on-heap API which causes high young-GC pressure. This issue adds reading the block via offheap even if reading the block from filesystem directly. It requires hadoop version(\>=2.9.3) but can also work with older hadoop versions (all works but we continue to read block onheap). It also requires HBASE-21946 which is not yet in place as of this writing/hbase-2.3.0. 
We have written a careful doc about the implementation, performance and practice here: https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI\_E/edit#heading=h.nch5d72p27ex --- * [HBASE-22618](https://issues.apache.org/jira/browse/HBASE-22618) | *Major* | **added the possibility to load custom cost functions** Extends `StochasticLoadBalancer` to support user-provided cost function. These are loaded in addition to the default set of cost functions. Custom function implementations must extend `StochasticLoadBalancer$CostFunction`. Enable any additional functions by placing them on the master class path and configuring `hbase.master.balancer.stochastic.additionalCostFunctions` with a comma-separated list of fully-qualified class names. --- * [HBASE-22867](https://issues.apache.org/jira/browse/HBASE-22867) | *Critical* | **The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table** Replace the ForkJoinPool in CleanerChore by ThreadPoolExecutor which can limit the spawn thread size and avoid the master GC frequently. The replacement is an internal implementation in CleanerChore, so no config key change, the upstream users can just upgrade the hbase master without any other change. --- * [HBASE-22810](https://issues.apache.org/jira/browse/HBASE-22810) | *Major* | **Initialize an separate ThreadPoolExecutor for taking/restoring snapshot** Introduced a new config key for the snapshot taking/restoring operations at master side: hbase.master.executor.snapshot.threads, its default value is 3. means we can have 3 snapshot operations running at the same time. --- * [HBASE-22863](https://issues.apache.org/jira/browse/HBASE-22863) | *Major* | **Avoid Jackson versions and dependencies with known CVEs** 1. Stopped exposing vulnerable Jackson1 dependencies so that downstreamers would not pull it in from HBase. 2. 
However, since Hadoop requires some Jackson1 dependencies, the vulnerable Jackson mapper was put at test scope in some HBase modules; as a result, the HBase tarball created by hbase-assembly contains the Jackson1 mapper jar in lib. Still, downstream applications can't pull in Jackson1 from HBase. --- * [HBASE-22841](https://issues.apache.org/jira/browse/HBASE-22841) | *Major* | **TimeRange's factory functions do not support ranges, only \`allTime\` and \`at\`** Adds several APIs to the TimeRange class so that callers can avoid the deprecated TimeRange constructor: \* TimeRange#from: Represents the time interval [minStamp, Long.MAX\_VALUE) \* TimeRange#until: Represents the time interval [0, maxStamp) \* TimeRange#between: Represents the time interval [minStamp, maxStamp) --- * [HBASE-22833](https://issues.apache.org/jira/browse/HBASE-22833) | *Minor* | **MultiRowRangeFilter should provide a method for creating a filter which is functionally equivalent to multiple prefix filters** Provides a public constructor in the MultiRowRangeFilter class for filtering with multiple row prefixes. It expands the row prefixes into multiple rowkey ranges handled by MultiRowRangeFilter, which is more efficient. {code} public MultiRowRangeFilter(byte[][] rowKeyPrefixes); {code} --- * [HBASE-22856](https://issues.apache.org/jira/browse/HBASE-22856) | *Major* | **HBASE-Find-Flaky-Tests fails with pip error** Update the base docker image to ubuntu 18.04 for the find flaky tests jenkins job. --- * [HBASE-22771](https://issues.apache.org/jira/browse/HBASE-22771) | *Major* | **[HBCK2] fixMeta method and server-side support** Adds a fixMeta method to hbck Service. Fixes holes in hbase:meta. Follow-up to fix overlaps. See HBASE-22567 also.
Follow-on is adding a client-side to hbase-operator-tools that can exploit this new addition (HBASE-22825) --- * [HBASE-22777](https://issues.apache.org/jira/browse/HBASE-22777) | *Major* | **Add a multi-region merge (for fixing overlaps, etc.)** Changes merge so you can merge more than two regions at a time. Currently only available inside HBase. HBASE-22827, a follow-on, is about exposing the facility in the Admin API (and then via the shell). --- * [HBASE-15666](https://issues.apache.org/jira/browse/HBASE-15666) | *Critical* | **shaded dependencies for hbase-testing-util** New shaded artifact for testing: hbase-shaded-testing-util. --- * [HBASE-22776](https://issues.apache.org/jira/browse/HBASE-22776) | *Major* | **Rename config names in user scan snapshot feature** After HBASE-22776, the steps to configure the user scan snapshot feature are as follows: 1. Check HDFS configuration 2. Add master coprocessor: hbase.coprocessor.master.classes= “org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController” 3. Enable this feature: hbase.acl.sync.to.hdfs.enable=true 4. Modify the table schema to enable this feature for a table: alter 't1', CONFIGURATION =\> {'hbase.acl.sync.to.hdfs.enable' =\> 'true'} --- * [HBASE-22539](https://issues.apache.org/jira/browse/HBASE-22539) | *Blocker* | **WAL corruption due to early DBBs re-use when Durability.ASYNC\_WAL is used** We found a critical bug which can lead to WAL corruption when Durability.ASYNC\_WAL is used. The cause is that a ByteBuffer is released before its content is actually persisted to the WAL file. The problem may lead to several errors, for example an ArrayIndexOutOfBoundsException when replaying the WAL, because the ByteBuffer has been reused by others.
ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event RS\_LOG\_REPLAY java.lang.ArrayIndexOutOfBoundsException: 18056 at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365) at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358) at org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735) at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816) at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:143) at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:148) at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297) at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:195) at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:100) It may even cause a segmentation fault and crash the JVM directly. You will see a hs\_err\_pidXXX.log file and usually the problem is SIGSEGV. This is usually because the ByteBuffer has already been returned to the OS and used for another purpose. The problem has been reported several times in the past and this time Wellington Ramos Chevreuil provided the full logs and deeply analyzed the logs so we can find the root cause. And Lijin Bin figured out that the problem may only happen when Durability.ASYNC\_WAL is used. Thanks to them. The problem only affects the 2.x releases. All users are strongly recommended to upgrade to a release that includes this fix, especially if you use Durability.ASYNC\_WAL. --- * [HBASE-22737](https://issues.apache.org/jira/browse/HBASE-22737) | *Major* | **Add a new admin method and shell cmd to trigger the hbck chore to run** Adds a new method runHbckChore in the Hbck interface and a new shell cmd hbck\_chore\_run to request the HBCK chore to run on the master side.
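The hazard behind HBASE-22539 above, a buffer being reused before its pending content has been persisted, can be reproduced in miniature with `java.nio.ByteBuffer`. The sketch below is purely illustrative and single-threaded; it is not the AsyncFSWAL code path, and the class `EarlyBufferReuse` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of early buffer reuse; not HBase code.
class EarlyBufferReuse {
    /** Writes text into the pooled buffer and returns a view sharing the same memory. */
    static ByteBuffer fill(ByteBuffer pooled, String text) {
        pooled.clear();
        pooled.put(text.getBytes(StandardCharsets.US_ASCII));
        pooled.flip();
        return pooled.duplicate(); // same backing bytes, independent position/limit
    }

    /** Stands in for "persist to the WAL": copies the bytes the view currently sees. */
    static String persist(ByteBuffer view) {
        byte[] out = new byte[view.remaining()];
        view.duplicate().get(out);
        return new String(out, StandardCharsets.US_ASCII);
    }

    /** Reuses the buffer before the pending edit is persisted. */
    static String unsafe() {
        ByteBuffer pooled = ByteBuffer.allocate(16);
        ByteBuffer pending = fill(pooled, "edit-1"); // queued, not yet persisted
        fill(pooled, "EDIT-2");                      // buffer released and reused too early
        return persist(pending);                     // sees the second edit's bytes
    }

    /** Persists the pending edit first, then reuses the buffer. */
    static String safe() {
        ByteBuffer pooled = ByteBuffer.allocate(16);
        ByteBuffer pending = fill(pooled, "edit-1");
        String persisted = persist(pending);
        fill(pooled, "EDIT-2");
        return persisted;
    }

    public static void main(String[] args) {
        System.out.println("unsafe wrote: " + unsafe());
        System.out.println("safe wrote:   " + safe());
    }
}
```

In the unsafe ordering the "persisted" bytes belong to a different edit, which is the heap-buffer analogue of the corrupted WAL entries described above; with direct buffers returned to the OS the outcome can be a crash instead.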
--- * [HBASE-22741](https://issues.apache.org/jira/browse/HBASE-22741) | *Major* | **Show catalogjanitor consistency complaints in new 'HBCK Report' page** Adds a "CatalogJanitor hbase:meta Consistency Issues" section to the new 'HBCK Report' page added by HBASE-22709. This section is empty unless the most recent CatalogJanitor scan turned up problems. If so, it shows a table of the issues found. --- * [HBASE-22723](https://issues.apache.org/jira/browse/HBASE-22723) | *Major* | **Have CatalogJanitor report holes and overlaps; i.e. problems it sees when doing its regular scan of hbase:meta** When CatalogJanitor runs, it now checks for holes, overlaps, empty info:regioninfo columns and bad servers. Dumps findings into log. Follow-up adds report to new 'HBCK Report' linked off the Master UI. NOTE: All features but the badserver check made it into branch-2.1 and branch-2.0 backports. --- * [HBASE-22714](https://issues.apache.org/jira/browse/HBASE-22714) | *Trivial* | **BuffferedMutatorParams opertationTimeOut() is misspelt** The misspelled BufferedMutatorParams.opertationTimeout method has been marked as deprecated, and will be removed in 4.0.0. Please use the BufferedMutatorParams.operationTimeout method instead. --- * [HBASE-22580](https://issues.apache.org/jira/browse/HBASE-22580) | *Major* | **Add a table attribute to make user scan snapshot feature configurable for table** If users of a table scan snapshots of that table, configure the following table schema attribute so that granted users' ACLs are added to hfiles: alter 't1', CONFIGURATION =\> {'hbase.user.scan.snapshot.enable' =\> 'true'} --- * [HBASE-22709](https://issues.apache.org/jira/browse/HBASE-22709) | *Major* | **Add a chore thread in master to do hbck checking and display results in 'HBCK Report' page** 1. Add a new chore thread in master to do hbck checking 2. Add a new web ui "HBCK Report" page to display checking results. This feature is enabled by default.
The hbck chore runs every 60 minutes by default. Set "hbase.master.hbck.checker.interval" to a value less than or equal to 0 to disable the chore. Notice: the config "hbase.master.hbck.checker.interval" was renamed to "hbase.master.hbck.chore.interval" in HBASE-22737. --- * [HBASE-22578](https://issues.apache.org/jira/browse/HBASE-22578) | *Major* | **HFileCleaner should not delete empty ns/table directories used for user san snapshot feature** The HFileCleaner cleans empty directories under archive, but when the user scan snapshot feature is enabled, user ACLs are set on these directories. Configure the following cleaner so that directories with user ACLs are not cleaned: hbase.master.hfilecleaner.plugins=org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclCleaner --- * [HBASE-22722](https://issues.apache.org/jira/browse/HBASE-22722) | *Blocker* | **Upgrade jackson databind dependencies to 2.9.9.1** Upgrade jackson databind dependency to 2.9.9.1 due to CVEs https://nvd.nist.gov/vuln/detail/CVE-2019-12814 https://nvd.nist.gov/vuln/detail/CVE-2019-12384 --- * [HBASE-22527](https://issues.apache.org/jira/browse/HBASE-22527) | *Major* | **[hbck2] Add a master web ui to show the problematic regions** Add a new master web UI to show the potentially problematic opened regions. There are three cases: 1. Master thought this region opened, but no regionserver reported it. 2. Master thought this region opened on Server1, but regionserver reported Server2 3. More than one regionserver reported this region as opened --- * [HBASE-22648](https://issues.apache.org/jira/browse/HBASE-22648) | *Minor* | **Snapshot TTL** Feature: Take a Snapshot With TTL for auto-cleanup Attribute: 1. TTL - Specify TTL in sec while creating snapshot. e.g. snapshot 'mytable', 'snapshot1234', {TTL =\> 86400} (snapshot to be auto-cleaned after 24 hr) Configs: 1.
Default Snapshot TTL: - FOREVER by default - User specified Default TTL(sec) with config: hbase.master.snapshot.ttl 2. If Snapshot cleanup is supposed to be stopped due to some snapshot restore activity, disable it with config: - hbase.master.cleaner.snapshot.disable: "true" Changing this config requires an HMaster restart, just like any other hbase-site config. For more details, see the section "Take a Snapshot With TTL" in the HBase Reference Guide. --- * [HBASE-22610](https://issues.apache.org/jira/browse/HBASE-22610) | *Trivial* | **[BucketCache] Rename "hbase.offheapcache.minblocksize"** The config point "hbase.offheapcache.minblocksize" was wrong and is now deprecated. The new config point is "hbase.blockcache.minblocksize". --- * [HBASE-22690](https://issues.apache.org/jira/browse/HBASE-22690) | *Major* | **Deprecate / Remove OfflineMetaRepair in hbase-2+** OfflineMetaRepair is no longer supported in HBase-2+. Please refer to https://hbase.apache.org/book.html#HBCK2 This tool is deprecated in 2.x and will be removed in 3.0. --- * [HBASE-22673](https://issues.apache.org/jira/browse/HBASE-22673) | *Major* | **Avoid to expose protobuf stuff in Hbck interface** Mark the Hbck#scheduleServerCrashProcedure(List\<HBaseProtos.ServerName\> serverNames) as deprecated. Use Hbck#scheduleServerCrashProcedures(List\<ServerName\> serverNames) instead. --- * [HBASE-22617](https://issues.apache.org/jira/browse/HBASE-22617) | *Blocker* | **Recovered WAL directories not getting cleaned up** In HBASE-20734 we moved the recovered.edits onto the wal file system but when constructing the directory we missed the BASE\_NAMESPACE\_DIR('data'). So when using the default config, you will find that there are lots of new directories at the same level as the 'data' directory. In this issue, we add the BASE\_NAMESPACE\_DIR back, and also try our best to clean up the wrong directories.
But we can only clean up the region level directories, so if you want a clean fs layout on HDFS you still need to manually delete the empty directories at the same level as 'data'.

The affected versions are 2.2.0, 2.1.[1-5], 1.4.[8-10], 1.3.[3-5].


---

* [HBASE-21995](https://issues.apache.org/jira/browse/HBASE-21995) | *Major* | **Add a coprocessor to set HDFS ACL for hbase granted user**

Add a coprocessor that sets HDFS ACLs so that HBase users granted READ permission can scan snapshots.

To use this feature, please make sure the HDFS config is set:

dfs.namenode.acls.enabled=true

fs.permissions.umask-mode=027

and set the HBase config:

hbase.coprocessor.master.classes="org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController"

hbase.user.scan.snapshot.enable=true


---

* [HBASE-22596](https://issues.apache.org/jira/browse/HBASE-22596) | *Minor* | **[Chore] Separate the execution period between CompactionChecker and PeriodicMemStoreFlusher**

hbase.regionserver.compaction.check.period controls how often the compaction checker runs. If unset, hbase.server.thread.wakefrequency is used as the default value.

hbase.regionserver.flush.check.period controls how often the flush checker runs. If unset, hbase.server.thread.wakefrequency is used as the default value.


---

* [HBASE-22588](https://issues.apache.org/jira/browse/HBASE-22588) | *Major* | **Upgrade jaxws-ri dependency to 2.3.2**

When run with JDK11, HBase now uses a more recent version of the jaxws reference implementation (v2.3.2).


---

* [HBASE-21536](https://issues.apache.org/jira/browse/HBASE-21536) | *Trivial* | **Fix completebulkload usage instructions**

Added completebulkload short name for BulkLoadHFilesTool to bin/hbase.


---

* [HBASE-22500](https://issues.apache.org/jira/browse/HBASE-22500) | *Blocker* | **Modify pom and jenkins jobs for hadoop versions**

Change the default hadoop-3 version to 3.1.2.
Drop support for the releases which are affected by CVE-2018-8029, see this email https://lists.apache.org/thread.html/3d6831c3893cd27b6850aea2feff7d536888286d588e703c6ffd2e82@%3Cuser.hadoop.apache.org%3E


---

* [HBASE-22459](https://issues.apache.org/jira/browse/HBASE-22459) | *Minor* | **Expose store reader reference count**

This change exposes the aggregate count of store reader references for a given store as 'storeRefCount' in region metrics and ClusterStatus.


---

* [HBASE-22469](https://issues.apache.org/jira/browse/HBASE-22469) | *Minor* | **replace md5 checksum in saveVersion script with sha512 for hbase version information**

The HBase "source checksum" now uses SHA512 instead of MD5.


---

* [HBASE-22148](https://issues.apache.org/jira/browse/HBASE-22148) | *Blocker* | **Provide an alternative to CellUtil.setTimestamp**

The `CellUtil.setTimestamp` method changes to be an API with audience `LimitedPrivate(COPROC)` in HBase 3.0. With that designation the API should remain stable within a given minor release line, but may change between minor releases.

Previously, this method was deprecated in HBase 2.0 for removal in HBase 3.0. Deprecation messages in HBase 2.y releases have been updated to indicate the expected API audience change.


---

* [HBASE-20782](https://issues.apache.org/jira/browse/HBASE-20782) | *Minor* | **Fix duplication of TestServletFilter.access**

The access method was moved to the HttpServerFunctionalTest class as a common place.


---

* [HBASE-21991](https://issues.apache.org/jira/browse/HBASE-21991) | *Major* | **Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements**

The class LossyCounting was unintentionally marked Public but was never intended to be part of our public API. This oversight has been corrected and LossyCounting is now marked as Private and going forward may be subject to additional breaking changes or removal without notice.
If you have taken a dependency on this class we recommend cloning it locally into your project before upgrading to this release.


---

* [HBASE-22226](https://issues.apache.org/jira/browse/HBASE-22226) | *Trivial* | **Incorrect level for headings in asciidoc**

Warnings for level headings are corrected in the book for the HBase Incompatibilities section.


---

* [HBASE-20970](https://issues.apache.org/jira/browse/HBASE-20970) | *Major* | **Update hadoop check versions for hadoop3 in hbase-personality**

Add hadoop 3.0.3, 3.1.1, and 3.1.2 to our hadoop check jobs.


---

* [HBASE-21784](https://issues.apache.org/jira/browse/HBASE-21784) | *Major* | **Dump replication queue should show list of wal files ordered chronologically**

The DumpReplicationQueues tool will now list replication queues sorted in chronological order.


---

* [HBASE-21048](https://issues.apache.org/jira/browse/HBASE-21048) | *Major* | **Get LogLevel is not working from console in secure environment**

Support get\|set LogLevel in secure (kerberized) environments.


---

* [HBASE-22384](https://issues.apache.org/jira/browse/HBASE-22384) | *Minor* | **Formatting issues in administration section of book**

Fixes a formatting issue in the administration section of the book, where listing indentation was a little bit off.


---

* [HBASE-22377](https://issues.apache.org/jira/browse/HBASE-22377) | *Major* | **Provide API to check the existence of a namespace which does not require ADMIN permissions**

This change adds the new method listNamespaces to the Admin interface, which can be used to retrieve a list of the namespaces present in the schema as an unprivileged operation. Formerly the only available method for accomplishing this was listNamespaceDescriptors, which requires GLOBAL CREATE or ADMIN permissions.

---

* [HBASE-22399](https://issues.apache.org/jira/browse/HBASE-22399) | *Major* | **Change default hadoop-two.version to 2.8.x and remove the 2.7.x hadoop checks**

The default hadoop-two.version has been changed to 2.8.5, and Hadoop versions earlier than 2.8.2 are no longer supported.


---

* [HBASE-22392](https://issues.apache.org/jira/browse/HBASE-22392) | *Trivial* | **Remove extra/useless +**

Removed extra + in HRegion, HStore and LoadIncrementalHFiles for branch-2 and HRegion and HStore for branch-1.


---

* [HBASE-20494](https://issues.apache.org/jira/browse/HBASE-20494) | *Major* | **Upgrade com.yammer.metrics dependency**

Updated metrics core from 3.2.1 to 3.2.6.


---

* [HBASE-22358](https://issues.apache.org/jira/browse/HBASE-22358) | *Minor* | **Change rubocop configuration for method length**

The rubocop definition for the maximum method length was set to 75.


---

* [HBASE-22379](https://issues.apache.org/jira/browse/HBASE-22379) | *Minor* | **Fix Markdown for "Voting on Release Candidates" in book**

Fixes the formatting of the "Voting on Release Candidates" section so that it actually shows the quote and code formatting of the RAT check.


---

* [HBASE-20851](https://issues.apache.org/jira/browse/HBASE-20851) | *Minor* | **Change rubocop config for max line length of 100**

The rubocop configuration in the hbase-shell module now allows a line length of 100 characters, instead of 80 as before. For everything before 2.1.5 this change introduces rubocop itself.


---

* [HBASE-22301](https://issues.apache.org/jira/browse/HBASE-22301) | *Minor* | **Consider rolling the WAL if the HDFS write pipeline is slow**

This change adds new conditions for rolling the WAL when syncs on the HDFS writer pipeline are perceived to be slow. As before, the configuration parameter hbase.regionserver.wal.slowsync.ms sets the slow sync warning threshold.
If we encounter hbase.regionserver.wal.slowsync.roll.threshold number of slow syncs (default 100) within the interval defined by hbase.regionserver.wal.slowsync.roll.interval.ms (default 1 minute), we will request a WAL roll. Or, if the time for any sync exceeds the threshold set by hbase.regionserver.wal.roll.on.sync.ms (default 10 seconds), we will request a WAL roll immediately.

Operators can monitor how often these new thresholds result in a WAL roll by looking at newly added metrics in the WAL related metric group:

\* slowSyncRollRequest - How many times a roll was requested because syncs were too slow on the write pipeline.

Additionally, as a part of this change there are also additional metrics for existing reasons for a WAL roll:

\* errorRollRequest - How many times a roll was requested due to I/O or other errors.

\* sizeRollRequest - How many times a roll was requested due to file size roll threshold.


---

* [HBASE-21883](https://issues.apache.org/jira/browse/HBASE-21883) | *Minor* | **Enhancements to Major Compaction tool**

The MajorCompactorTTL tool makes it possible to compact all regions in a table that have been TTLed out. This saves space on DFS and is useful for tables which are similar to time series data. This is typically scheduled to run frequently (say via cron) to clean up old data on an ongoing basis.

The RSGroupMajorCompactionTTL tool is similar to MajorCompactorTTL but runs at a region server group level. If multiple tables in an rsgroup are similar to time-series data, then it runs a single command to clean them up. As more tables are added/removed from the rsgroup, it's easy to have a single command to take care of all of them.


---

* [HBASE-22054](https://issues.apache.org/jira/browse/HBASE-22054) | *Minor* | **Space Quota: Compaction is not working for super user in case of NO\_WRITES\_COMPACTIONS**

This change allows the system and superusers to initiate compactions, even when a space quota violation policy disallows compactions from happening.
The original intent behind disallowing compactions was to prevent end-user compactions from creating undue I/O load, not to disallow \*any\* compaction in the system.


---

* [HBASE-22083](https://issues.apache.org/jira/browse/HBASE-22083) | *Minor* | **move eclipse specific configs into a profile**

Maven project integration for Eclipse has been isolated into a maven profile to ensure it only is active when in an Eclipse project. Things should continue to behave the same for Eclipse users. If something should go wrong folks should manually activate the `eclipse-specific` profile.


---

* [HBASE-22307](https://issues.apache.org/jira/browse/HBASE-22307) | *Major* | **Deprecated Preemptive Fail Fast**

Deprecated the Preemptive Fail Fast related constants in HConstants. Support for this feature will be removed in 3.0.0, so using these constants will have no effect in 3.0.0+ releases. The constants themselves will be kept until 4.0.0.

Users can use 'hbase.client.perserver.requests.threshold' to control the number of concurrent requests to the same region server.

Please see the release note of HBASE-16388 for more details.


---

* [HBASE-22292](https://issues.apache.org/jira/browse/HBASE-22292) | *Blocker* | **PreemptiveFastFailInterceptor clean repeatedFailuresMap issue**

Adds new configuration hbase.client.failure.map.cleanup.interval which defaults to ten minutes.


---

* [HBASE-19222](https://issues.apache.org/jira/browse/HBASE-19222) | *Major* | **update jruby to 9.1.17.0**

The default version of JRuby shipped with HBase has been updated to the JRuby 9.1.17.0 release.
For details on changes see [the release notes for JRuby 9.1.17.0](https://www.jruby.org/2018/04/23/jruby-9-1-17-0)


---

* [HBASE-22279](https://issues.apache.org/jira/browse/HBASE-22279) | *Major* | **Add a getRegionLocator method in Table/AsyncTable interface**

Add below method in Table interface:

RegionLocator getRegionLocator() throws IOException;

Add below methods in AsyncTable interface:

AsyncTableRegionLocator getRegionLocator();

CompletableFuture\<TableDescriptor\> getDescriptor();


---

* [HBASE-15560](https://issues.apache.org/jira/browse/HBASE-15560) | *Major* | **TinyLFU-based BlockCache**

LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O(n) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns.

This change introduces a new L1 policy, TinyLfuBlockCache, which records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, through the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had near optimal hit rates.

New configuration variable hfile.block.cache.policy sets the eviction policy for the L1 block cache. The default is "LRU" (LruBlockCache). Set to "TinyLFU" to use TinyLfuBlockCache instead.
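
The admission decision described in the TinyLFU note above can be sketched in a few lines of Python. This is an illustrative sketch only, not the TinyLfuBlockCache implementation: the class `FrequencySketch`, the function `admit`, and the parameters `width`, `depth`, and `sample_size` are hypothetical names chosen for the example, and the SLRU ordering that selects the victim is left out.

```python
import hashlib


class FrequencySketch:
    """Count-min style frequency sketch that ages by halving (hypothetical helper)."""

    def __init__(self, width=256, depth=4, sample_size=1024):
        self.width = width              # counters per row
        self.depth = depth              # number of independent rows
        self.sample_size = sample_size  # recorded accesses between aging passes
        self.rows = [[0] * width for _ in range(depth)]
        self.increments = 0

    def _slots(self, key):
        # Pick one counter per row from a row-salted hash of the key.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def record(self, key):
        # Count one access; age the whole sketch once enough accesses were seen.
        for row, col in self._slots(key):
            self.rows[row][col] += 1
        self.increments += 1
        if self.increments >= self.sample_size:
            self._age()

    def estimate(self, key):
        # The smallest counter is the least inflated by hash collisions.
        return min(self.rows[row][col] for row, col in self._slots(key))

    def _age(self):
        # Halve every counter so that old popularity fades over time.
        for row in self.rows:
            for col in range(self.width):
                row[col] //= 2
        self.increments = 0


def admit(sketch, candidate, victim):
    """Keep the new arrival only if it is estimated to be more frequent than the victim."""
    return sketch.estimate(candidate) > sketch.estimate(victim)
```

Because the sketch stores only small counters rather than the entries themselves, it can remember access history for many more keys than the cache holds, which is the property the note refers to as retaining "a much larger history".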

---

* [HBASE-22178](https://issues.apache.org/jira/browse/HBASE-22178) | *Major* | **Introduce a createTableAsync with TableDescriptor method in Admin**

Introduced

Future\<Void\> createTableAsync(TableDescriptor);


---

* [HBASE-22108](https://issues.apache.org/jira/browse/HBASE-22108) | *Major* | **Avoid passing null in Admin methods**

Introduced these methods:

void move(byte[]);

void move(byte[], ServerName);

Future\<Void\> splitRegionAsync(byte[]);

These methods are deprecated:

void move(byte[], byte[])


---

* [HBASE-22152](https://issues.apache.org/jira/browse/HBASE-22152) | *Major* | **Create a jenkins file for yetus to processing GitHub PR**

Add a new jenkins file for running pre commit check for GitHub PR.


---

* [HBASE-22007](https://issues.apache.org/jira/browse/HBASE-22007) | *Major* | **Add restoreSnapshot and cloneSnapshot with acl methods in AsyncAdmin**

Add cloneSnapshot/restoreSnapshot with acl methods in AsyncAdmin.


---

* [HBASE-22123](https://issues.apache.org/jira/browse/HBASE-22123) | *Minor* | **REST gateway reports Insufficient permissions exceptions as 404 Not Found**

When permissions are insufficient, you now get:

HTTP/1.1 403 Forbidden

on the HTTP side, and in the message

Forbidden

org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'myuser',action: get, tableName:mytable, family:cf.

at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.authorizeAccess(RangerAuthorizationCoprocessor.java:547)

and the rest of the ADE stack


---

* [HBASE-22100](https://issues.apache.org/jira/browse/HBASE-22100) | *Minor* | **False positive for error prone warnings in pre commit job**

Now we sort the javac WARNING/ERROR lines before generating the diff in pre-commit, so that we get stable output for error prone. The downside is that the output is sorted lexicographically, so the line numbers are also sorted lexicographically, which looks a bit strange to a human reader.

---

* [HBASE-22057](https://issues.apache.org/jira/browse/HBASE-22057) | *Major* | **Impose upper-bound on size of ZK ops sent in a single multi()**

Exposes a new configuration property "zookeeper.multi.max.size" which dictates the maximum size of deletes that HBase will make to ZooKeeper in a single RPC. This property defaults to 1MB, which should fall beneath the default ZooKeeper limit of 2MB, controlled by "jute.maxbuffer".


---

* [HBASE-22052](https://issues.apache.org/jira/browse/HBASE-22052) | *Major* | **pom cleaning; filter out jersey-core in hadoop2 to match hadoop3 and remove redunant version specifications**

Fixed awkward dependency issue that prevented site building.

#### note specific to HBase 2.1.4

HBase 2.1.4 shipped with an early version of this fix that incorrectly altered the libraries included in our binary assembly for using Apache Hadoop 2.7 (the current build default Hadoop version for 2.1.z). For folks running out of the box against a Hadoop 2.7 cluster (or folks who skip the installation step of [replacing the bundled Hadoop libraries](http://hbase.apache.org/book.html#hadoop)) this will result in a failure at Region Server startup due to a missing class definition.
e.g.:

```
2019-03-27 09:02:05,779 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:644)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:628)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2701)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2683)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:171)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:356)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:362)
        at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:411)
        at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:387)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:704)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:613)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:3029)
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:63)
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3047)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 26 more
```

Workaround via any _one_ of the following:

* If you are running against a Hadoop cluster that is 2.8+, ensure you replace the Hadoop libraries in the default binary assembly with those for your version.
* If you are running against a Hadoop cluster that is 2.8+, build the binary assembly from the source release while specifying your Hadoop version.
* If you are running against a Hadoop cluster that is a supported 2.7 release, ensure the `hadoop` executable is in the `PATH` seen at Region Server startup and that you are not using the `HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP` bypass.
* For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers via the HBASE_CLASSPATH environment variable.
* For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers by copying it into the directory `${HBASE_HOME}/lib/client-facing-thirdparty/`.


---

* [HBASE-22065](https://issues.apache.org/jira/browse/HBASE-22065) | *Major* | **Add listTableDescriptors(List\<TableName\>) method in AsyncAdmin**

Add a listTableDescriptors(List\<TableName\>) method in the AsyncAdmin interface, to align with the Admin interface.

---

* [HBASE-22063](https://issues.apache.org/jira/browse/HBASE-22063) | *Major* | **Deprecated Admin.deleteSnapshot(byte[])**

Deprecate Admin.deleteSnapshot(byte[]); please use the String version instead.


---

* [HBASE-22040](https://issues.apache.org/jira/browse/HBASE-22040) | *Major* | **Add mergeRegionsAsync with a List of region names method in AsyncAdmin**

Add a mergeRegionsAsync(byte[][], boolean) method in the AsyncAdmin interface.

Instead of using assert, the client side now throws IllegalArgumentException when you try to merge fewer than 2 regions. On the master side, also instead of using assert, we now throw DoNotRetryIOException if you try to merge more than 2 regions, since we only support merging two regions at once for now.


---

* [HBASE-22039](https://issues.apache.org/jira/browse/HBASE-22039) | *Major* | **Should add the synchronous parameter for the XXXSwitch method in AsyncAdmin**

Add a drainXXX parameter for the balancerSwitch/splitSwitch/mergeSwitch methods in the AsyncAdmin interface, which has the same meaning as the synchronous parameter for these methods in the Admin interface.


---

* [HBASE-22044](https://issues.apache.org/jira/browse/HBASE-22044) | *Major* | **ByteBufferUtils should not be IA.Public API**

As of HBase 3.0, the ByteBufferUtils class is now marked as a Private API for internal project use only. Downstream users are advised that it no longer has any compatibility promises across releases.

As of earlier HBase release lines the class is now marked as deprecated to call attention to this planned transition.

---

* [HBASE-21810](https://issues.apache.org/jira/browse/HBASE-21810) | *Major* | **bulkload support set hfile compression on client**

bulkload (HFileOutputFormat2) now supports configuring the compression on the client. You can set the job configuration "hbase.mapreduce.hfileoutputformat.compression" to override the auto-detection of the target table's compression.


---

* [HBASE-22001](https://issues.apache.org/jira/browse/HBASE-22001) | *Major* | **Polish the Admin interface**

Add a cloneSnapshotAsync method with restoreAcl parameter.

Deprecated the restoreSnapshotAsync method as it just ignores the failsafe configuration.

Make the snapshotAsync method return a Future\<Void\>.

Deprecated the snapshot related methods which take a 'byte[]' as the snapshot name.

Use default methods to reduce the code base for implementation classes.


---

* [HBASE-22000](https://issues.apache.org/jira/browse/HBASE-22000) | *Major* | **Deprecated isTableAvailable with splitKeys**

Deprecated AsyncTable.isTableAvailable(TableName, byte[][]).


---

* [HBASE-21871](https://issues.apache.org/jira/browse/HBASE-21871) | *Major* | **Support to specify a peer table name in VerifyReplication tool**

After HBASE-21871, we can specify a peer table name with --peerTableName in the VerifyReplication tool like the following:

hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable 5 TestTable

In addition, we can compare any 2 tables in any remote clusters by specifying both peerId and --peerTableName.
For example:

hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable zk1,zk2,zk3:2181/hbase TestTable


---

* [HBASE-15728](https://issues.apache.org/jira/browse/HBASE-15728) | *Major* | **Add remaining per-table region / store / flush / compaction related metrics**

Adds below flush, split, and compaction metrics

Split related metrics:

* MutableFastCounter splitRequest
* MutableFastCounter splitSuccess
* MetricHistogram splitTimeHisto

Flush related metrics:

* MetricHistogram flushTimeHisto
* MetricHistogram flushMemstoreSizeHisto
* MetricHistogram flushOutputSizeHisto
* MutableFastCounter flushedMemstoreBytes
* MutableFastCounter flushedOutputBytes

Compaction related metrics:

* MetricHistogram compactionTimeHisto
* MetricHistogram compactionInputFileCountHisto
* MetricHistogram compactionInputSizeHisto
* MetricHistogram compactionOutputFileCountHisto
* MetricHistogram compactionOutputSizeHisto
* MutableFastCounter compactedInputBytes
* MutableFastCounter compactedOutputBytes
* MetricHistogram majorCompactionTimeHisto
* MetricHistogram majorCompactionInputFileCountHisto
* MetricHistogram majorCompactionInputSizeHisto
* MetricHistogram majorCompactionOutputFileCountHisto
* MetricHistogram majorCompactionOutputSizeHisto
* MutableFastCounter majorCompactedInputBytes
* MutableFastCounter majorCompactedOutputBytes


---

* [HBASE-21481](https://issues.apache.org/jira/browse/HBASE-21481) | *Major* | **[acl] Superuser's permissions should not be granted or revoked by any non-su global admin**

HBASE-21481 improves the quality of access control by strengthening the protection of superusers' privileges.

---

* [HBASE-21082](https://issues.apache.org/jira/browse/HBASE-21082) | *Critical* | **Reimplement assign/unassign related procedure metrics**

Now we have four types of RIT procedure metrics: assign, unassign, move, and reopen. The meaning of assign/unassign is changed, as we will not increase the unassign metric and then the assign metric when moving a region.

Also introduced two new procedure metrics, open and close, which are used to track the open/close region calls to the region server. We may send open/close multiple times to finish a RIT since we may retry multiple times.


---

* [HBASE-20724](https://issues.apache.org/jira/browse/HBASE-20724) | *Critical* | **Sometimes some compacted storefiles are still opened after region failover**

Problem: This is an old problem since HBASE-2231. The compaction event marker was only written to the WAL. But after a flush, the WAL may be archived, which means a useful compaction event marker may be deleted too. So the compacted store files cannot be archived when the region opens and replays the WAL.

Solution: After this jira, the compaction event tracker is written to the HFile. When a region opens and loads store files, it reads the compaction event tracker from the HFile and archives the compacted store files which still exist.


---

* [HBASE-21820](https://issues.apache.org/jira/browse/HBASE-21820) | *Major* | **Implement CLUSTER quota scope**

HBase contains two quota scopes: MACHINE and CLUSTER. Before this patch, set quota operations did not expose the scope option to the client API and used MACHINE as the default, so CLUSTER scope could not be set or used.

Shell commands are as follows:

set\_quota TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec'

This issue implements CLUSTER scope in a simple way:

For user, namespace, user over namespace quota, use [ClusterLimit / RSNum] as machine limit.

For table and user over table quota, use [ClusterLimit / TotalTableRegionNum \* MachineTableRegionNum] as machine limit.
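
The two CLUSTER-scope formulas in the HBASE-21820 note can be worked through with a short Python sketch. The function and parameter names below are hypothetical and simply mirror the symbols in the note (ClusterLimit, RSNum, TotalTableRegionNum, MachineTableRegionNum); any rounding that HBase itself applies is not modeled here.

```python
def machine_limit_by_server(cluster_limit, rs_num):
    """User, namespace, and user-over-namespace quotas: split evenly across region servers."""
    return cluster_limit / rs_num


def machine_limit_by_table_regions(cluster_limit, total_table_regions, machine_table_regions):
    """Table and user-over-table quotas: share proportional to the table regions on this server."""
    return cluster_limit / total_table_regions * machine_table_regions


# A cluster limit of 100 req/sec spread over 4 region servers.
per_server = machine_limit_by_server(100, 4)

# The same cluster limit for a table with 20 regions, 5 of them on this server.
per_table_share = machine_limit_by_table_regions(100, 20, 5)

print(per_server, per_table_share)
```

In both worked cases the machine limit comes out as 25 req/sec. A server hosting more of the table's regions would get a proportionally larger share under the second formula.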
After this patch, users can set a CLUSTER scope quota, but MACHINE is still the default if the scope is omitted.

Shell commands are as follows:

set\_quota TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec'

set\_quota TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec', SCOPE =\> MACHINE

set\_quota TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec', SCOPE =\> CLUSTER


---

* [HBASE-21057](https://issues.apache.org/jira/browse/HBASE-21057) | *Minor* | **upgrade to latest spotbugs**

Change spotbugs version to 3.1.11.


---

* [HBASE-21505](https://issues.apache.org/jira/browse/HBASE-21505) | *Major* | **Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.**

This modifies the "status 'replication'" output, fixing inconsistencies in the reported times and ages of last shipped edits, as well as the wrong calculation of replication lags. It also introduces additional info for each recovery queue, which was not accounted for by this command before. The new output of the "status 'replication'" command is explained in detail below:

a) Source started, target stopped, no edits arrived on source yet:

...

SOURCE: PeerID=1

Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0

...

b) Source started, target stopped, add edit on source:

...

Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:21:00 GMT 2018, Replication Lag=2459

...

c) Source started, target stopped, edit added on source, restart source:

...

SOURCE: PeerID=1

Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0

Recovered Queue: 1-hbase01.home,16020,1542784524057 No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:23:00 GMT 2018, Replication Lag=201495

...
d) Source started, target stopped, add edit on source, restart source, add another edit on source:

...

SOURCE: PeerID=1

Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:02:28 GMT 2018, Replication Lag=6349

Recovered Queue: 1-hbase01.home,16020,1542782758742 No Ops shipped since last restart, SizeOfLogQueue=0, TimeStampOfLastArrivedInSource=Wed Nov 21 06:53:05 GMT 2018, Replication Lag=569394

...

e) Source started, target stopped, add edit on source, restart source, add another edit on source, start target:

...

SOURCE: PeerID=1

Normal Queue: 1 AgeOfLastShippedOp=30000, TimeStampOfLastShippedOp=Wed Nov 21 07:07:58 GMT 2018, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:02:28 GMT 2018, Replication Lag=0

...

f) Source started, target stopped, add edit on source, restart source, restart target:

...

SOURCE: PeerID=1

Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0

...


---

* [HBASE-21922](https://issues.apache.org/jira/browse/HBASE-21922) | *Major* | **BloomContext#sanityCheck may failed when use ROWPREFIX\_DELIMITED bloom filter**

Remove bloom filter type ROWPREFIX\_DELIMITED. It may be added back when a better solution is found.


---

* [HBASE-21783](https://issues.apache.org/jira/browse/HBASE-21783) | *Major* | **Support exceed user/table/ns throttle quota if region server has available quota**

Support enabling or disabling exceed throttle quota. Exceed throttle quota means that a user can over-consume the user/namespace/table quota if the region server has additional available quota because other users are not consuming it at the same time.

Use the following shell commands to enable/disable exceed throttle quota:

enable\_exceed\_throttle\_quota

disable\_exceed\_throttle\_quota

There are two limits when exceed throttle quota is enabled:

1. At least one read and one write region server throttle quota must be set;
2.
All region server throttle quotas must be in seconds time unit. Once previous requests exceed their quota and consume region server quota, quota in other time units may take a long time to refill, which may affect later requests.


---

* [HBASE-20587](https://issues.apache.org/jira/browse/HBASE-20587) | *Major* | **Replace Jackson with shaded thirdparty gson**

Remove jackson dependencies from most hbase modules except hbase-rest, and use shaded gson instead.

The output json will be a bit different since jackson can use getters/setters, but gson will always use the fields.


---

* [HBASE-21928](https://issues.apache.org/jira/browse/HBASE-21928) | *Major* | **Deprecated HConstants.META\_QOS**

Mark HConstants.META\_QOS as deprecated. It is for internal use only and is the highest priority. You should not try to set a priority greater than or equal to this value; doing so is harmless but also useless.


---

* [HBASE-17942](https://issues.apache.org/jira/browse/HBASE-17942) | *Major* | **Disable region splits and merges per table**

This patch adds the ability to disable split and/or merge for a table (by default, split and merge are enabled for a table).


---

* [HBASE-21636](https://issues.apache.org/jira/browse/HBASE-21636) | *Major* | **Enhance the shell scan command to support missing scanner specifications like ReadType, IsolationLevel etc.**

Allows the shell to set Scan options previously not exposed. See the additions in the scan help by typing the following in the hbase shell:

hbase\> help 'scan'


---

* [HBASE-21201](https://issues.apache.org/jira/browse/HBASE-21201) | *Major* | **Support to run VerifyReplication MR tool without peerid**

We can specify peerQuorumAddress instead of peerId in the VerifyReplication tool, so it no longer requires a peerId to be set up when using this tool.
For example: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication zk1,zk2,zk3:2181/hbase testTable --- * [HBASE-21838](https://issues.apache.org/jira/browse/HBASE-21838) | *Major* | **Create a special ReplicationEndpoint just for verifying the WAL entries are fine** Introduce a VerifyWALEntriesReplicationEndpoint which replicates nothing but only verifies if all the cells are valid. It can be used to capture bugs for writing WAL, as most times we will not read the WALs again after writing it if there are no region server crashes. --- * [HBASE-21764](https://issues.apache.org/jira/browse/HBASE-21764) | *Major* | **Size of in-memory compaction thread pool should be configurable** Introduced an new config key in this issue: hbase.regionserver.inmemory.compaction.pool.size. the default value would be 10. you can configure this to set the pool size of in-memory compaction pool. Note that all memstores in one region server will share the same pool, so if you have many regions in one region server, you need to set this larger to compact faster for better read performance. --- * [HBASE-21684](https://issues.apache.org/jira/browse/HBASE-21684) | *Major* | **Throw DNRIOE when connection or rpc client is closed** Make StoppedRpcClientException extend DoNotRetryIOException. --- * [HBASE-21739](https://issues.apache.org/jira/browse/HBASE-21739) | *Major* | **Move grant/revoke from regionserver to master** To implement user permission control in Precedure V2, move grant and revoke method from AccessController to master firstly. Mark AccessController#grant and AccessController#revoke as deprecated and please use Admin#grant and Admin#revoke instead. --- * [HBASE-21791](https://issues.apache.org/jira/browse/HBASE-21791) | *Blocker* | **Upgrade thrift dependency to 0.12.0** IMPORTANT: Due to security issues, all users who use hbase thrift should avoid using releases which do not have this fix. 
The affected releases are: 2.1.x: 2.1.2 and below 2.0.x: 2.0.4 and below 1.x: 1.4.x and below If you are using one of the affected releases above, please consider upgrading to a newer release ASAP. --- * [HBASE-20894](https://issues.apache.org/jira/browse/HBASE-20894) | *Major* | **Move BucketCache from java serialization to protobuf** For users who have configured hbase.bucketcache.ioengine with either the file:, files:, or mmap: prefix, and configured it to be persistent via the hbase.bucketcache.persistent.path property, the serialization format of the bucket cache has changed between versions. The old state will not be read during startup, and there is currently no migration path. The impact is expected to be minimal, however, since the cache will rebuild over time as access patterns dictate. # HBASE 2.3.0 Release Notes These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements. --- * [HBASE-24603](https://issues.apache.org/jira/browse/HBASE-24603) | *Critical* | **Zookeeper sync() call is async** Fixes a couple of bugs in ZooKeeper interaction. First, the zk sync() call, which is used to sync lagging followers with the leader so that the client sees a consistent snapshot state, was actually asynchronous under the hood. We make it synchronous for correctness. Second, ZooKeeper events are now processed in a separate thread rather than in the thread context of the ZooKeeper client connection. This decoupling frees up the client connection quickly and avoids deadlocks. --- * [HBASE-24631](https://issues.apache.org/jira/browse/HBASE-24631) | *Major* | **Loosen Dockerfile pinned package versions of the "debian-revision"** Update our package version numbers throughout the Dockerfiles to be pinned to their epoch:upstream-version components only. Previously we'd specify the full debian package version number, including the debian-revision. This led to instability as debian packaging details changed.
See also [man deb-version](http://manpages.ubuntu.com/manpages/xenial/en/man5/deb-version.5.html) --- * [HBASE-24205](https://issues.apache.org/jira/browse/HBASE-24205) | *Major* | **Create metric to know the number of reads that happens from memstore** Adds a new metric where we collect the number of read requests (tracked per row) whether the row was fetched completely from memstore or it was pulled from files and memstore. The metric is now collected under the mbean for Tables and under the mbean for regions. Under table mbean ie.- 'name": "Hadoop:service=HBase,name=RegionServer,sub=Tables' The new metrics will be listed as {code} "Namespace\_default\_table\_t3\_columnfamily\_f1\_metric\_memstoreOnlyRowReadsCount": 5, "Namespace\_default\_table\_t3\_columnfamily\_f1\_metric\_mixedRowReadsCount": 1, {code} Where the format is Namespace\_\\_table\_\\_columnfamily\_\\_metric\_memstoreOnlyRowReadsCount Namespace\_\\_table\_\\_columnfamily\_\\_metric\_mixedRowReadsCount {code} The same one under the region ie. "name": "Hadoop:service=HBase,name=RegionServer,sub=Regions", comes as {code} "Namespace\_default\_table\_t3\_region\_75a7846f4ac4a2805071a855f7d0dbdc\_store\_f1\_metric\_memstoreOnlyRowReadsCount": 5, "Namespace\_default\_table\_t3\_region\_75a7846f4ac4a2805071a855f7d0dbdc\_store\_f1\_metric\_mixedRowReadsCount": 1, {code} where Namespace\_\\_region\_\\_store\_\\_metric\_memstoreOnlyRowReadsCount Namespace\_\\_region\_\\_store\_\\_metric\_mixedRowReadsCount This is also an aggregate against every store the number of reads that happened purely from the memstore or it was a mixed read that happened from memstore and file. --- * [HBASE-21773](https://issues.apache.org/jira/browse/HBASE-21773) | *Critical* | **rowcounter utility should respond to pleas for help** This adds [-h\|-help] options to rowcounter. Passing either -h or -help will print rowcounter guide as below: $hbase rowcounter -h usage: hbase rowcounter \ [options] [\ \...] 
Options: --starttime=\ starting time filter to start counting rows from. --endtime=\ end time filter limit, to only count rows up to this timestamp. --range=\ [startKey],[endKey][;[startKey],[endKey]...]] --expectedCount=\ expected number of rows to be count. For performance, consider the following configuration properties: -Dhbase.client.scanner.caching=100 -Dmapreduce.map.speculative=false --- * [HBASE-24217](https://issues.apache.org/jira/browse/HBASE-24217) | *Major* | **Add hadoop 3.2.x support** CI coverage has been extended to include Hadoop 3.2.x for HBase 2.2+. --- * [HBASE-23055](https://issues.apache.org/jira/browse/HBASE-23055) | *Major* | **Alter hbase:meta** Adds the ability to edit the hbase:meta table schema. For example, hbase(main):006:0\> alter 'hbase:meta', {NAME =\> 'info', DATA\_BLOCK\_ENCODING =\> 'ROW\_INDEX\_V1'} Updating all regions with the new schema... All regions updated. Done. Took 1.2138 seconds You can even add column families. However, you cannot delete any of the core hbase:meta column families such as 'info' and 'table'. --- * [HBASE-15161](https://issues.apache.org/jira/browse/HBASE-15161) | *Major* | **Umbrella: Miscellaneous improvements from production usage** This ticket summarizes significant improvements and expansion to the metrics surface area. Interested users should review the individual sub-tasks. --- * [HBASE-24545](https://issues.apache.org/jira/browse/HBASE-24545) | *Major* | **Add backoff to SCP check on WAL split completion** Adds backoff to the ServerCrashProcedure wait for WAL splitting to complete when there is a large backlog of files to split. (It's possible to avoid SCP blocking while waiting on WALs to split if you use procedure-based splitting: set 'hbase.split.wal.zk.coordinated' to false to enable procedure-based WAL splitting.)
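The backoff mentioned in HBASE-24545 can be pictured with a generic capped exponential backoff. The following is a minimal sketch under stated assumptions, not the ServerCrashProcedure code: the class name `BackoffSketch`, the method `backoffMillis`, the doubling policy, and the base and cap values are all hypothetical and chosen only for illustration.

```java
// Illustrative only: a generic capped exponential backoff.
// This is NOT the actual ServerCrashProcedure implementation; the class,
// method, doubling policy, and constants are hypothetical.
final class BackoffSketch {
    private BackoffSketch() {}

    /**
     * Returns the wait in milliseconds before retry number {@code attempt}
     * (0-based): baseMillis * 2^attempt, capped at maxMillis.
     */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        // Limit the shift so the left shift below can never overflow a long.
        int shift = Math.min(attempt, 62);
        // If doubling 'shift' times would exceed the cap, return the cap.
        if (baseMillis > (maxMillis >> shift)) {
            return maxMillis;
        }
        return baseMillis << shift;
    }

    public static void main(String[] args) {
        // Print the wait for the first few attempts with a 1 s base and 10 s cap.
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + " -> wait "
                + backoffMillis(attempt, 1000L, 10_000L) + " ms");
        }
    }
}
```

The cap keeps the wait bounded no matter how many times the check repeats, which is the usual reason to prefer a capped schedule over plain doubling.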
--- * [HBASE-24524](https://issues.apache.org/jira/browse/HBASE-24524) | *Minor* | **SyncTable logging improvements** Notice this has changed log level for mismatching row keys, originally those were being logged at INFO level, now it's logged at DEBUG level. This is consistent with the logging of mismatching cells. Also, for missing row keys, it now logs row key values in human readable format, making it more meaningful for operators troubleshooting mismatches. --- * [HBASE-24359](https://issues.apache.org/jira/browse/HBASE-24359) | *Major* | **Optionally ignore edits for deleted CFs for replication.** Introduce a new config hbase.replication.drop.on.deleted.columnfamily, default is false. When config to true, the replication will drop the edits for columnfamily that has been deleted from the replication source and target. --- * [HBASE-24418](https://issues.apache.org/jira/browse/HBASE-24418) | *Major* | **Consolidate Normalizer implementations** This change extends the Normalizer with a handful of new configurations. The configuration points supported are: * `hbase.normalizer.split.enabled` Whether to split a region as part of normalization. Default: `true`. * `hbase.normalizer.merge.enabled` Whether to merge a region as part of normalization. Default `true`. * `hbase.normalizer.min.region.count` The minimum number of regions in a table to consider it for merge normalization. Default: 3. * `hbase.normalizer.merge.min_region_age.days` The minimum age for a region to be considered for a merge, in days. Default: 3. * `hbase.normalizer.merge.min_region_size.mb` The minimum size for a region to be considered for a merge, in whole MBs. Default: 1. --- * [HBASE-24309](https://issues.apache.org/jira/browse/HBASE-24309) | *Major* | **Avoid introducing log4j and slf4j-log4j dependencies for modules other than hbase-assembly** Add a hbase-logging module, put the log4j related code in this module only so other modules do not need to depend on log4j at compile scope. 
See the comments of Log4jUtils and InternalLog4jUtils for more details. Add a log4j.properties to the test jar of hbase-logging module, so for other sub modules we just need to depend on the test jar of hbase-logging module at test scope to output the log to console, without placing a log4j.properties in the test resources as they all (almost) have the same content. And this test module will not be included in the assembly tarball so it will not mess up the binary distribution. Ban direct commons-logging dependency, and ban commons-logging and log4j imports in non-test code, to avoid mess up the downstream users logging framework. In hbase-logging module we do need to use log4j classes and the trick is to use full class name. Add jcl-over-slf4j and jul-to-slf4j dependencies, as some of our dependencies use jcl or jul as logging framework, we should also redirect their log message to slf4j. --- * [HBASE-21406](https://issues.apache.org/jira/browse/HBASE-21406) | *Minor* | **"status 'replication'" should not show SINK if the cluster does not act as sink** Added new metric to differentiate sink startup time from last OP applied time. Original behaviour was to always set startup time to TimestampsOfLastAppliedOp, and always show it on "status 'replication'" command, regardless if the sink ever applied any OP. This was confusing, specially for scenarios where cluster was just acting as source, the output could lead to wrong interpretations about sink not applying edits or replication being stuck. With the new metric, we now compare the two metrics values, assuming that if both are the same, there's never been any OP shipped to the given sink, so output would reflect it more clearly, to something as for example: SINK: TimeStampStarted=Thu Dec 06 23:59:47 GMT 2018, Waiting for OPs... --- * [HBASE-24132](https://issues.apache.org/jira/browse/HBASE-24132) | *Major* | **Upgrade to Apache ZooKeeper 3.5.7** HBase ships ZooKeeper 3.5.x. Was the EOL'd 3.4.x. 
3.5.x client can talk to 3.4.x ensemble. The ZooKeeper project has built a [FAQ](https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ) that documents known issues and work-arounds when upgrading existing deployments. --- * [HBASE-22287](https://issues.apache.org/jira/browse/HBASE-22287) | *Major* | **inifinite retries on failed server in RSProcedureDispatcher** Add backoff. Avoid retrying every 100ms. --- * [HBASE-24425](https://issues.apache.org/jira/browse/HBASE-24425) | *Major* | **Run hbck\_chore\_run and catalogjanitor\_run on draw of 'HBCK Report' page** Runs 'catalogjanitor\_run' and 'hbck\_chore\_run' inline with the loading of the 'HBCK Report' page. Pass '?cache=true' to skip inline invocation of 'catalogjanitor\_run' and 'hbck\_chore\_run' drawing the page. --- * [HBASE-24408](https://issues.apache.org/jira/browse/HBASE-24408) | *Blocker* | **Introduce a general 'local region' to store data on master** Introduced a general 'local region' at master side to store the procedure data, etc. The hfile of this region will be stored on the root fs while the wal will be stored on the wal fs. This issue supercedes part of the code for HBASE-23326, as now we store the data in 'MasterData' directory instead of 'MasterProcs'. The old hfiles will be moved to the global hfile archived directory with the suffix $-masterlocalhfile-$. The wal files will be moved to the global old wal directory with the suffix $masterlocalwal$. The TimeToLiveMasterLocalStoreHFileCleaner and TimeToLiveMasterLocalStoreWALCleaner are configured by default for cleaning the old hfiles and wal files, and the default TTLs are both 7 days. --- * [HBASE-24115](https://issues.apache.org/jira/browse/HBASE-24115) | *Major* | **Relocate test-only REST "client" from src/ to test/ and mark Private** Relocate test-only REST RemoteHTable and RemoteAdmin from src/ to test/. And mark them as InterfaceAudience.Private. 
--- * [HBASE-23938](https://issues.apache.org/jira/browse/HBASE-23938) | *Major* | **Replicate slow/large RPC calls to HDFS** Config key: hbase.regionserver.slowlog.systable.enabled Default value: false This config can be enabled if hbase.regionserver.slowlog.buffer.enabled is already enabled. While hbase.regionserver.slowlog.buffer.enabled ensures that any slow/large RPC logs with complete details are written to ring buffer available at each RegionServer, hbase.regionserver.slowlog.systable.enabled would ensure that all such logs are also persisted in new system table hbase:slowlog. Operator can scan hbase:slowlog with filters to retrieve specific attribute matching records and this table would be useful to capture historical performance of slowness of RPC calls with detailed analysis. hbase:slowlog consists of single ColumnFamily info. info consists of multiple qualifiers similar to the attributes available to query as part of Admin API: get\_slowlog\_responses. One example of a row from hbase:slowlog scan result (Attached a sample screenshot in the Jira) : \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:call\_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest) \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:client\_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:method\_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION\_NAME value: "cluster\_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." 
} scan { attribute { name: "\_isolationlevel\_" value: "\\x5C000" } start\_row: "cccccccc" time\_range { from: 0 to: 9223372036854775807 } max\_versions: 1 cache\_blocks: true max\_result\_size: 2097152 caching: 2147483647 include\_stop\_row: false } number\_of\_rows: 2147483647 close\_scanner: false client\_handles\_partials: true client\_handles\_heartbeats: true track\_scan\_metrics: false \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:processing\_time, timestamp=2020-05-16T14:59:58.764Z, value=24 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:queue\_time, timestamp=2020-05-16T14:59:58.764Z, value=0 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:region\_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster\_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:response\_size, timestamp=2020-05-16T14:59:58.764Z, value=211227 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:server\_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:start\_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=vjasani --- * [HBASE-24271](https://issues.apache.org/jira/browse/HBASE-24271) | *Major* | **Set values in \`conf/hbase-site.xml\` that enable running on \`LocalFileSystem\` out of the box** HBASE-24271 makes changes to the default `conf/hbase-site.xml` such that `bin/hbase` will run directly out of the binary tarball or a compiled source tree without any configuration modifications vs. Hadoop 2.8+. This changes our long-standing history of shipping no configured values in `conf/hbase-site.xml`, so existing processes that assume this file is empty of configuration properties may require attention.
--- * [HBASE-24310](https://issues.apache.org/jira/browse/HBASE-24310) | *Major* | **Use Slf4jRequestLog for hbase-http** Use Slf4jRequestLog instead of the log4j HttpRequestLogAppender in HttpServer. The request log is disabled by default in conf/log4j.properties by the following lines: # Disable request log by default, you can enable this by changing the appender log4j.category.http.requests=INFO,NullAppender log4j.additivity.http.requests=false Change 'NullAppender' to the appender of your choice to enable the request log. Note that the logger name for the master status HTTP server is 'http.requests.master', and for the region server it is 'http.requests.regionserver'. --- * [HBASE-24335](https://issues.apache.org/jira/browse/HBASE-24335) | *Major* | **Support deleteall with ts but without column in shell mode** Use an empty string to represent no column specified for deleteall in shell mode. Usage: deleteall 'test','r1','',12345 deleteall 'test', {ROWPREFIXFILTER =\> 'prefix'}, '', 12345 --- * [HBASE-24304](https://issues.apache.org/jira/browse/HBASE-24304) | *Major* | **Separate a hbase-asyncfs module** Added a new hbase-asyncfs module to hold the asynchronous dfs output stream implementation for implementing WAL. --- * [HBASE-22710](https://issues.apache.org/jira/browse/HBASE-22710) | *Major* | **Wrong result in one case of scan that use raw and versions and filter together** Make the logic for choosing versions more reasonable for raw scans, to avoid losing results when using a filter. --- * [HBASE-24285](https://issues.apache.org/jira/browse/HBASE-24285) | *Major* | **Move to hbase-thirdparty-3.3.0** Moved to hbase-thirdparty 3.3.0. --- * [HBASE-24252](https://issues.apache.org/jira/browse/HBASE-24252) | *Major* | **Implement proxyuser/doAs mechanism for hbase-http** This feature enables the HBase Web UIs to accept a 'proxyuser' via the HTTP Request's query string.
When the parameter \`hbase.security.authentication.spnego.kerberos.proxyuser.enable\` is set to \`true\` in hbase-site.xml (default is \`false\`), the HBase UI will attempt to impersonate the user specified by the query parameter "doAs". This query parameter is checked case-insensitively. When this option is not provided, the user who executed the request is the "real" user and there is no ability to execute impersonation against the WebUI. For example, if the user "bob" with Kerberos credentials executes a request against the WebUI with this feature enabled and a query string which includes \`doAs=alice\`, the HBase UI will treat this request as executed as \`alice\`, not \`bob\`. The standard Hadoop proxyuser configuration properties to limit users who may impersonate others apply to this change (e.g. to enable \`bob\` to impersonate \`alice\`). See the Hadoop documentation for more information on how to configure these proxyuser rules. --- * [HBASE-24143](https://issues.apache.org/jira/browse/HBASE-24143) | *Major* | **[JDK11] Switch default garbage collector from CMS** `bin/hbase` will now dynamically select a Garbage Collector implementation based on the detected JVM version. JDKs 8,9,10 use `-XX:+UseConcMarkSweepGC`, while JDK11+ use `-XX:+UseG1GC`. Notice a slight compatibility change. Previously, the garbage collector choice would always be appended to a user-provided value for `HBASE_OPTS`. As of this change, this setting will only be applied when `HBASE_OPTS` is unset. That means that operators who provide a value for this variable will now need to also specify the collector. This is especially important for those on JDK8, where the vm default GC is not the recommended ConcMarkSweep. 
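The selection rule described in HBASE-24143 can be modelled as follows. This is a minimal sketch of the rule as stated in the note, not a port of the script: the real logic lives in `bin/hbase`, and the class and method names here (`GcFlagSketch`, `defaultGcFlag`, `effectiveOpts`) are hypothetical. Treating a `null` string as "`HBASE_OPTS` is unset" and rejecting JDK versions below 8 are assumptions made for illustration.

```java
// Illustrative model of the rule described in the note. The real selection
// happens in the bin/hbase shell script; names here are hypothetical.
final class GcFlagSketch {
    private GcFlagSketch() {}

    /** JDK 8, 9, 10 -> CMS; JDK 11+ -> G1, as stated in the release note. */
    static String defaultGcFlag(int javaMajorVersion) {
        if (javaMajorVersion < 8) {
            // Assumption: versions below 8 are outside what the note covers.
            throw new IllegalArgumentException("unsupported JDK: " + javaMajorVersion);
        }
        return javaMajorVersion >= 11 ? "-XX:+UseG1GC" : "-XX:+UseConcMarkSweepGC";
    }

    /**
     * The default collector is applied only when the operator supplied no
     * HBASE_OPTS value (modelled here as null). A user-provided value is
     * returned untouched, so it must name a collector itself if one is wanted.
     */
    static String effectiveOpts(String userHbaseOpts, int javaMajorVersion) {
        if (userHbaseOpts != null) {
            return userHbaseOpts;
        }
        return defaultGcFlag(javaMajorVersion);
    }

    public static void main(String[] args) {
        System.out.println(effectiveOpts(null, 8));
        System.out.println(effectiveOpts(null, 11));
        System.out.println(effectiveOpts("-Xmx4g", 8));
    }
}
```

The third call shows the compatibility change: an operator who sets only `-Xmx4g` no longer gets a collector flag appended and has to add one explicitly.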
--- * [HBASE-24024](https://issues.apache.org/jira/browse/HBASE-24024) | *Major* | **Optionally reject multi() requests with very high no of rows** New Config: hbase.rpc.rows.size.threshold.reject ----------------------------------------------------------------------- Default value: false Description: If value is true, RegionServer will abort batch requests of Put/Delete with number of rows in a batch operation exceeding threshold defined by value of config: hbase.rpc.rows.warning.threshold. --- * [HBASE-24139](https://issues.apache.org/jira/browse/HBASE-24139) | *Critical* | **Balancer should avoid leaving idle region servers** StochasticLoadBalancer functional improvement: StochasticLoadBalancer would rebalance the cluster if there are any idle RegionServers in the cluster (RegionServer having no region), while other RegionServers have at least 1 region available. --- * [HBASE-24196](https://issues.apache.org/jira/browse/HBASE-24196) | *Major* | **[Shell] Add rename rsgroup command in hbase shell** user or admin can now use hbase shell \> rename\_rsgroup 'oldname', 'newname' to rename rsgroup. --- * [HBASE-24218](https://issues.apache.org/jira/browse/HBASE-24218) | *Major* | **Add hadoop 3.2.x in hadoop check** Add hadoop-3.2.0 and hadoop-3.2.1 in hadoop check and when '--quick-hadoopcheck' we will only check hadoop-3.2.1. Notice that, for aligning the personality scripts across all the active branches, we will commit the patch to all active branches, but the hadoop-3.2.x support in hadoopcheck is only applied to branch-2.2+. --- * [HBASE-23829](https://issues.apache.org/jira/browse/HBASE-23829) | *Major* | **Get \`-PrunSmallTests\` passing on JDK11** \`-PrunSmallTests\` now pass on JDK11 when using \`-Phadoop.profile=3.0\`. 
--- * [HBASE-24185](https://issues.apache.org/jira/browse/HBASE-24185) | *Major* | **Junit tests do not behave well with System.exit or Runtime.halt or JVM exits in general.** Tests that fail because a process -- RegionServer or Master -- called System.exit, will now instead throw an exception. --- * [HBASE-24072](https://issues.apache.org/jira/browse/HBASE-24072) | *Major* | **Nightlies reporting OutOfMemoryError: unable to create new native thread** Hadoop hosts have had their ulimit -u raised from 10000 to 30000 (per user, by INFRA). The Docker build container has had its limit raised from 10000 to 12500. --- * [HBASE-24112](https://issues.apache.org/jira/browse/HBASE-24112) | *Major* | **[RSGroup] Support renaming rsgroup** Support RSGroup renaming in core codebase. New API Admin#renameRSGroup(String, String) is introduced in 3.0.0. --- * [HBASE-23994](https://issues.apache.org/jira/browse/HBASE-23994) | *Trivial* | **Add WebUI to Canary** The Canary tool now offers a WebUI when run in `region` mode (the default mode). It is enabled by default, and by default, it binds to `0.0.0.0:16050`. This can be overridden by setting `hbase.canary.info.bindAddress` and `hbase.canary.info.port`. To disable entirely, set the port to `-1`. --- * [HBASE-23779](https://issues.apache.org/jira/browse/HBASE-23779) | *Major* | **Up the default fork count to make builds complete faster; make count relative to CPU count** Pass --threads=2 when building on jenkins. It shortens nightly build times by about 25%. It works by running module build/test in parallel when dependencies allow. Upping the forkcount beyond the pom default of 0.25C would have us exceed our CPU budget on jenkins when two modules are running in parallel (2 modules at 0.25C each make 0.5C, and on jenkins, hadoop nodes run two jenkins executors per host). Higher fork counts also seem to threaten build stability. For running tests locally, to go faster, up fork count.
$ x="0.5C" ; mvn --threads=2 -Dsurefire.firstPartForkCount=$x -Dsurefire.secondPartForkCount=$x test -PrunAllTests You could up the x from 0.5C to 1.0C but YMMV (On overcommitted hardware, tests start bombing out pretty soon after startup). You could try upping thread count but on occasion are likely to overcommit hardware. --- * [HBASE-24126](https://issues.apache.org/jira/browse/HBASE-24126) | *Major* | **Up the container nproc uplimit from 10000 to 12500** Start docker with upped ulimit for nproc passing '--ulimit nproc=12500'. It was 10000, the default, but made it 12500. Then, set PROC\_LIMIT in hbase-personality so when yetus runs, it is w/ the new 12500 value. --- * [HBASE-24150](https://issues.apache.org/jira/browse/HBASE-24150) | *Major* | **Allow module tests run in parallel** Pass -T2 to mvn. Makes it so we do two modules-at-a-time dependencies willing. Helps speed build and testing. Doubles the resource usage when running modules in parallel. --- * [HBASE-24121](https://issues.apache.org/jira/browse/HBASE-24121) | *Major* | **[Authorization] ServiceAuthorizationManager isn't dynamically updatable. And it should be.** Master & RegionService now support refresh policy authorization defined in hbase-policy.xml without restarting service. To refresh policy, please execute hbase shell command: update\_config or update\_config\_all after policy file updated and synced on all nodes. --- * [HBASE-24099](https://issues.apache.org/jira/browse/HBASE-24099) | *Major* | **Use a fair ReentrantReadWriteLock for the region close lock** This change modifies the default acquisition policy for the region's close lock in order to prevent observed starvation of close requests. The new boolean configuration parameter 'hbase.regionserver.fair.region.close.lock' controls the lock acquisition policy: if true, the lock is created in fair mode (default); if false, the lock is created in nonfair mode (the old default). 
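The fair versus nonfair choice in HBASE-24099 maps directly onto the `ReentrantReadWriteLock(boolean fair)` constructor in the JDK. The sketch below is a minimal illustration, not HBase code: it uses `java.util.Properties` as a stand-in for the HBase configuration object, and the class and method names (`RegionCloseLockSketch`, `newCloseLock`) are hypothetical. Only the property name and its default of `true` come from the note.

```java
import java.util.Properties;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: java.util.Properties stands in for the HBase
// configuration; the class and method names are hypothetical.
final class RegionCloseLockSketch {
    // Property name and default (true) are taken from the release note.
    static final String FAIR_LOCK_KEY = "hbase.regionserver.fair.region.close.lock";

    private RegionCloseLockSketch() {}

    /** Creates the lock in fair mode unless the property is set to false. */
    static ReentrantReadWriteLock newCloseLock(Properties conf) {
        boolean fair = Boolean.parseBoolean(conf.getProperty(FAIR_LOCK_KEY, "true"));
        return new ReentrantReadWriteLock(fair);
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        System.out.println("default fair? " + newCloseLock(defaults).isFair());

        Properties nonFair = new Properties();
        nonFair.setProperty(FAIR_LOCK_KEY, "false");
        System.out.println("overridden fair? " + newCloseLock(nonFair).isFair());
    }
}
```

In fair mode the JDK lock grants access roughly in arrival order, so a waiting writer (here, the close request) is not indefinitely overtaken by newly arriving readers; the cost is typically lower throughput than nonfair mode.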
--- * [HBASE-23153](https://issues.apache.org/jira/browse/HBASE-23153) | *Major* | **PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded** The `PrimaryRegionCountSkewCostFunction` for the `StochasticLoadBalancer` is only needed when the read replicas feature is enabled. With this change, that function now properly indicates that it is not needed when the read replica feature is off. If this improvement is not available, operators with clusters that are not using the read replica feature should manually disable it by setting `hbase.master.balancer.stochastic.primaryRegionCountCost` to `0.0` in hbase-site.xml for all HBase Masters. --- * [HBASE-24055](https://issues.apache.org/jira/browse/HBASE-24055) | *Major* | **Make AsyncFSWAL can run on EC cluster** Now AsyncFSWAL can also be used against a directory which has EC enabled. Make sure you also use the hadoop 3.x client, as the option is only available in hadoop 3.x. --- * [HBASE-24113](https://issues.apache.org/jira/browse/HBASE-24113) | *Major* | **Upgrade the maven we use from 3.5.4 to 3.6.3 in nightlies** Branches-2.3+ use maven 3.6.3 for building. Older branches still use 3.5.4. --- * [HBASE-24122](https://issues.apache.org/jira/browse/HBASE-24122) | *Major* | **Change machine ulimit-l to ulimit-a so dumps full ulimit rather than just 'max locked memory'** Our 'Build Artifacts' have a machine directory under which we emit vitals on the host the build was run on. We used to emit the result of 'ulimit -l' as a file named 'ulimit-l'. This has been changed to instead emit the result of running 'ulimit -a', which includes the 'ulimit -l' value.
--- * [HBASE-23678](https://issues.apache.org/jira/browse/HBASE-23678) | *Major* | **Literate builder API for version management in schema** ColumnFamilyDescriptor new builder API: /\*\* \* Retain all versions for a given TTL(retentionInterval), and then only a specific number \* of versions(versionAfterInterval) after that interval elapses. \* \* @param retentionInterval Retain all versions for this interval \* @param versionAfterInterval Number of versions to retain after retentionInterval \*/ public ModifyableColumnFamilyDescriptor setVersionsWithTimeToLive( final int retentionInterval, final int versionAfterInterval) --- * [HBASE-24050](https://issues.apache.org/jira/browse/HBASE-24050) | *Major* | **Deprecated PBType on all 2.x branches** org.apache.hadoop.hbase.types.PBType is marked as deprecated without any replacement. It will be moved to hbase-example module and marked as IA.Private in 3.0.0. Exposing it was a mistake, as it should not be part of our public API. Users who depend on this class should copy the code into their own code base. --- * [HBASE-8868](https://issues.apache.org/jira/browse/HBASE-8868) | *Minor* | **add metric to report client shortcircuit reads** Expose file system level read metrics for RegionServer. If the HBase RS runs on top of HDFS, calculate the aggregation of ReadStatistics of each HdfsFileInputStream. These metrics include: (1) total number of bytes read from HDFS. (2) total number of bytes read from local DataNode. (3) total number of bytes read locally through short-circuit read. (4) total number of bytes read locally through zero-copy read. Because HDFS ReadStatistics is calculated per input stream, it is not feasible to update the aggregated number in real time. Instead, the metrics are updated when an input stream is closed.
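The "update on close" behaviour described in HBASE-8868 can be pictured with a small model in which each stream keeps its own counters and folds them into a shared total only when it is closed. This is a minimal sketch under stated assumptions, not the HBase or HDFS implementation: `ReadMetricsSketch`, `Aggregate`, and `TrackedStream` are hypothetical names, and only two of the four counters from the note are modelled.

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative only: models per-stream statistics that are folded into a
// shared aggregate when the stream is closed. All names are hypothetical.
final class ReadMetricsSketch {
    private ReadMetricsSketch() {}

    /** Server-wide totals, safe to update from many threads. */
    static final class Aggregate {
        final LongAdder totalBytesRead = new LongAdder();
        final LongAdder localBytesRead = new LongAdder();
    }

    /** A stream that tracks its own reads and reports them once, on close. */
    static final class TrackedStream implements AutoCloseable {
        private final Aggregate aggregate;
        private long totalBytes;
        private long localBytes;
        private boolean closed;

        TrackedStream(Aggregate aggregate) {
            this.aggregate = aggregate;
        }

        /** Records a read against this stream only; the aggregate is untouched. */
        void recordRead(long bytes, boolean local) {
            totalBytes += bytes;
            if (local) {
                localBytes += bytes;
            }
        }

        /** Folds this stream's counters into the aggregate exactly once. */
        @Override
        public void close() {
            if (closed) {
                return;
            }
            closed = true;
            aggregate.totalBytesRead.add(totalBytes);
            aggregate.localBytesRead.add(localBytes);
        }
    }

    public static void main(String[] args) {
        Aggregate aggregate = new Aggregate();
        try (TrackedStream stream = new TrackedStream(aggregate)) {
            stream.recordRead(100, true);
            stream.recordRead(50, false);
            System.out.println("before close: " + aggregate.totalBytesRead.sum());
        }
        System.out.println("after close: " + aggregate.totalBytesRead.sum());
    }
}
```

A consequence of this design is that the totals lag behind actual I/O for long-lived streams, which matches the note's statement that the aggregated numbers are not updated in real time.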
--- * [HBASE-24032](https://issues.apache.org/jira/browse/HBASE-24032) | *Major* | **[RSGroup] Assign created tables to respective rsgroup automatically instead of manual operations** Admin can determine which tables go to which rsgroup by script (setting hbase.rsgroup.table.mapping.script with a local filesystem path) on the Master side, which aims to lighten the burden of admin operations. Note that since HBase 3+, rsgroup can be specified in TableDescriptor as well; if clients specify this, master will skip the determination from script. Here is a simple example of script: {code} #!/bin/bash # Input consists of two strings: the 1st is the namespace of the table, the 2nd is the table name namespace=$1 tablename=$2 if [[ $namespace == test ]]; then echo test elif [[ $tablename == \*foo\* ]]; then echo other else echo default fi {code} --- * [HBASE-23993](https://issues.apache.org/jira/browse/HBASE-23993) | *Major* | **Use loopback for zk standalone server in minizkcluster** MiniZKCluster now puts up its standalone node listening on loopback/127.0.0.1 rather than "localhost". --- * [HBASE-23986](https://issues.apache.org/jira/browse/HBASE-23986) | *Major* | **Bump hadoop-two.version to 2.10.0 on master and branch-2** Bumped hadoop-two.version to 2.10.0, which means we will drop the support for hadoop-2.8.x and hadoop-2.9.x. --- * [HBASE-23930](https://issues.apache.org/jira/browse/HBASE-23930) | *Minor* | **Shell should attempt to format \`timestamp\` attributes as ISO-8601** Change timestamp display to be ISO8601 when toString on Cell and outputting in shell.... User used to see.... column=table:state, timestamp=1583967620343 ..... ... but now sees: column=table:state, timestamp=2020-03-11T23:00:20.343Z .... --- * [HBASE-22827](https://issues.apache.org/jira/browse/HBASE-22827) | *Major* | **Expose multi-region merge in shell and Admin API** merge\_region shell command can now be used to merge more than 2 regions as well.
It takes a list of regions as comma separated values or as an array of regions, and not just 2 regions. The full regionnames and encoded regionnames are continued to be accepted. --- * [HBASE-23767](https://issues.apache.org/jira/browse/HBASE-23767) | *Major* | **Add JDK11 compilation and unit test support to Github precommit** Rebuild our Dockerfile with support for multiple JDK versions. Use multiple stages in the Jenkinsfile instead of yetus's multijdk because of YETUS-953. Run those multiple stages in parallel to speed up results. Note that multiple stages means multiple Yetus invocations means multiple comments on the PreCommit. This should become more obvious to users once we can make use of GitHub Checks API, HBASE-23902. --- * [HBASE-22978](https://issues.apache.org/jira/browse/HBASE-22978) | *Minor* | **Online slow response log** get\_slowlog\_responses and clear\_slowlog\_responses are used to retrieve and clear slow RPC logs from RingBuffer maintained by RegionServers. New Admin APIs: 1. List\ getSlowLogResponses(final Set\ serverNames, final SlowLogQueryFilter slowLogQueryFilter) throws IOException; 2. List\ clearSlowLogResponses(final Set\ serverNames) throws IOException; Configs: 1. hbase.regionserver.slowlog.ringbuffer.size: Default size of ringbuffer to be maintained by each RegionServer in order to store online slowlog responses. This is an in-memory ring buffer of requests that were judged to be too slow in addition to the responseTooSlow logging. The in-memory representation would be complete. For more details, please look into Doc Section: Get Slow Response Log from shell Default 256 2. hbase.regionserver.slowlog.buffer.enabled: Indicates whether RegionServers have ring buffer running for storing Online Slow logs in FIFO manner with limited entries. The size of the ring buffer is indicated by config: hbase.regionserver.slowlog.ringbuffer.size The default value is false, turn this on and get latest slowlog responses with complete data. 
Default: false. For more details, see the "Get Slow Response Log from shell" section of the HBase book. --- * [HBASE-23926](https://issues.apache.org/jira/browse/HBASE-23926) | *Major* | **[Flakey Tests] Down the flakies re-run ferocity; it makes for too many fails.** Down the flakey re-run fork count from 1.0C -- i.e. a fork per CPU -- to 0.25C. On a recent run, the machine had 16 cores. At 0.25C that is 4 forks. We'd hardcoded the fork count at 3 prior to the changes made by the parent issue. --- * [HBASE-23146](https://issues.apache.org/jira/browse/HBASE-23146) | *Major* | **Support CheckAndMutate with multiple conditions** Add a checkAndMutate(row, filter) method in the AsyncTable interface and the Table interface. This method atomically checks if the row matches the specified filter. If it does, it applies the Put/Delete/RowMutations. This is a fluent-style API; the code looks like: For Table interface: {code} table.checkAndMutate(row, filter).thenPut(put); {code} For AsyncTable interface: {code} table.checkAndMutate(row, filter).thenPut(put) .thenAccept(succ -\> { if (succ) { System.out.println("Check and put succeeded"); } else { System.out.println("Check and put failed"); } }); {code} --- * [HBASE-23874](https://issues.apache.org/jira/browse/HBASE-23874) | *Minor* | **Move Jira-attached file precommit definition from script in Jenkins config to dev-support** The Jira Precommit job (https://builds.apache.org/job/PreCommit-HBASE-Build/) will now look for a file within the source tree (dev-support/jenkins\_precommit\_jira\_yetus.sh) instead of depending on a script section embedded in the job. --- * [HBASE-23865](https://issues.apache.org/jira/browse/HBASE-23865) | *Major* | **Up flakey history from 5 to 10** Changed flakey list reporting to show 10 rather than 5 items. Also changed the first- and second-part fork counts to be 1C rather than hardcoded 3.
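To illustrate the "FIFO manner with limited entries" behaviour described for the slow-log ring buffer in HBASE-22978 above, here is a minimal plain-Java sketch. It is an illustration only, not the RegionServer's implementation: the class `BoundedFifoLog` and its methods are hypothetical names, and the constructor argument merely stands in for `hbase.regionserver.slowlog.ringbuffer.size`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of a fixed-capacity FIFO buffer; not HBase code.
final class BoundedFifoLog<T> {
  private final int capacity;
  private final Deque<T> entries = new ArrayDeque<>();

  BoundedFifoLog(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
  }

  // Adds an entry, evicting the oldest one when the buffer is already full.
  synchronized void add(T entry) {
    if (entries.size() == capacity) {
      entries.removeFirst();
    }
    entries.addLast(entry);
  }

  // Returns a copy of the retained entries, oldest first.
  synchronized List<T> snapshot() {
    return new ArrayList<>(entries);
  }

  // Drops all retained entries.
  synchronized void clear() {
    entries.clear();
  }
}
```

With a capacity of 2, adding "a", "b" and "c" leaves only "b" and "c" in the buffer, which is why only the latest slow responses can be retrieved.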
--- * [HBASE-23554](https://issues.apache.org/jira/browse/HBASE-23554) | *Major* | **Encoded regionname to regionname utility** Adds shell command regioninfo: hbase(main):001:0\> regioninfo '0e6aa5c19ae2b2627649dc7708ce27d0' {ENCODED =\> 0e6aa5c19ae2b2627649dc7708ce27d0, NAME =\> 'TestTable,,1575941375972.0e6aa5c19ae2b2627649dc7708ce27d0.', STARTKEY =\> '', ENDKEY =\> '00000000000000000000299441'} Took 0.4737 seconds --- * [HBASE-23350](https://issues.apache.org/jira/browse/HBASE-23350) | *Major* | **Make compaction files cacheonWrite configurable based on threshold** This JIRA adds a new configuration - \`hbase.rs.cachecompactedblocksonwrite.threshold\`. This configuration is the maximum total size (in bytes) of the compacted files below which the configuration \`hbase.rs.cachecompactedblocksonwrite\` is honoured. If the total size of the compacted files exceeds this threshold, even when \`hbase.rs.cachecompactedblocksonwrite\` is enabled, the data blocks are not cached. Caching index and bloom blocks is not affected by this configuration (user configuration is always honoured). The default value of this configuration is Long.MAX\_VALUE. This means that, whatever the total size of the compacted files, the data blocks will be cached. --- * [HBASE-17115](https://issues.apache.org/jira/browse/HBASE-17115) | *Major* | **HMaster/HRegion Info Server does not honour admin.acl** Implements authorization for the HBase Web UI by limiting access to certain endpoints which could be used to extract sensitive information from HBase. Access to these restricted endpoints can be limited to a group of administrators, identified either by a list of users (hbase.security.authentication.spnego.admin.users) or by a list of groups (hbase.security.authentication.spnego.admin.groups). By default, neither of these values is set, which preserves backwards compatibility (allowing all authenticated users to access all endpoints).
Further, users who have sensitive information in the HBase service configuration can set hbase.security.authentication.ui.config.protected to true which will treat the configuration endpoint as a protected, admin-only resource. By default, all authenticated users may access the configuration endpoint. --- * [HBASE-23647](https://issues.apache.org/jira/browse/HBASE-23647) | *Major* | **Make MasterRegistry the default registry impl** Enables master based registry as the default registry used by clients to fetch connection metadata. Refer to the section "Master Registry" in the client documentation for more details and advantages of this implementation over the default Zookeeper based registry. Configuration parameter that controls the registry in use: `hbase.client.registry.impl` Where to set this: HBase client configuration (hbase-site.xml) Possible values: - `org.apache.hadoop.hbase.client.ZKConnectionRegistry` (For ZK based registry implementation) - `org.apache.hadoop.hbase.client.MasterRegistry` (New, for master based registry implementation) Notes on defaults: - For v3.0.0 and later, MasterRegistry is the default registry - For all releases in 2.x line, ZK based registry is the default. This feature has been back ported to 2.3.0 and later releases. MasterRegistry can be enabled by setting the following client configuration. 
``` <property> <name>hbase.client.registry.impl</name> <value>org.apache.hadoop.hbase.client.MasterRegistry</value> </property> ``` --- * [HBASE-23069](https://issues.apache.org/jira/browse/HBASE-23069) | *Critical* | **periodic dependency bump for Sep 2019** caffeine: 2.6.2 =\> 2.8.1 commons-codec: 1.10 =\> 1.13 commons-io: 2.5 =\> 2.6 disruptor: 3.3.6 =\> 3.4.2 httpcore: 4.4.6 =\> 4.4.13 jackson: 2.9.10 =\> 2.10.1 jackson.databind: 2.9.10.1 =\> 2.10.1 jetty: 9.3.27.v20190418 =\> 9.3.28.v20191105 protobuf.plugin: 0.5.0 =\> 0.6.1 zookeeper: 3.4.10 =\> 3.4.14 slf4j: 1.7.25 =\> 1.7.30 rat: 0.12 =\> 0.13 asciidoctor: 1.5.5 =\> 1.5.8 asciidoctor.pdf: 1.5.0-alpha.15 =\> 1.5.0-rc.2 error-prone: 2.3.3 =\> 2.3.4 --- * [HBASE-23686](https://issues.apache.org/jira/browse/HBASE-23686) | *Major* | **Revert binary incompatible change and remove reflection** - Reverts a binary incompatible change for ByteRangeUtils - Usage of reflection inside CommonFSUtils removed --- * [HBASE-23347](https://issues.apache.org/jira/browse/HBASE-23347) | *Major* | **Pluggable RPC authentication** This change introduces an internal abstraction layer which allows for new SASL-based authentication mechanisms to be used inside HBase services. All existing SASL-based authentication mechanisms were ported to the new abstraction, making no external change in runtime semantics, client API, or RPC serialization format. Developers familiar with extending HBase can implement authentication mechanisms beyond simple Kerberos and DelegationTokens which authenticate HBase users against some other user database. HBase service authentication (Master to/from RegionServer) continues to operate solely over Kerberos. --- * [HBASE-23156](https://issues.apache.org/jira/browse/HBASE-23156) | *Major* | **start-hbase.sh failed with ClassNotFoundException when build with hadoop3** Introduce a new hbase-assembly/src/main/assembly/hadoop-three-compat.xml for build with hadoop 3.x.
--- * [HBASE-23680](https://issues.apache.org/jira/browse/HBASE-23680) | *Major* | **RegionProcedureStore missing cleaning of hfile archive** Add a new config to hbase-default.xml \<property\> \<name\>hbase.procedure.store.region.hfilecleaner.plugins\</name\> \<value\>org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner\</value\> \<description\>A comma-separated list of BaseHFileCleanerDelegate invoked by the RegionProcedureStore HFileCleaner service. These HFiles cleaners are called in order, so put the cleaner that prunes the most files in front. To implement your own BaseHFileCleanerDelegate, just put it in HBase's classpath and add the fully qualified class name here. Always add the above default hfile cleaners in the list as they will be overwritten in hbase-site.xml.\</description\> \</property\> It will share the same TTL with other HFileCleaners. You can also implement your own cleaner and change this property to enable it. --- * [HBASE-23675](https://issues.apache.org/jira/browse/HBASE-23675) | *Minor* | **Move to Apache parent POM version 22** Updated parent pom to Apache version 22. --- * [HBASE-23679](https://issues.apache.org/jira/browse/HBASE-23679) | *Critical* | **FileSystem instance leaks due to bulk loads with Kerberos enabled** This change fixes an issue with bulk loading on installations with Kerberos enabled and more than a single RegionServer. When multiple RegionServers are involved in hosting a table's regions which are being bulk-loaded into, all but the RegionServer hosting the table's first Region will "leak" one DistributedFileSystem object onto the heap, never freeing that memory. Eventually, with enough bulk loads, this will create a situation for RegionServers where they have no free heap space and will either spend all time in JVM GC, lose their ZK session, or crash with an OutOfMemoryError. The only mitigation for this issue is to periodically restart RegionServers.
All earlier versions of HBase 2.x are subject to this issue (2.0.x, \<=2.1.8, \<=2.2.3) --- * [HBASE-23286](https://issues.apache.org/jira/browse/HBASE-23286) | *Major* | **Improve MTTR: Split WAL to HFile** Add a new feature to improve MTTR, in which failover has 3 steps: 1. Read WAL and write HFile to region’s column family’s recovered.hfiles directory. 2. Open region. 3. Bulkload the recovered.hfiles for every column family. Compared to DLS (distributed log split), this feature will reduce region open time significantly. Set hbase.wal.split.to.hfile to true to enable this feature. --- * [HBASE-23619](https://issues.apache.org/jira/browse/HBASE-23619) | *Trivial* | **Use built-in formatting for logging in hbase-zookeeper** Changed the logging in hbase-zookeeper to use built-in formatting --- * [HBASE-23628](https://issues.apache.org/jira/browse/HBASE-23628) | *Minor* | **Replace Apache Commons Digest Base64 with JDK8 Base64** From the PR: "Yes. The two create the same output... I just wrote a small test suite to increase my confidence on that. I generated many tens of millions of random byte patterns and compared the output of the two algorithms. They came back identical every time. "Just in case any inquiring minds would like to know, there is no longer an encoding required when generating the strings. The JDK implementation specifically specifies that strings returned are StandardCharsets.ISO\_8859\_1. This does not change anything because UTF8 and ISO\_8859 overlap for the limited character set (64 characters) the encoding uses." --- * [HBASE-23651](https://issues.apache.org/jira/browse/HBASE-23651) | *Major* | **Region balance throttling can be disabled** Setting hbase.balancer.max.balancing to an int value \<= 0 will disable region balance throttling.
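As a small illustration of the charset remark quoted in the HBASE-23628 note above, the sketch below encodes bytes with the JDK's `java.util.Base64` and checks that the resulting string has identical bytes under ISO\_8859\_1 and UTF-8. The class name `Base64Demo`, its helper methods and the sample input are illustrative assumptions, not code from the HBase change.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Illustrative only; not taken from the HBase patch.
final class Base64Demo {
  // Encodes raw bytes with the JDK 8+ Base64 encoder (basic alphabet, with padding).
  static String encode(byte[] raw) {
    return Base64.getEncoder().encodeToString(raw);
  }

  // True when the string has identical bytes under ISO_8859_1 and UTF-8,
  // which holds for the ASCII-only Base64 alphabet.
  static boolean sameBytesInBothCharsets(String s) {
    return Arrays.equals(
        s.getBytes(StandardCharsets.ISO_8859_1),
        s.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    String encoded = encode("hbase".getBytes(StandardCharsets.UTF_8));
    System.out.println(encoded);                          // aGJhc2U=
    System.out.println(sameBytesInBothCharsets(encoded)); // true
  }
}
```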
--- * [HBASE-23588](https://issues.apache.org/jira/browse/HBASE-23588) | *Major* | **Cache index blocks and bloom blocks on write if CacheCompactedBlocksOnWrite is enabled** If cacheOnWrite is enabled during flush or compaction, index and bloom blocks(with data blocks) would be automatically cached during write. --- * [HBASE-23369](https://issues.apache.org/jira/browse/HBASE-23369) | *Major* | **Auto-close 'unknown' Regions reported as OPEN on RegionServers** If a RegionServer reports a Region as OPEN in disagreement with Master's status on the Region, the Master now tells the RegionServer to silently close the Region. --- * [HBASE-23596](https://issues.apache.org/jira/browse/HBASE-23596) | *Major* | **HBCKServerCrashProcedure can double assign** Makes it so the recently added HBCKServerCrashProcedure -- the SCP that gets invoked when an operator schedules an SCP via hbck2 scheduleRecoveries command -- now works the same as SCP EXCEPT if master knows nothing of the scheduled servername. In this latter case, HBCKSCP will do a full scan of hbase:meta looking for instances of the passed servername. If any found it will attempt cleanup of hbase:meta references by reassigning any found OPEN or OPENING and by closing any in CLOSING state. Used to fix instances of what the 'HBCK Report' page shows as 'Unknown Servers'. --- * [HBASE-23624](https://issues.apache.org/jira/browse/HBASE-23624) | *Major* | **Add a tool to dump the procedure info in HFile** Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.HFileProcedurePrettyPrinter to run the tool. --- * [HBASE-23590](https://issues.apache.org/jira/browse/HBASE-23590) | *Major* | **Update maxStoreFileRefCount to maxCompactedStoreFileRefCount** RegionsRecoveryChore introduced as part of HBASE-22460 tries to reopen regions based on config: hbase.regions.recovery.store.file.ref.count. 
Region reopen needs to take into consideration all compacted-away store files that belong to the region, not the (non-compacted) store files. This bug was fixed as part of this Jira. Updated description for corresponding configs: 1. hbase.master.regions.recovery.check.interval : Regions Recovery Chore interval in milliseconds. This chore keeps running at this interval to find all regions with configurable max store file ref count and reopens them. Defaults to 20 mins 2. hbase.regions.recovery.store.file.ref.count : A very large ref count on a compacted store file indicates a ref leak on that object (compacted store file). Such files cannot be removed after they are invalidated via compaction. The only way to recover in such a scenario is to reopen the region, which can release all resources, like the refcount, leases, etc. This config represents the store file ref count threshold considered for reopening regions. Any region with compacted store files ref count \> this value would be eligible for reopening by master. Here, we get the max refCount among all refCounts on all compacted-away store files that belong to a particular region. Default value -1 indicates this feature is turned off. Only a positive integer value should be provided to enable this feature. --- * [HBASE-23618](https://issues.apache.org/jira/browse/HBASE-23618) | *Major* | **Add a tool to dump procedure info in the WAL file** Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.WALProcedurePrettyPrinter to run the tool. --- * [HBASE-23617](https://issues.apache.org/jira/browse/HBASE-23617) | *Major* | **Add a stress test tool for region based procedure store** Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStorePerformanceEvaluation to run the tool.
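The reopen rule described for `hbase.regions.recovery.store.file.ref.count` in HBASE-23590 above (take the max ref count over a region's compacted-away store files and compare it with the threshold) can be sketched as follows. This is a simplified stand-alone illustration, not the RegionsRecoveryChore code; `ReopenCheck` and `shouldReopen` are hypothetical names.

```java
import java.util.List;

// Hypothetical illustration of the threshold rule; not HBase code.
final class ReopenCheck {
  // threshold: value of hbase.regions.recovery.store.file.ref.count (-1 means disabled).
  // compactedFileRefCounts: ref counts of the region's compacted-away store files.
  static boolean shouldReopen(int threshold, List<Integer> compactedFileRefCounts) {
    if (threshold <= 0) {
      return false; // only a positive threshold enables the feature
    }
    int max = 0;
    for (int refCount : compactedFileRefCounts) {
      max = Math.max(max, refCount);
    }
    // Eligible only when the largest ref count is strictly greater than the threshold.
    return max > threshold;
  }
}
```

Under these assumptions, a threshold of 3 with ref counts 1 and 5 marks the region for reopening, while the default of -1 never does.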
--- * [HBASE-23326](https://issues.apache.org/jira/browse/HBASE-23326) | *Critical* | **Implement a ProcedureStore which stores procedures in a HRegion** Use a region based procedure store to replace the old customized WAL based procedure store. The procedure data migration is done automatically during upgrading. After upgrading, the MasterProcWALs directory will be deleted and a new MasterProc directory will be created. Note that a region will still write WAL, so we still have WAL files and they will be moved to the oldWALs directory. The file name is mostly like a normal WAL file; the only difference is that it ends with "$masterproc$". --- * [HBASE-23320](https://issues.apache.org/jira/browse/HBASE-23320) | *Major* | **Upgrade surefire plugin to 3.0.0-M4** Bumped surefire plugin to 3.0.0-M4 --- * [HBASE-20461](https://issues.apache.org/jira/browse/HBASE-20461) | *Major* | **Implement fsync for AsyncFSWAL** Now AsyncFSWAL also supports Durability.FSYNC\_WAL. --- * [HBASE-23066](https://issues.apache.org/jira/browse/HBASE-23066) | *Minor* | **Create a config that forces to cache blocks on compaction** The configuration 'hbase.rs.cacheblocksonwrite' was used to enable caching the blocks on write. But we purposefully did not cache the blocks during compaction (since that may be very aggressive), as the caching happens as and when the writer completes a block. Cloud environments have bigger caches, but enabling 'hbase.rs.prefetchblocksonopen' (a non-aggressive way of caching the blocks proactively on reader creation) does not help them because it takes time to cache the compacted blocks. This feature creates a new configuration 'hbase.rs.cachecompactedblocksonwrite' which, when set to 'true', enables caching of the blocks created by compaction. Remember that since this is aggressive caching, the user should have enough cache space; if not, it may lead to other active blocks getting evicted.
From the shell, this can also be enabled per Column Family using the format below: {code} create 't1', 'f1', {NUMREGIONS =\> 15, SPLITALGO =\> 'HexStringSplit', CONFIGURATION =\> {'hbase.rs.cachecompactedblocksonwrite' =\> 'true'}} {code} --- * [HBASE-23239](https://issues.apache.org/jira/browse/HBASE-23239) | *Major* | **Reporting on status of backing MOB files from client-facing cells** Users of the MOB feature can now use the `mobrefs` utility to get statistics about data in the MOB system and verify the health of backing files on HDFS. ``` HADOOP_CLASSPATH=/etc/hbase/conf:$(hbase mapredcp) yarn jar \ /some/path/to/hbase-shaded-mapreduce.jar mobrefs mobrefs-report-output some_table foo ``` See javadocs of the class `MobRefReporter` for more details. The reference guide now includes some information about MOB internals and troubleshooting. --- * [HBASE-23549](https://issues.apache.org/jira/browse/HBASE-23549) | *Minor* | **Document steps to disable MOB for a column family** The reference guide now includes a walk through of disabling the MOB feature if needed while maintaining availability. --- * [HBASE-23582](https://issues.apache.org/jira/browse/HBASE-23582) | *Minor* | **Unbalanced braces in string representation of table descriptor** Fixed unbalanced braces in string representation within HBase shell --- * [HBASE-23293](https://issues.apache.org/jira/browse/HBASE-23293) | *Minor* | **[REPLICATION] make ship edits timeout configurable** The default rpc timeout for ReplicationSourceShipper#shipEdits is 60s; when bulkload replication is enabled, a timeout exception may occur. Now we can configure the timeout value through replication.source.shipedits.timeout, and it’s adaptive. --- * [HBASE-23312](https://issues.apache.org/jira/browse/HBASE-23312) | *Major* | **HBase Thrift SPNEGO configs (HBASE-19852) should be backwards compatible** The newer HBase Thrift SPNEGO configs should not be required.
The hbase.thrift.spnego.keytab.file and hbase.thrift.spnego.principal configs will fall back to the original hbase.thrift.keytab.file and hbase.thrift.kerberos.principal configs. The older configs will log a deprecation warning. It is preferred to use the newer SPNEGO configurations. --- * [HBASE-22969](https://issues.apache.org/jira/browse/HBASE-22969) | *Minor* | **A new binary component comparator(BinaryComponentComparator) to perform comparison of arbitrary length and position** With BinaryComponentComparator, applications will be able to design a diverse and powerful set of filters for rows and columns. See https://issues.apache.org/jira/browse/HBASE-22969 for an example. In general, the comparator can be used with any filter taking ByteArrayComparable. As of now, the following filters take ByteArrayComparable: 1. RowFilter 2. ValueFilter 3. QualifierFilter 4. FamilyFilter 5. ColumnValueFilter --- * [HBASE-23234](https://issues.apache.org/jira/browse/HBASE-23234) | *Major* | **Provide .editorconfig based on checkstyle configuration** Adds a .editorconfig file with configurations populated by IntelliJ, based on our checkstyle configuration. There's lots of IntelliJ-specific configs in here that I assume are not replicated to Eclipse or Netbeans users. Any devs using those tools should push whatever updates they see fit, but please start with the checkstyle configs as the source of truth. --- * [HBASE-23322](https://issues.apache.org/jira/browse/HBASE-23322) | *Minor* | **[hbck2] Simplification on HBCKSCP scheduling** An hbck2 scheduleRecoveries will run a subclass of ServerCrashProcedure which asks Master what Regions were on the dead Server but it will also do a hbase:meta table scan to see if any vestiges of the old Server remain (for the case where an SCP failed mid-point leaving references in place or where Master and hbase:meta deviated in accounting).
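To make the idea behind HBASE-22969 above more concrete: a component comparator compares a fixed byte sequence against the bytes found at a given position inside a larger row key or value. The sketch below shows that comparison with the JDK only. It is a minimal illustration under that assumption; `ComponentCompare` and `compareAt` are hypothetical names and do not reproduce the real `BinaryComponentComparator` API.

```java
// Hypothetical illustration of comparing a byte component at an offset; not HBase code.
final class ComponentCompare {
  // Compares `component` with value[offset, offset + component.length) as unsigned bytes.
  // Returns a negative number, zero, or a positive number, like Comparable#compareTo.
  static int compareAt(byte[] component, byte[] value, int offset) {
    if (offset < 0 || offset + component.length > value.length) {
      throw new IllegalArgumentException("component does not fit at offset " + offset);
    }
    for (int i = 0; i < component.length; i++) {
      int a = component[i] & 0xFF;
      int b = value[offset + i] & 0xFF;
      if (a != b) {
        return a - b;
      }
    }
    return 0;
  }
}
```

For a key such as `row-2020-x`, comparing the component `2020` at offset 4 yields 0, so a filter built on such a comparison could select rows by an embedded field without matching the whole key.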
--- * [HBASE-23321](https://issues.apache.org/jira/browse/HBASE-23321) | *Minor* | **[hbck2] fixHoles of fixMeta doesn't update in-memory state** If holes in hbase:meta, hbck2 fixMeta now will update Master in-memory state so you do not need to restart master just so you can assign the new hole-bridging regions. --- * [HBASE-23282](https://issues.apache.org/jira/browse/HBASE-23282) | *Major* | **HBCKServerCrashProcedure for 'Unknown Servers'** hbck2 scheduleRecoveries will now run a SCP that also looks in hbase:meta for any references to the scheduled server -- not just consult Master in-memory state -- just in case vestiges of the server are leftover in hbase:meta --- * [HBASE-19450](https://issues.apache.org/jira/browse/HBASE-19450) | *Minor* | **Add log about average execution time for ScheduledChore** HBase internal chores now log a moving average of how long execution of each chore takes at `INFO` level for the logger `org.apache.hadoop.hbase.ScheduledChore`. Such messages will happen at most once per five minutes. --- * [HBASE-23250](https://issues.apache.org/jira/browse/HBASE-23250) | *Minor* | **Log message about CleanerChore delegate initialization should be at INFO** CleanerChore delegate initialization is now logged at INFO level instead of DEBUG --- * [HBASE-23243](https://issues.apache.org/jira/browse/HBASE-23243) | *Major* | **[pv2] Filter out SUCCESS procedures; on decent-sized cluster, plethora overwhelms problems** The 'Procedures & Locks' tab in Master UI only displays problematic Procedures now (RUNNABLE, WAITING-TIMEOUT, etc.). It no longer notes procedures whose state is SUCCESS. 
--- * [HBASE-23227](https://issues.apache.org/jira/browse/HBASE-23227) | *Blocker* | **Upgrade jackson-databind to 2.9.10.1 to avoid recent CVEs** the Apache HBase REST Proxy now uses Jackson Databind version 2.9.10.1 to address the following CVEs - CVE-2019-16942 - CVE-2019-16943 Users of prior releases with Jackson Databind 2.9.10 are advised to either upgrade to this release or to upgrade their local Jackson Databind jar directly. --- * [HBASE-23222](https://issues.apache.org/jira/browse/HBASE-23222) | *Critical* | **Better logging and mitigation for MOB compaction failures** The MOB compaction process in the HBase Master now logs more about its activity. In the event that you run into the problems described in HBASE-22075, there is a new HFileCleanerDelegate that will stop all removal of MOB hfiles from the archive area. It can be configured by adding `org.apache.hadoop.hbase.mob.ManualMobMaintHFileCleaner` to the list configured for `hbase.master.hfilecleaner.plugins`. This new cleaner delegate will cause your archive area to grow unbounded; you will have to manually prune files which may be prohibitively complex. Consider if your use case will allow you to mitigate by disabling mob compactions instead. Caveats: * Be sure the list of cleaner delegates still includes the default cleaners you will likely need: ttl, snapshot, and hlink. * Be mindful that if you enable this cleaner delegate then there will be *no* automated process for removing these mob hfiles. You should see a single region per table in `%hbase_root%/archive` that accumulates files over time. You will have to determine which of these files are safe or not to remove. * You should list this cleaner delegate after the snapshot and hlink delegates so that you can enable sufficient logging to determine when an archived mob hfile is needed by those subsystems. When set to `TRACE` logging, the CleanerChore logger will include archive retention decision justifications. 
* If your use case creates a large number of uniquely named tables, this new delegate will cause memory pressure on the master. --- * [HBASE-15519](https://issues.apache.org/jira/browse/HBASE-15519) | *Major* | **Add per-user metrics** Adds per-user metrics for reads/writes to each RegionServer. These metrics are exported by default. hbase.regionserver.user.metrics.enabled can be used to disable the feature if desired for any reason. --- * [HBASE-22460](https://issues.apache.org/jira/browse/HBASE-22460) | *Minor* | **Reopen a region if store reader references may have leaked** Leaked store files can not be removed even after it is invalidated via compaction. A reasonable mitigation for a reader reference leak would be a fast reopen of the region on the same server. Configs: 1. hbase.master.regions.recovery.check.interval : Regions Recovery Chore interval in milliseconds. This chore keeps running at this interval to find all regions with configurable max store file ref count and reopens them. Defaults to 20 mins 2. hbase.regions.recovery.store.file.ref.count : This config represents Store files Ref Count threshold value considered for reopening regions. Any region with store files ref count \> this value would be eligible for reopening by master. Default value -1 indicates this feature is turned off. Only positive integer value should be provided to enable this feature. --- * [HBASE-23172](https://issues.apache.org/jira/browse/HBASE-23172) | *Minor* | **HBase Canary region success count metrics reflect column family successes, not region successes** Added a comment to make clear that read/write success counts are tallying column family success counts, not region success counts. Additionally, the region read and write latencies previously only stored the latencies of the last column family of the region reads/writes. This has been fixed by using a map of each region to a list of read and write latency values. 
--- * [HBASE-23177](https://issues.apache.org/jira/browse/HBASE-23177) | *Major* | **If fail to open reference because FNFE, make it plain it is a Reference** Changes the message on the FNFE exception thrown when the file a Reference points to is missing; the message now includes detail on the Reference as well as the pointed-to file, so you can connect how the FNFE relates to the region open. --- * [HBASE-20626](https://issues.apache.org/jira/browse/HBASE-20626) | *Major* | **Change the value of "Requests Per Second" on WEBUI** Use 'totalRowActionRequestCount' to calculate QPS on web UI. --- * [HBASE-22874](https://issues.apache.org/jira/browse/HBASE-22874) | *Critical* | **Define a public interface for Canary and move existing implementation to LimitedPrivate** Downstream users who wish to programmatically check the health of their HBase cluster may now rely on a public interface derived from the previously private implementation of the canary cli tool. The interface is named `Canary` and can be found in the user facing javadocs. Downstream users who previously relied on invoking the canary via the Java classname (either on the command line or programmatically) will need to change how they do so because the non-public implementation has moved. --- * [HBASE-23035](https://issues.apache.org/jira/browse/HBASE-23035) | *Major* | **Retain region to the last RegionServer make the failover slower** Since 2.0.0, when a regionserver crashes and comes back online, AssignmentManager retains the region locations and tries to assign the regions to this regionserver (same host:port as the crashed one) again. In 1.x.x, however, the regions belonging to the crashed regionserver are assigned round-robin. This jira changes the "retain" assignment to round-robin assignment, the same as in 1.x.x. This change makes failover faster and improves availability.
--- * [HBASE-23046](https://issues.apache.org/jira/browse/HBASE-23046) | *Minor* | **Remove compatibility case from truncate command** Remove backward compatibility from \`truncate\` and \`truncate\_preserve\` shell commands. This means that these commands from HBase Clients are not compatible with pre-0.99 HBase clusters. --- * [HBASE-23040](https://issues.apache.org/jira/browse/HBASE-23040) | *Minor* | **region mover gives NullPointerException instead of saying a host isn't in the cluster** Giving the region mover "unload" command a region server name that isn't recognized by the cluster now results in an "I don't know about that host" message instead of an NPE. Set the log level to DEBUG if you'd like the region mover to log the set of region server names it got back from the cluster. --- * [HBASE-21874](https://issues.apache.org/jira/browse/HBASE-21874) | *Major* | **Bucket cache on Persistent memory** Added a new IOEngine type for Bucket cache, i.e. Persistent memory. In order to use BC over pmem, configure IOEngine as \<property\> \<name\>hbase.bucketcache.ioengine\</name\> \<value\> pmem:///path in persistent memory \</value\> \</property\> --- * [HBASE-22760](https://issues.apache.org/jira/browse/HBASE-22760) | *Major* | **Stop/Resume Snapshot Auto-Cleanup activity with shell command** By default, snapshot auto cleanup based on TTL would be enabled for any new cluster. At any point in time, if snapshot cleanup is supposed to be stopped due to some snapshot restore activity or any other reason, it is advisable to disable it using shell command: hbase\> snapshot\_cleanup\_switch false We can re-enable it using: hbase\> snapshot\_cleanup\_switch true We can query whether snapshot auto cleanup is enabled for cluster using: hbase\> snapshot\_cleanup\_enabled --- * [HBASE-22796](https://issues.apache.org/jira/browse/HBASE-22796) | *Major* | **[HBCK2] Add fix of overlaps to fixMeta hbck Service** Adds fix of overlaps to the fixMeta hbck service method. Uses the bulk-merge facility. Merges a max of 10 at a time.
Set hbase.master.metafixer.max.merge.count to higher if you want to do more than 10 in the one go. --- * [HBASE-21745](https://issues.apache.org/jira/browse/HBASE-21745) | *Critical* | **Make HBCK2 be able to fix issues other than region assignment** This issue adds via its subtasks: \* An 'HBCK Report' page to the Master UI added by HBASE-22527+HBASE-22709+HBASE-22723+ (since 2.1.6, 2.2.1, 2.3.0). Lists consistency or anomalies found via new hbase:meta consistency checking extensions added to CatalogJanitor (holes, overlaps, bad servers) and by a new 'HBCK chore' that runs at a lesser periodicity that will note filesystem orphans and overlaps as well as the following conditions: \*\* Master thought this region opened, but no regionserver reported it. \*\* Master thought this region opened on Server1, but regionserver reported Server2 \*\* More than one regionservers reported opened this region Both chores can be triggered from the shell to regenerate ‘new’ reports. \* Means of scheduling a ServerCrashProcedure (HBASE-21393). \* An ‘offline’ hbase:meta rebuild (HBASE-22680). \* Offline replace of hbase.version and hbase.id \* Documentation on how to use completebulkload tool to ‘adopt’ orphaned data found by new HBCK2 ‘filesystem’ check (see below) and ‘HBCK chore’ (HBASE-22859) \* A ‘holes’ and ‘overlaps’ fix that runs in the master that uses new bulk-merge facility to collapse many overlaps in the one go. \* hbase-operator-tools HBCK2 client tool got a bunch of additions: \*\* A specialized 'fix' for the case where operators ran old hbck 'offlinemeta' repair and destroyed their hbase:meta; it ties together holes in meta with orphaned data in the fs (HBASE-22567) \*\* A ‘filesystem’ command that reports on orphan data as well as bad references and hlinks with a ‘fix’ for the latter two options (based on hbck1 facility updated). 
\*\* Adds back the ‘replication’ fix facility from hbck1 (HBASE-22717) The compound result is that hbck2 is now in excess of hbck1 abilities. The provided functionality is disaggregated as per the hbck2 philosophy of providing 'plumbing' rather than 'porcelain' so there is work to do still adding fix-it playbooks, scripting across outages, and automation. --- * [HBASE-22802](https://issues.apache.org/jira/browse/HBASE-22802) | *Major* | **Avoid temp ByteBuffer allocation in FileIOEngine#read** HBASE-21879 introduces a utility class (org.apache.hadoop.hbase.io.ByteBuffAllocator) used for allocating/freeing ByteBuffers from/to NIO ByteBuffer pool, when BucketCache enabled with file or mmap engine, we will use this ByteBuffer pool to avoid temp ByteBuffer allocation a lot. --- * [HBASE-11062](https://issues.apache.org/jira/browse/HBASE-11062) | *Major* | **hbtop** Introduces hbtop that's a real-time monitoring tool for HBase like Unix's top command. See the ref guide for the details: https://hbase.apache.org/book.html#hbtop --- * [HBASE-21879](https://issues.apache.org/jira/browse/HBASE-21879) | *Major* | **Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose** Before this issue, read path was 100% offheap when block is in the BucketCache. But if a cache miss, then the RS needs to read the block via an on-heap API which causes high young-GC pressure. This issue adds reading the block via offheap even if reading the block from filesystem directly. It requires hadoop version(\>=2.9.3) but can also work with older hadoop versions (all works but we continue to read block onheap). It also requires HBASE-21946 which is not yet in place as of this writing/hbase-2.3.0. 
We have written a careful doc about the implementation, performance and practice here: https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI\_E/edit#heading=h.nch5d72p27ex

---

* [HBASE-22618](https://issues.apache.org/jira/browse/HBASE-22618) | *Major* | **added the possibility to load custom cost functions**

Extends `StochasticLoadBalancer` to support user-provided cost functions. These are loaded in addition to the default set of cost functions. Custom function implementations must extend `StochasticLoadBalancer$CostFunction`. Enable any additional functions by placing them on the master class path and configuring `hbase.master.balancer.stochastic.additionalCostFunctions` with a comma-separated list of fully-qualified class names.

---

* [HBASE-22867](https://issues.apache.org/jira/browse/HBASE-22867) | *Critical* | **The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table**

Replace the ForkJoinPool in CleanerChore with a ThreadPoolExecutor, which can limit the number of spawned threads and avoid frequent GC on the master. The replacement is internal to CleanerChore, so there is no config key change; upstream users can just upgrade the hbase master without any other change.

---

* [HBASE-22810](https://issues.apache.org/jira/browse/HBASE-22810) | *Major* | **Initialize an separate ThreadPoolExecutor for taking/restoring snapshot**

Introduced a new config key for the snapshot taking/restoring operations on the master side: hbase.master.executor.snapshot.threads. Its default value is 3, which means we can have 3 snapshot operations running at the same time.

---

* [HBASE-22863](https://issues.apache.org/jira/browse/HBASE-22863) | *Major* | **Avoid Jackson versions and dependencies with known CVEs**

1. Stopped exposing vulnerable Jackson1 dependencies so that downstreamers would not pull it in from HBase. 2.
However, since Hadoop requires some Jackson1 dependencies, the vulnerable Jackson mapper is kept at test scope in some HBase modules, and hence the HBase tarball created by hbase-assembly contains the Jackson1 mapper jar in lib. Still, downstream applications can't pull in Jackson1 from HBase.

---

* [HBASE-22841](https://issues.apache.org/jira/browse/HBASE-22841) | *Major* | **TimeRange's factory functions do not support ranges, only \`allTime\` and \`at\`**

Add several APIs to the TimeRange class to avoid using the deprecated TimeRange constructor:
\* TimeRange#from: Represents the time interval [minStamp, Long.MAX\_VALUE)
\* TimeRange#until: Represents the time interval [0, maxStamp)
\* TimeRange#between: Represents the time interval [minStamp, maxStamp)

---

* [HBASE-22833](https://issues.apache.org/jira/browse/HBASE-22833) | *Minor* | **MultiRowRangeFilter should provide a method for creating a filter which is functionally equivalent to multiple prefix filters**

Provide a public constructor in the MultiRowRangeFilter class to meet the requirement of filtering with multiple row prefixes. It expands the row prefixes into multiple rowkey ranges handled by MultiRowRangeFilter, which is more efficient.
{code}
public MultiRowRangeFilter(byte[][] rowKeyPrefixes);
{code}

---

* [HBASE-22856](https://issues.apache.org/jira/browse/HBASE-22856) | *Major* | **HBASE-Find-Flaky-Tests fails with pip error**

Update the base docker image to ubuntu 18.04 for the find flaky tests jenkins job.

---

* [HBASE-22771](https://issues.apache.org/jira/browse/HBASE-22771) | *Major* | **[HBCK2] fixMeta method and server-side support**

Adds a fixMeta method to hbck Service. Fixes holes in hbase:meta. Follow-up to fix overlaps. See HBASE-22567 also.
Follow-on is adding a client-side to hbase-operator-tools that can exploit this new addition (HBASE-22825)

---

* [HBASE-22777](https://issues.apache.org/jira/browse/HBASE-22777) | *Major* | **Add a multi-region merge (for fixing overlaps, etc.)**

Changes merge so you can merge more than two regions at a time. Currently only available inside HBase. HBASE-22827, a follow-on, is about exposing the facility in the Admin API (and then via the shell).

---

* [HBASE-15666](https://issues.apache.org/jira/browse/HBASE-15666) | *Critical* | **shaded dependencies for hbase-testing-util**

New shaded artifact for testing: hbase-shaded-testing-util.

---

* [HBASE-22776](https://issues.apache.org/jira/browse/HBASE-22776) | *Major* | **Rename config names in user scan snapshot feature**

After HBASE-22776, the steps to configure the user scan snapshot feature are as follows:
1. Check HDFS configuration
2. Add master coprocessor:
hbase.coprocessor.master.classes="org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController"
3. Enable this feature:
hbase.acl.sync.to.hdfs.enable=true
4. Modify table schema to enable this feature for a table:
alter 't1', CONFIGURATION =\> {'hbase.acl.sync.to.hdfs.enable' =\> 'true'}

---

* [HBASE-22539](https://issues.apache.org/jira/browse/HBASE-22539) | *Blocker* | **WAL corruption due to early DBBs re-use when Durability.ASYNC\_WAL is used**

We found a critical bug which can lead to WAL corruption when Durability.ASYNC\_WAL is used. The reason is that we release a ByteBuffer before actually persisting the content into the WAL file.

The problem may lead to several errors, for example, ArrayIndexOutOfBoundsException when replaying WAL. This is because the ByteBuffer is reused by others.
ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event RS\_LOG\_REPLAY
java.lang.ArrayIndexOutOfBoundsException: 18056
  at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
  at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
  at org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
  at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816)
  at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:143)
  at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:148)
  at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
  at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:195)
  at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:100)

It may even cause a segmentation fault and crash the JVM directly. You will see a hs\_err\_pidXXX.log file, and usually the problem is SIGSEGV. This is usually because the ByteBuffer has already been returned to the OS and used for another purpose.

The problem has been reported several times in the past, and this time Wellington Ramos Chevreuil provided the full logs and analyzed them deeply so we could find the root cause. Lijin Bin figured out that the problem may only happen when Durability.ASYNC\_WAL is used. Thanks to them.

The problem only affects the 2.x releases. All users are strongly recommended to upgrade to a release which includes this fix, especially if you use Durability.ASYNC\_WAL.

---

* [HBASE-22737](https://issues.apache.org/jira/browse/HBASE-22737) | *Major* | **Add a new admin method and shell cmd to trigger the hbck chore to run**

Add a new method runHbckChore in Hbck interface and a new shell cmd hbck\_chore\_run to request HBCK chore to run at master side.

---

* [HBASE-22741](https://issues.apache.org/jira/browse/HBASE-22741) | *Major* | **Show catalogjanitor consistency complaints in new 'HBCK Report' page**

Adds a "CatalogJanitor hbase:meta Consistency Issues" section to the new 'HBCK Report' page added by HBASE-22709. This section is empty unless the most recent CatalogJanitor scan turned up problems. If so, it shows a table of the issues found.

---

* [HBASE-22723](https://issues.apache.org/jira/browse/HBASE-22723) | *Major* | **Have CatalogJanitor report holes and overlaps; i.e. problems it sees when doing its regular scan of hbase:meta**

When CatalogJanitor runs, it now checks for holes, overlaps, empty info:regioninfo columns and bad servers. It dumps its findings into the log. A follow-up adds the report to the new 'HBCK Report' page linked off the Master UI.

NOTE: All features but the badserver check made it into branch-2.1 and branch-2.0 backports.

---

* [HBASE-22714](https://issues.apache.org/jira/browse/HBASE-22714) | *Trivial* | **BuffferedMutatorParams opertationTimeOut() is misspelt**

The misspelled BufferedMutatorParams.opertationTimeout method has been marked as deprecated, and will be removed in 4.0.0. Please use the BufferedMutatorParams.operationTimeout method instead.

---

* [HBASE-22580](https://issues.apache.org/jira/browse/HBASE-22580) | *Major* | **Add a table attribute to make user scan snapshot feature configurable for table**

If users of a table scan snapshots of that table, configure the following table schema attribute so that granted users' ACLs are added to hfiles:
alter 't1', CONFIGURATION =\> {'hbase.user.scan.snapshot.enable' =\> 'true'}

---

* [HBASE-22709](https://issues.apache.org/jira/browse/HBASE-22709) | *Major* | **Add a chore thread in master to do hbck checking and display results in 'HBCK Report' page**

1. Add a new chore thread in master to do hbck checking
2. Add a new web ui "HBCK Report" page to display checking results.

This feature is enabled by default.
The hbck chore runs every 60 minutes by default. You can set "hbase.master.hbck.checker.interval" to a value less than or equal to 0 to disable the chore.

Notice: the config "hbase.master.hbck.checker.interval" was renamed to "hbase.master.hbck.chore.interval" in HBASE-22737.

---

* [HBASE-22578](https://issues.apache.org/jira/browse/HBASE-22578) | *Major* | **HFileCleaner should not delete empty ns/table directories used for user san snapshot feature**

The HFileCleaner cleans the empty directories under archive, but if the user scan snapshot feature is enabled, the user ACLs are set on these directories. Configure the following cleaner so that directories with user ACLs are not cleaned:
hbase.master.hfilecleaner.plugins=org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclCleaner

---

* [HBASE-22722](https://issues.apache.org/jira/browse/HBASE-22722) | *Blocker* | **Upgrade jackson databind dependencies to 2.9.9.1**

Upgrade jackson databind dependency to 2.9.9.1 due to CVEs

https://nvd.nist.gov/vuln/detail/CVE-2019-12814

https://nvd.nist.gov/vuln/detail/CVE-2019-12384

---

* [HBASE-22527](https://issues.apache.org/jira/browse/HBASE-22527) | *Major* | **[hbck2] Add a master web ui to show the problematic regions**

Add a new master web UI to show the potentially problematic opened regions. There are three cases:
1. Master thought this region opened, but no regionserver reported it.
2. Master thought this region opened on Server1, but regionserver reported Server2
3. More than one regionserver reported this region as opened

---

* [HBASE-22648](https://issues.apache.org/jira/browse/HBASE-22648) | *Minor* | **Snapshot TTL**

Feature: Take a Snapshot With TTL for auto-cleanup

Attribute:
1. TTL
- Specify TTL in sec while creating snapshot. e.g. snapshot 'mytable', 'snapshot1234', {TTL =\> 86400} (snapshot to be auto-cleaned after 24 hr)

Configs: 1.
Default Snapshot TTL:
- FOREVER by default
- User specified Default TTL(sec) with config: hbase.master.snapshot.ttl

2. If Snapshot cleanup is supposed to be stopped due to some snapshot restore activity, disable it with config:
- hbase.master.cleaner.snapshot.disable: "true"
With this config, HMaster needs a restart, just as for any other hbase-site config.

For more details, see the section "Take a Snapshot With TTL" in the HBase Reference Guide.

---

* [HBASE-22610](https://issues.apache.org/jira/browse/HBASE-22610) | *Trivial* | **[BucketCache] Rename "hbase.offheapcache.minblocksize"**

The config point "hbase.offheapcache.minblocksize" was wrong and is now deprecated. The new config point is "hbase.blockcache.minblocksize".

---

* [HBASE-22690](https://issues.apache.org/jira/browse/HBASE-22690) | *Major* | **Deprecate / Remove OfflineMetaRepair in hbase-2+**

OfflineMetaRepair is no longer supported in HBase-2+. Please refer to https://hbase.apache.org/book.html#HBCK2

This tool is deprecated in 2.x and will be removed in 3.0.

---

* [HBASE-22673](https://issues.apache.org/jira/browse/HBASE-22673) | *Major* | **Avoid to expose protobuf stuff in Hbck interface**

Mark the Hbck#scheduleServerCrashProcedure(List\<HBaseProtos.ServerName\> serverNames) as deprecated. Use Hbck#scheduleServerCrashProcedures(List\<ServerName\> serverNames) instead.

---

* [HBASE-22617](https://issues.apache.org/jira/browse/HBASE-22617) | *Blocker* | **Recovered WAL directories not getting cleaned up**

In HBASE-20734 we moved the recovered.edits onto the wal file system, but when constructing the directory we missed the BASE\_NAMESPACE\_DIR('data'). So when using the default config, you will find lots of new directories at the same level as the 'data' directory.

In this issue, we add the BASE\_NAMESPACE\_DIR back, and also try our best to clean up the wrong directories.
But we can only clean up the region level directories, so if you want a clean fs layout on HDFS you still need to manually delete the empty directories at the same level as 'data'.

The affected versions are 2.2.0, 2.1.[1-5], 1.4.[8-10], 1.3.[3-5].

---

* [HBASE-21995](https://issues.apache.org/jira/browse/HBASE-21995) | *Major* | **Add a coprocessor to set HDFS ACL for hbase granted user**

Add a coprocessor to set HDFS ACLs so that hbase granted users with READ permission have access to scan snapshots.

To use this feature, please make sure the HDFS config is set:
dfs.namenode.acls.enabled=true
fs.permissions.umask-mode=027

and set the HBase config:
hbase.coprocessor.master.classes="org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController"
hbase.user.scan.snapshot.enable=true

---

* [HBASE-22596](https://issues.apache.org/jira/browse/HBASE-22596) | *Minor* | **[Chore] Separate the execution period between CompactionChecker and PeriodicMemStoreFlusher**

hbase.regionserver.compaction.check.period controls how often the compaction checker runs. If unset, hbase.server.thread.wakefrequency is used as the default value.

hbase.regionserver.flush.check.period controls how often the flush checker runs. If unset, hbase.server.thread.wakefrequency is used as the default value.

---

* [HBASE-22588](https://issues.apache.org/jira/browse/HBASE-22588) | *Major* | **Upgrade jaxws-ri dependency to 2.3.2**

When run with JDK11, HBase now uses a more recent version of the jaxws reference implementation (v2.3.2).

---

* [HBASE-21536](https://issues.apache.org/jira/browse/HBASE-21536) | *Trivial* | **Fix completebulkload usage instructions**

Added completebulkload short name for BulkLoadHFilesTool to bin/hbase.

---

* [HBASE-22500](https://issues.apache.org/jira/browse/HBASE-22500) | *Blocker* | **Modify pom and jenkins jobs for hadoop versions**

Change the default hadoop-3 version to 3.1.2.
Drop support for the releases which are affected by CVE-2018-8029; see this email https://lists.apache.org/thread.html/3d6831c3893cd27b6850aea2feff7d536888286d588e703c6ffd2e82@%3Cuser.hadoop.apache.org%3E

---

* [HBASE-22459](https://issues.apache.org/jira/browse/HBASE-22459) | *Minor* | **Expose store reader reference count**

This change exposes the aggregate count of store reader references for a given store as 'storeRefCount' in region metrics and ClusterStatus.

---

* [HBASE-22469](https://issues.apache.org/jira/browse/HBASE-22469) | *Minor* | **replace md5 checksum in saveVersion script with sha512 for hbase version information**

The HBase "source checksum" now uses SHA512 instead of MD5.

---

* [HBASE-22148](https://issues.apache.org/jira/browse/HBASE-22148) | *Blocker* | **Provide an alternative to CellUtil.setTimestamp**

The `CellUtil.setTimestamp` method changes to be an API with audience `LimitedPrivate(COPROC)` in HBase 3.0. With that designation the API should remain stable within a given minor release line, but may change between minor releases.

Previously, this method was deprecated in HBase 2.0 for removal in HBase 3.0. Deprecation messages in HBase 2.y releases have been updated to indicate the expected API audience change.

---

* [HBASE-20782](https://issues.apache.org/jira/browse/HBASE-20782) | *Minor* | **Fix duplication of TestServletFilter.access**

The access method was moved to the HttpServerFunctionalTest class as a common place.

---

* [HBASE-21991](https://issues.apache.org/jira/browse/HBASE-21991) | *Major* | **Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements**

The class LossyCounting was unintentionally marked Public but was never intended to be part of our public API. This oversight has been corrected and LossyCounting is now marked as Private and going forward may be subject to additional breaking changes or removal without notice.
If you have taken a dependency on this class we recommend cloning it locally into your project before upgrading to this release.

---

* [HBASE-22226](https://issues.apache.org/jira/browse/HBASE-22226) | *Trivial* | **Incorrect level for headings in asciidoc**

Warnings for level headings are corrected in the book for the HBase Incompatibilities section.

---

* [HBASE-20970](https://issues.apache.org/jira/browse/HBASE-20970) | *Major* | **Update hadoop check versions for hadoop3 in hbase-personality**

Add hadoop 3.0.3, 3.1.1, 3.1.2 in our hadoop check jobs.

---

* [HBASE-21784](https://issues.apache.org/jira/browse/HBASE-21784) | *Major* | **Dump replication queue should show list of wal files ordered chronologically**

The DumpReplicationQueues tool will now list replication queues sorted in chronological order.

---

* [HBASE-21048](https://issues.apache.org/jira/browse/HBASE-21048) | *Major* | **Get LogLevel is not working from console in secure environment**

Support get\|set LogLevel in secure(kerberized) environment.

---

* [HBASE-22384](https://issues.apache.org/jira/browse/HBASE-22384) | *Minor* | **Formatting issues in administration section of book**

Fixes a formatting issue in the administration section of the book, where listing indentation was a little bit off.

---

* [HBASE-22377](https://issues.apache.org/jira/browse/HBASE-22377) | *Major* | **Provide API to check the existence of a namespace which does not require ADMIN permissions**

This change adds the new method listNamespaces to the Admin interface, which can be used to retrieve a list of the namespaces present in the schema as an unprivileged operation. Formerly the only available method for accomplishing this was listNamespaceDescriptors, which requires GLOBAL CREATE or ADMIN permissions.
--- * [HBASE-22399](https://issues.apache.org/jira/browse/HBASE-22399) | *Major* | **Change default hadoop-two.version to 2.8.x and remove the 2.7.x hadoop checks** Now the default hadoop-two.version has been changed to 2.8.5, and all hadoop versions before 2.8.2(exclude) will not be supported any more. --- * [HBASE-22392](https://issues.apache.org/jira/browse/HBASE-22392) | *Trivial* | **Remove extra/useless +** Removed extra + in HRegion, HStore and LoadIncrementalHFiles for branch-2 and HRegion and HStore for branch-1. --- * [HBASE-20494](https://issues.apache.org/jira/browse/HBASE-20494) | *Major* | **Upgrade com.yammer.metrics dependency** Updated metrics core from 3.2.1 to 3.2.6. --- * [HBASE-22358](https://issues.apache.org/jira/browse/HBASE-22358) | *Minor* | **Change rubocop configuration for method length** The rubocop definition for the maximum method length was set to 75. --- * [HBASE-22379](https://issues.apache.org/jira/browse/HBASE-22379) | *Minor* | **Fix Markdown for "Voting on Release Candidates" in book** Fixes the formatting of the "Voting on Release Candidates" to actually show the quote and code formatting of the RAT check. --- * [HBASE-20851](https://issues.apache.org/jira/browse/HBASE-20851) | *Minor* | **Change rubocop config for max line length of 100** The rubocop configuration in the hbase-shell module now allows a line length with 100 characters, instead of 80 as before. For everything before 2.1.5 this change introduces rubocop itself. --- * [HBASE-22301](https://issues.apache.org/jira/browse/HBASE-22301) | *Minor* | **Consider rolling the WAL if the HDFS write pipeline is slow** This change adds new conditions for rolling the WAL for when syncs on the HDFS writer pipeline are perceived to be slow. As before the configuration parameter hbase.regionserver.wal.slowsync.ms sets the slow sync warning threshold. 
If we encounter hbase.regionserver.wal.slowsync.roll.threshold number of slow syncs (default 100) within the interval defined by hbase.regionserver.wal.slowsync.roll.interval.ms (default 1 minute), we will request a WAL roll. Or, if the time for any sync exceeds the threshold set by hbase.regionserver.wal.roll.on.sync.ms (default 10 seconds) we will request a WAL roll immediately. Operators can monitor how often these new thresholds result in a WAL roll by looking at newly added metrics to the WAL related metric group: \* slowSyncRollRequest - How many times a roll was requested due to sync too slow on the write pipeline. Additionally, as a part of this change there are also additional metrics for existing reasons for a WAL roll: \* errorRollRequest - How many times a roll was requested due to I/O or other errors. \* sizeRollRequest - How many times a roll was requested due to file size roll threshold. --- * [HBASE-21883](https://issues.apache.org/jira/browse/HBASE-21883) | *Minor* | **Enhancements to Major Compaction tool** MajorCompactorTTL Tool allows to compact all regions in a table that have been TTLed out. This saves space on DFS and is useful for tables which are similar to time series data. This is typically scheduled to run frequently (say via cron) to cleanup old data on an ongoing basis. RSGroupMajorCompactionTTL tool is similar to MajorCompactorTTL but runs at a region server group level. If multiple tables in an rsgroup are similar to time-series data, then it runs a single command to clean them up. As more tables are added/removed from rsgroup, it's easy to have a single command to take care of all of them. --- * [HBASE-22054](https://issues.apache.org/jira/browse/HBASE-22054) | *Minor* | **Space Quota: Compaction is not working for super user in case of NO\_WRITES\_COMPACTIONS** This change allows the system and superusers to initiate compactions, even when a space quota violation policy disallows compactions from happening. 
The original intent behind disallowing compactions was to prevent end-user compactions from creating undue I/O load, not to disallow \*any\* compaction in the system.

---

* [HBASE-22083](https://issues.apache.org/jira/browse/HBASE-22083) | *Minor* | **move eclipse specific configs into a profile**

Maven project integration for Eclipse has been isolated into a maven profile to ensure it only is active when in an Eclipse project. Things should continue to behave the same for Eclipse users. If something should go wrong folks should manually activate the `eclipse-specific` profile.

---

* [HBASE-22307](https://issues.apache.org/jira/browse/HBASE-22307) | *Major* | **Deprecated Preemptive Fail Fast**

Deprecated Preemptive Fail Fast related constants in HConstants. Support for this feature will be removed in 3.0.0, so using these constants will have no effect in 3.0.0+ releases. The constants will be kept until 4.0.0.

Users can use 'hbase.client.perserver.requests.threshold' to control the number of concurrent requests to the same region server.

Please see the release note of HBASE-16388 for more details.

---

* [HBASE-22292](https://issues.apache.org/jira/browse/HBASE-22292) | *Blocker* | **PreemptiveFastFailInterceptor clean repeatedFailuresMap issue**

Adds new configuration hbase.client.failure.map.cleanup.interval which defaults to ten minutes.

---

* [HBASE-19222](https://issues.apache.org/jira/browse/HBASE-19222) | *Major* | **update jruby to 9.1.17.0**

The default version of JRuby shipped with HBase has been updated to the JRuby 9.1.17.0 release.
For details on changes see [the release notes for JRuby 9.1.17.0](https://www.jruby.org/2018/04/23/jruby-9-1-17-0)

---

* [HBASE-22279](https://issues.apache.org/jira/browse/HBASE-22279) | *Major* | **Add a getRegionLocator method in Table/AsyncTable interface**

Add below method in Table interface:

RegionLocator getRegionLocator() throws IOException;

Add below methods in AsyncTable interface:

AsyncTableRegionLocator getRegionLocator();
CompletableFuture\<TableDescriptor\> getDescriptor();

---

* [HBASE-15560](https://issues.apache.org/jira/browse/HBASE-15560) | *Major* | **TinyLFU-based BlockCache**

LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O(n) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns.

This change introduces a new L1 policy, TinyLfuBlockCache, which records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, through the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had near optimal hit rates.

New configuration variable hfile.block.cache.policy sets the eviction policy for the L1 block cache. The default is "LRU" (LruBlockCache). Set to "TinyLFU" to use TinyLfuBlockCache instead.
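
The admission rule described for HBASE-15560 can be illustrated with a minimal, self-contained Java sketch. This is not the TinyLfuBlockCache implementation: the class name `FrequencySketchDemo`, the four-row counter layout, the deliberately simple hash, and the sample size are all assumptions made for illustration. It only shows the three ideas named in the note: counting accesses in a sketch, aging by halving the counters, and admitting a new arrival only if its estimated frequency beats the victim's.

```java
import java.util.Objects;

/** Illustrative only: a tiny count-min style frequency sketch with aging. */
final class FrequencySketchDemo {
    private static final int ROWS = 4;   // assumed number of hash rows
    private final int[][] counters;      // one counter array per row
    private final int width;             // counters per row
    private final int sampleSize;        // accesses between aging passes
    private int accesses;

    FrequencySketchDemo(int width, int sampleSize) {
        this.width = width;
        this.sampleSize = sampleSize;
        this.counters = new int[ROWS][width];
    }

    /** Hypothetical, deliberately simple per-row hash; a real sketch would mix bits better. */
    private int index(Object key, int row) {
        return Math.floorMod(Objects.hashCode(key) * (2 * row + 1) + row, width);
    }

    /** Record one access and age the counters once the sample size is reached. */
    void recordAccess(Object key) {
        for (int row = 0; row < ROWS; row++) {
            counters[row][index(key, row)]++;
        }
        if (++accesses >= sampleSize) {
            age();
        }
    }

    /** Estimated frequency: the minimum counter over all rows. */
    int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < ROWS; row++) {
            min = Math.min(min, counters[row][index(key, row)]);
        }
        return min;
    }

    /** Aging step: halve every counter so that old history fades. */
    private void age() {
        for (int[] row : counters) {
            for (int i = 0; i < row.length; i++) {
                row[i] >>>= 1;
            }
        }
        accesses = 0;
    }

    /** Admission: keep the candidate only if it is estimated to be hotter than the victim. */
    boolean admit(Object candidate, Object victim) {
        return estimate(candidate) > estimate(victim);
    }

    public static void main(String[] args) {
        FrequencySketchDemo sketch = new FrequencySketchDemo(16, 100);
        for (int i = 0; i < 5; i++) {
            sketch.recordAccess(1);      // key 1 is "hot"
        }
        sketch.recordAccess(2);          // key 2 is "cold"
        System.out.println("admit hot over cold: " + sketch.admit(1, 2));
        System.out.println("admit cold over hot: " + sketch.admit(2, 1));
    }
}
```

Because the estimate is a minimum over rows, hash collisions can only overstate a frequency, never understate it. Halving keeps counters bounded and lets a formerly hot block lose its advantage over time.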

---

* [HBASE-22178](https://issues.apache.org/jira/browse/HBASE-22178) | *Major* | **Introduce a createTableAsync with TableDescriptor method in Admin**

Introduced

Future\<Void\> createTableAsync(TableDescriptor);

---

* [HBASE-22108](https://issues.apache.org/jira/browse/HBASE-22108) | *Major* | **Avoid passing null in Admin methods**

Introduced these methods:
void move(byte[]);
void move(byte[], ServerName);
Future\<Void\> splitRegionAsync(byte[]);

These methods are deprecated:
void move(byte[], byte[])

---

* [HBASE-22152](https://issues.apache.org/jira/browse/HBASE-22152) | *Major* | **Create a jenkins file for yetus to processing GitHub PR**

Add a new jenkins file for running pre commit check for GitHub PR.

---

* [HBASE-22007](https://issues.apache.org/jira/browse/HBASE-22007) | *Major* | **Add restoreSnapshot and cloneSnapshot with acl methods in AsyncAdmin**

Add cloneSnapshot/restoreSnapshot with acl methods in AsyncAdmin.

---

* [HBASE-22123](https://issues.apache.org/jira/browse/HBASE-22123) | *Minor* | **REST gateway reports Insufficient permissions exceptions as 404 Not Found**

When permissions are insufficient, you now get:

HTTP/1.1 403 Forbidden

on the HTTP side, and in the message

Forbidden
org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user ‘myuser',action: get, tableName:mytable, family:cf.
at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.authorizeAccess(RangerAuthorizationCoprocessor.java:547)
and the rest of the ADE stack

---

* [HBASE-22100](https://issues.apache.org/jira/browse/HBASE-22100) | *Minor* | **False positive for error prone warnings in pre commit job**

Now we sort the javac WARNING/ERROR lines before generating the diff in pre-commit, so that we get a stable output for error prone. The downside is that we just sort the output lexicographically, so the line numbers are also sorted lexicographically, which looks a bit strange to a human reader.
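
The ordering quirk mentioned for HBASE-22100 is easy to reproduce. The sketch below is illustrative only: the class name `LexicographicSortDemo` and the three warning lines are made up, and it does not reproduce the pre-commit script itself. It just shows what plain string sorting does to embedded line numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: plain string sorting applied to javac-style warning lines. */
final class LexicographicSortDemo {

    /** Returns a copy sorted by natural String order, i.e. character by character. */
    static List<String> sortedCopy(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Made-up warning lines; only the line numbers after the file name matter here.
        List<String> warnings = Arrays.asList(
            "Foo.java:9: warning: [UnusedVariable]",
            "Foo.java:10: warning: [MissingOverride]",
            "Foo.java:100: warning: [DefaultCharset]");
        for (String line : sortedCopy(warnings)) {
            System.out.println(line);
        }
    }
}
```

The comparison is made character by character, so the digit `1` sorts before `9` and `0` sorts before `:`. Line 100 therefore lands before line 10, and line 9 comes last. The output is stable from run to run, which is what the diff needs, but it is not in numeric order.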
--- * [HBASE-22057](https://issues.apache.org/jira/browse/HBASE-22057) | *Major* | **Impose upper-bound on size of ZK ops sent in a single multi()** Exposes a new configuration property "zookeeper.multi.max.size" which dictates the maximum size of deletes that HBase will make to ZooKeeper in a single RPC. This property defaults to 1MB, which should fall beneath the default ZooKeeper limit of 2MB, controlled by "jute.maxbuffer". --- * [HBASE-22052](https://issues.apache.org/jira/browse/HBASE-22052) | *Major* | **pom cleaning; filter out jersey-core in hadoop2 to match hadoop3 and remove redunant version specifications** Fixed awkward dependency issue that prevented site building. #### note specific to HBase 2.1.4 HBase 2.1.4 shipped with an early version of this fix that incorrectly altered the libraries included in our binary assembly for using Apache Hadoop 2.7 (the current build default Hadoop version for 2.1.z). For folks running out of the box against a Hadoop 2.7 cluster (or folks who skip the installation step of [replacing the bundled Hadoop libraries](http://hbase.apache.org/book.html#hadoop)) this will result in a failure at Region Server startup due to a missing class definition. 
e.g.:
```
2019-03-27 09:02:05,779 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:644)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:628)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2701)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2683)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:171)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:356)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:362)
	at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:411)
	at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:387)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:704)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:613)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:3029)
	at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:63)
	at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3047)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 26 more
```

Workaround via any _one_ of the following:
* If you are running against a Hadoop cluster that is 2.8+, ensure you replace the Hadoop libraries in the default binary assembly with those for your version.
* If you are running against a Hadoop cluster that is 2.8+, build the binary assembly from the source release while specifying your Hadoop version.
* If you are running against a Hadoop cluster that is a supported 2.7 release, ensure the `hadoop` executable is in the `PATH` seen at Region Server startup and that you are not using the `HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP` bypass.
* For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers via the HBASE_CLASSPATH environment variable.
* For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers by copying it into the directory `${HBASE_HOME}/lib/client-facing-thirdparty/`.

---

* [HBASE-22065](https://issues.apache.org/jira/browse/HBASE-22065) | *Major* | **Add listTableDescriptors(List\<TableName\>) method in AsyncAdmin**

Add a listTableDescriptors(List\<TableName\>) method in the AsyncAdmin interface, to align with the Admin interface.

---

* [HBASE-22063](https://issues.apache.org/jira/browse/HBASE-22063) | *Major* | **Deprecated Admin.deleteSnapshot(byte[])**

Deprecate Admin.deleteSnapshot(byte[]); please use the String version instead.

---

* [HBASE-22040](https://issues.apache.org/jira/browse/HBASE-22040) | *Major* | **Add mergeRegionsAsync with a List of region names method in AsyncAdmin**

Add a mergeRegionsAsync(byte[][], boolean) method in the AsyncAdmin interface.

Instead of using assert, the client side now throws IllegalArgumentException when you try to merge fewer than 2 regions. Likewise, the master side now throws DoNotRetryIOException instead of using assert if you try to merge more than 2 regions, since we only support merging two regions at once for now.

---

* [HBASE-22039](https://issues.apache.org/jira/browse/HBASE-22039) | *Major* | **Should add the synchronous parameter for the XXXSwitch method in AsyncAdmin**

Add a drainXXX parameter to the balancerSwitch/splitSwitch/mergeSwitch methods in the AsyncAdmin interface. It has the same meaning as the synchronous parameter of these methods in the Admin interface.

---

* [HBASE-22044](https://issues.apache.org/jira/browse/HBASE-22044) | *Major* | **ByteBufferUtils should not be IA.Public API**

As of HBase 3.0, the ByteBufferUtils class is marked as a Private API for internal project use only. Downstream users are advised that it no longer has any compatibility promises across releases.

In earlier HBase release lines, the class is now marked as deprecated to call attention to this planned transition.

---

* [HBASE-21810](https://issues.apache.org/jira/browse/HBASE-21810) | *Major* | **bulkload support set hfile compression on client**

Bulk load (HFileOutputFormat2) now supports configuring the compression on the client. Set the job configuration "hbase.mapreduce.hfileoutputformat.compression" to override the auto-detection of the target table's compression.

---

* [HBASE-22001](https://issues.apache.org/jira/browse/HBASE-22001) | *Major* | **Polish the Admin interface**

Add a cloneSnapshotAsync method with a restoreAcl parameter.
Deprecate the restoreSnapshotAsync method, as it just ignores the failsafe configuration.
Make the snapshotAsync method return a Future\<Void\>.
Deprecate the snapshot related methods which take a 'byte[]' as the snapshot name.
Use default methods to reduce the code base for implementation classes.

---

* [HBASE-22000](https://issues.apache.org/jira/browse/HBASE-22000) | *Major* | **Deprecated isTableAvailable with splitKeys**

Deprecated AsyncTable.isTableAvailable(TableName, byte[][]).

---

* [HBASE-21871](https://issues.apache.org/jira/browse/HBASE-21871) | *Major* | **Support to specify a peer table name in VerifyReplication tool**

After HBASE-21871, we can specify a peer table name with --peerTableName in the VerifyReplication tool, like the following:

    hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable 5 TestTable

In addition, we can compare any 2 tables in any remote clusters by specifying both peerId and --peerTableName.
For example:

    hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable zk1,zk2,zk3:2181/hbase TestTable

---

* [HBASE-15728](https://issues.apache.org/jira/browse/HBASE-15728) | *Major* | **Add remaining per-table region / store / flush / compaction related metrics**

Adds the following split, flush, and compaction metrics:

    // split related metrics
    private MutableFastCounter splitRequest;
    private MutableFastCounter splitSuccess;
    private MetricHistogram splitTimeHisto;

    // flush related metrics
    private MetricHistogram flushTimeHisto;
    private MetricHistogram flushMemstoreSizeHisto;
    private MetricHistogram flushOutputSizeHisto;
    private MutableFastCounter flushedMemstoreBytes;
    private MutableFastCounter flushedOutputBytes;

    // compaction related metrics
    private MetricHistogram compactionTimeHisto;
    private MetricHistogram compactionInputFileCountHisto;
    private MetricHistogram compactionInputSizeHisto;
    private MetricHistogram compactionOutputFileCountHisto;
    private MetricHistogram compactionOutputSizeHisto;
    private MutableFastCounter compactedInputBytes;
    private MutableFastCounter compactedOutputBytes;

    private MetricHistogram majorCompactionTimeHisto;
    private MetricHistogram majorCompactionInputFileCountHisto;
    private MetricHistogram majorCompactionInputSizeHisto;
    private MetricHistogram majorCompactionOutputFileCountHisto;
    private MetricHistogram majorCompactionOutputSizeHisto;
    private MutableFastCounter majorCompactedInputBytes;
    private MutableFastCounter majorCompactedOutputBytes;

---

* [HBASE-21481](https://issues.apache.org/jira/browse/HBASE-21481) | *Major* | **[acl] Superuser's permissions should not be granted or revoked by any non-su global admin**

HBASE-21481 improves the quality of access control by strengthening the protection of superusers' privileges.

---

* [HBASE-21082](https://issues.apache.org/jira/browse/HBASE-21082) | *Critical* | **Reimplement assign/unassign related procedure metrics**

Now we have four types of RIT procedure metrics: assign, unassign, move, and reopen. The meaning of assign/unassign has changed, as we no longer increase the unassign metric and then the assign metric when moving a region.
Also introduced two new procedure metrics, open and close, which track the open/close region calls to region servers. We may send open/close multiple times to finish a RIT, since we may retry multiple times.

---

* [HBASE-20724](https://issues.apache.org/jira/browse/HBASE-20724) | *Critical* | **Sometimes some compacted storefiles are still opened after region failover**

Problem: This is an old problem dating back to HBASE-2231. The compaction event marker was only written to the WAL. But after a flush, the WAL may be archived, which means a useful compaction event marker may be deleted too. As a result, the compacted store files cannot be archived when the region opens and replays the WAL.

Solution: After this jira, the compaction event tracker is written to the HFile. When a region opens and loads its store files, it reads the compaction event tracker from the HFile and archives the compacted store files that still exist.

---

* [HBASE-21820](https://issues.apache.org/jira/browse/HBASE-21820) | *Major* | **Implement CLUSTER quota scope**

HBase contains two quota scopes: MACHINE and CLUSTER. Before this patch, set quota operations did not expose the scope option to the client API and used MACHINE as the default, so CLUSTER scope could not be set or used.
Shell commands were as follows:

    set_quota TYPE => THROTTLE, TABLE => 't1', LIMIT => '10req/sec'

This issue implements CLUSTER scope in a simple way: for user, namespace, and user over namespace quotas, use [ClusterLimit / RSNum] as the machine limit. For table and user over table quotas, use [ClusterLimit / TotalTableRegionNum \* MachineTableRegionNum] as the machine limit.
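
The two CLUSTER scope formulas above can be worked through with concrete numbers. This is a minimal sketch of the arithmetic only: the class and method names are hypothetical and are not HBase APIs, and the use of floating-point division (no rounding) is an assumption made for illustration.

```java
/** Illustrative arithmetic only; the names here are hypothetical, not HBase APIs. */
final class ClusterQuotaSketch {

    /** User, namespace, and user-over-namespace quotas: ClusterLimit / RSNum. */
    static double machineLimit(double clusterLimit, int regionServerCount) {
        return clusterLimit / regionServerCount;
    }

    /** Table and user-over-table quotas: ClusterLimit / TotalTableRegionNum * MachineTableRegionNum. */
    static double machineTableLimit(double clusterLimit, int totalTableRegions, int regionsOnThisServer) {
        return clusterLimit / totalTableRegions * regionsOnThisServer;
    }

    public static void main(String[] args) {
        // A 100 req/sec CLUSTER quota spread over 5 region servers.
        System.out.println(machineLimit(100, 5));           // 20.0
        // The same quota on a table with 20 regions, 3 of them hosted on this server.
        System.out.println(machineTableLimit(100, 20, 3));  // 15.0
    }
}
```

In this reading, a server that hosts more of a table's regions receives a proportionally larger share of that table's cluster-wide limit.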
After this patch, users can set a CLUSTER scope quota, but MACHINE is still the default if no scope is specified.
Shell commands are as follows:

    set_quota TYPE => THROTTLE, TABLE => 't1', LIMIT => '10req/sec'
    set_quota TYPE => THROTTLE, TABLE => 't1', LIMIT => '10req/sec', SCOPE => MACHINE
    set_quota TYPE => THROTTLE, TABLE => 't1', LIMIT => '10req/sec', SCOPE => CLUSTER

---

* [HBASE-21057](https://issues.apache.org/jira/browse/HBASE-21057) | *Minor* | **upgrade to latest spotbugs**

Change spotbugs version to 3.1.11.

---

* [HBASE-21505](https://issues.apache.org/jira/browse/HBASE-21505) | *Major* | **Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.**

This modifies the "status 'replication'" output, fixing inconsistencies in the reported times and ages of last shipped edits, as well as the wrong calculation of replication lags. It also introduces additional info for each recovery queue, which this command did not previously account for. The new output of the "status 'replication'" command is explained in detail below:

a) Source started, target stopped, no edits arrived on source yet:

    ...
    SOURCE: PeerID=1
    Normal Queue: 1
    No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0
    ...

b) Source started, target stopped, add edit on source:

    ...
    Normal Queue: 1
    No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:21:00 GMT 2018, Replication Lag=2459
    ...

c) Source started, target stopped, edit added on source, restart source:

    ...
    SOURCE: PeerID=1
    Normal Queue: 1
    No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0
    Recovered Queue: 1-hbase01.home,16020,1542784524057
    No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:23:00 GMT 2018, Replication Lag=201495
    ...

d) Source started, target stopped, add edit on source, restart source, add another edit on source:

    ...
    SOURCE: PeerID=1
    Normal Queue: 1
    No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:02:28 GMT 2018, Replication Lag=6349
    Recovered Queue: 1-hbase01.home,16020,1542782758742
    No Ops shipped since last restart, SizeOfLogQueue=0, TimeStampOfLastArrivedInSource=Wed Nov 21 06:53:05 GMT 2018, Replication Lag=569394
    ...

e) Source started, target stopped, add edit on source, restart source, add another edit on source, start target:

    ...
    SOURCE: PeerID=1
    Normal Queue: 1
    AgeOfLastShippedOp=30000, TimeStampOfLastShippedOp=Wed Nov 21 07:07:58 GMT 2018, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:02:28 GMT 2018, Replication Lag=0
    ...

f) Source started, target stopped, add edit on source, restart source, restart target:

    ...
    SOURCE: PeerID=1
    Normal Queue: 1
    No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0
    ...

---

* [HBASE-21922](https://issues.apache.org/jira/browse/HBASE-21922) | *Major* | **BloomContext#sanityCheck may failed when use ROWPREFIX\_DELIMITED bloom filter**

Remove bloom filter type ROWPREFIX\_DELIMITED. It may be added back once a better solution is found.

---

* [HBASE-21783](https://issues.apache.org/jira/browse/HBASE-21783) | *Major* | **Support exceed user/table/ns throttle quota if region server has available quota**

Support enabling or disabling exceed throttle quota. Exceed throttle quota means that a user can consume more than the user/namespace/table quota if the region server has additional available quota because other users are not consuming it at the same time.
Use the following shell commands to enable/disable exceed throttle quota:

    enable_exceed_throttle_quota
    disable_exceed_throttle_quota

There are two limits when exceed throttle quota is enabled:

1. At least one read and one write region server throttle quota must be set;
2. All region server throttle quotas must be in the seconds time unit. This is because once earlier requests exceed their quota and consume region server quota, a quota in another time unit may take a long time to refill, which may affect later requests.

---

* [HBASE-20587](https://issues.apache.org/jira/browse/HBASE-20587) | *Major* | **Replace Jackson with shaded thirdparty gson**

Remove jackson dependencies from most hbase modules except hbase-rest, and use shaded gson instead.

The output json will be a bit different, since jackson can use getters/setters but gson always uses the fields.

---

* [HBASE-21928](https://issues.apache.org/jira/browse/HBASE-21928) | *Major* | **Deprecated HConstants.META\_QOS**

Mark HConstants.META\_QOS as deprecated. It is the highest priority and is for internal use only. Do not set a priority greater than or equal to this value; doing so is harmless but has no effect.

---

* [HBASE-17942](https://issues.apache.org/jira/browse/HBASE-17942) | *Major* | **Disable region splits and merges per table**

This patch adds the ability to disable split and/or merge for a table (by default, split and merge are enabled for a table).

---

* [HBASE-21636](https://issues.apache.org/jira/browse/HBASE-21636) | *Major* | **Enhance the shell scan command to support missing scanner specifications like ReadType, IsolationLevel etc.**

Allows the shell to set Scan options that were previously not exposed. See the additions in the scan help by typing the following in the hbase shell:

    hbase> help 'scan'

---

* [HBASE-21201](https://issues.apache.org/jira/browse/HBASE-21201) | *Major* | **Support to run VerifyReplication MR tool without peerid**

We can specify peerQuorumAddress instead of peerId in the VerifyReplication tool, so the tool no longer requires a peerId to be set up.
For example:

    hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication zk1,zk2,zk3:2181/hbase testTable

---

* [HBASE-21838](https://issues.apache.org/jira/browse/HBASE-21838) | *Major* | **Create a special ReplicationEndpoint just for verifying the WAL entries are fine**

Introduce a VerifyWALEntriesReplicationEndpoint which replicates nothing and only verifies that all the cells are valid.
It can be used to catch bugs in WAL writing, since most of the time we never read a WAL again after writing it unless a region server crashes.

---

* [HBASE-21764](https://issues.apache.org/jira/browse/HBASE-21764) | *Major* | **Size of in-memory compaction thread pool should be configurable**

Introduced a new config key in this issue: hbase.regionserver.inmemory.compaction.pool.size. The default value is 10. Use it to set the size of the in-memory compaction thread pool. Note that all memstores in one region server share the same pool, so if you have many regions in one region server, set this larger so compactions finish faster, for better read performance.

---

* [HBASE-21684](https://issues.apache.org/jira/browse/HBASE-21684) | *Major* | **Throw DNRIOE when connection or rpc client is closed**

Make StoppedRpcClientException extend DoNotRetryIOException.

---

* [HBASE-21739](https://issues.apache.org/jira/browse/HBASE-21739) | *Major* | **Move grant/revoke from regionserver to master**

To implement user permission control in Procedure V2, first move the grant and revoke methods from AccessController to the master.

AccessController#grant and AccessController#revoke are marked as deprecated; please use Admin#grant and Admin#revoke instead.

---

* [HBASE-21791](https://issues.apache.org/jira/browse/HBASE-21791) | *Blocker* | **Upgrade thrift dependency to 0.12.0**

IMPORTANT: Due to security issues, all users who use hbase thrift should avoid using releases which do not have this fix.

The affected releases are:

* 2.1.x: 2.1.2 and below
* 2.0.x: 2.0.4 and below
* 1.x: 1.4.x and below

If you are using one of the affected releases above, please consider upgrading to a newer release ASAP.

---

* [HBASE-20894](https://issues.apache.org/jira/browse/HBASE-20894) | *Major* | **Move BucketCache from java serialization to protobuf**

For users who have configured hbase.bucketcache.ioengine with either the file:, files:, or mmap: prefix, and configured it to be persistent via the hbase.bucketcache.persistent.path property, the serialization format of the bucket cache has changed between versions. The old state will not be read during startup, and there is currently no migration path. The impact is expected to be minimal, however, since the cache will rebuild over time as access patterns dictate.

# HBASE 2.3.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---

* [HBASE-24631](https://issues.apache.org/jira/browse/HBASE-24631) | *Major* | **Loosen Dockerfile pinned package versions of the "debian-revision"**

Update our package version numbers throughout the Dockerfiles to be pinned to their epoch:upstream-version components only. Previously we specified the full debian package version number, including the debian-revision. This led to instability as debian packaging details changed.

See also [man deb-version](http://manpages.ubuntu.com/manpages/xenial/en/man5/deb-version.5.html)

---

* [HBASE-24205](https://issues.apache.org/jira/browse/HBASE-24205) | *Major* | **Create metric to know the number of reads that happens from memstore**

Adds new metrics that count read requests (tracked per row) by whether the row was fetched entirely from the memstore or was pulled from both files and the memstore.
The metrics are collected under the mbean for Tables and under the mbean for regions.
Under table mbean ie.- 'name": "Hadoop:service=HBase,name=RegionServer,sub=Tables' The new metrics will be listed as {code} "Namespace\_default\_table\_t3\_columnfamily\_f1\_metric\_memstoreOnlyRowReadsCount": 5, "Namespace\_default\_table\_t3\_columnfamily\_f1\_metric\_mixedRowReadsCount": 1, {code} Where the format is Namespace\_\\_table\_\\_columnfamily\_\\_metric\_memstoreOnlyRowReadsCount Namespace\_\\_table\_\\_columnfamily\_\\_metric\_mixedRowReadsCount {code} The same one under the region ie. "name": "Hadoop:service=HBase,name=RegionServer,sub=Regions", comes as {code} "Namespace\_default\_table\_t3\_region\_75a7846f4ac4a2805071a855f7d0dbdc\_store\_f1\_metric\_memstoreOnlyRowReadsCount": 5, "Namespace\_default\_table\_t3\_region\_75a7846f4ac4a2805071a855f7d0dbdc\_store\_f1\_metric\_mixedRowReadsCount": 1, {code} where Namespace\_\\_region\_\\_store\_\\_metric\_memstoreOnlyRowReadsCount Namespace\_\\_region\_\\_store\_\\_metric\_mixedRowReadsCount This is also an aggregate against every store the number of reads that happened purely from the memstore or it was a mixed read that happened from memstore and file. --- * [HBASE-21773](https://issues.apache.org/jira/browse/HBASE-21773) | *Critical* | **rowcounter utility should respond to pleas for help** This adds [-h\|-help] options to rowcounter. Passing either -h or -help will print rowcounter guide as below: $hbase rowcounter -h usage: hbase rowcounter \ [options] [\ \...] Options: --starttime=\ starting time filter to start counting rows from. --endtime=\ end time filter limit, to only count rows up to this timestamp. --range=\ [startKey],[endKey][;[startKey],[endKey]...]] --expectedCount=\ expected number of rows to be count. 
For performance, consider the following configuration properties: -Dhbase.client.scanner.caching=100 -Dmapreduce.map.speculative=false

---

* [HBASE-24217](https://issues.apache.org/jira/browse/HBASE-24217) | *Major* | **Add hadoop 3.2.x support**

CI coverage has been extended to include Hadoop 3.2.x for HBase 2.2+.

---

* [HBASE-23055](https://issues.apache.org/jira/browse/HBASE-23055) | *Major* | **Alter hbase:meta**

Adds the ability to edit the hbase:meta table schema. For example:

    hbase(main):006:0> alter 'hbase:meta', {NAME => 'info', DATA_BLOCK_ENCODING => 'ROW_INDEX_V1'}
    Updating all regions with the new schema...
    All regions updated.
    Done.
    Took 1.2138 seconds

You can even add column families. However, you cannot delete any of the core hbase:meta column families such as 'info' and 'table'.

---

* [HBASE-15161](https://issues.apache.org/jira/browse/HBASE-15161) | *Major* | **Umbrella: Miscellaneous improvements from production usage**

This ticket summarizes significant improvements and expansion to the metrics surface area. Interested users should review the individual sub-tasks.

---

* [HBASE-24545](https://issues.apache.org/jira/browse/HBASE-24545) | *Major* | **Add backoff to SCP check on WAL split completion**

Adds backoff to the ServerCrashProcedure wait for WAL splitting to complete when there is a large backlog of files to split. (It's possible to avoid SCP blocking while waiting on WALs to split if you use procedure-based splitting: set 'hbase.split.wal.zk.coordinated' to false to enable procedure-based WAL splitting.)

---

* [HBASE-24524](https://issues.apache.org/jira/browse/HBASE-24524) | *Minor* | **SyncTable logging improvements**

Note that this changes the log level for mismatching row keys: they were originally logged at INFO level and are now logged at DEBUG level. This is consistent with the logging of mismatching cells. Also, for missing row keys, row key values are now logged in a human-readable format, which is more meaningful for operators troubleshooting mismatches.
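
To illustrate what a human-readable row key can look like, here is a minimal stand-alone sketch that prints printable ASCII bytes as-is and escapes everything else as `\xNN`. HBase has its own helper for this kind of rendering (`Bytes.toStringBinary`); the class below only imitates that style with the JDK, its name is hypothetical, and it is not claimed to be the code SyncTable uses.

```java
/** Hypothetical helper; imitates an escaped, human-readable rendering of a row key. */
final class ReadableRowKey {

    /** Printable ASCII (except backslash) is kept; every other byte becomes \xNN. */
    static String toReadable(byte[] row) {
        StringBuilder sb = new StringBuilder();
        for (byte b : row) {
            int ch = b & 0xFF; // treat the byte as unsigned
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A key with two printable bytes followed by two binary bytes.
        System.out.println(toReadable(new byte[] {'r', '1', 0, (byte) 0xFF})); // r1\x00\xFF
    }
}
```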

---

* [HBASE-24359](https://issues.apache.org/jira/browse/HBASE-24359) | *Major* | **Optionally ignore edits for deleted CFs for replication.**

Introduce a new config hbase.replication.drop.on.deleted.columnfamily; the default is false. When set to true, replication will drop the edits for a column family that has been deleted from the replication source and target.

---

* [HBASE-24418](https://issues.apache.org/jira/browse/HBASE-24418) | *Major* | **Consolidate Normalizer implementations**

This change extends the Normalizer with a handful of new configurations. The configuration points supported are:

* `hbase.normalizer.split.enabled` Whether to split a region as part of normalization. Default: `true`.
* `hbase.normalizer.merge.enabled` Whether to merge a region as part of normalization. Default: `true`.
* `hbase.normalizer.min.region.count` The minimum number of regions in a table to consider it for merge normalization. Default: 3.
* `hbase.normalizer.merge.min_region_age.days` The minimum age for a region to be considered for a merge, in days. Default: 3.
* `hbase.normalizer.merge.min_region_size.mb` The minimum size for a region to be considered for a merge, in whole MBs. Default: 1.

---

* [HBASE-24309](https://issues.apache.org/jira/browse/HBASE-24309) | *Major* | **Avoid introducing log4j and slf4j-log4j dependencies for modules other than hbase-assembly**

Add an hbase-logging module and put the log4j related code in this module only, so other modules do not need to depend on log4j at compile scope. See the comments of Log4jUtils and InternalLog4jUtils for more details.

Add a log4j.properties to the test jar of the hbase-logging module, so other sub modules just need to depend on the test jar of the hbase-logging module at test scope to output the log to console, without placing a log4j.properties in their test resources, as these files all have (almost) the same content.
And this test module will not be included in the assembly tarball, so it will not mess up the binary distribution.

Ban direct commons-logging dependency, and ban commons-logging and log4j imports in non-test code, to avoid messing up the logging frameworks of downstream users. In the hbase-logging module we do need to use log4j classes, and the trick is to use the full class name.

Add jcl-over-slf4j and jul-to-slf4j dependencies. Some of our dependencies use jcl or jul as their logging framework, so we should also redirect their log messages to slf4j.

---

* [HBASE-21406](https://issues.apache.org/jira/browse/HBASE-21406) | *Minor* | **"status 'replication'" should not show SINK if the cluster does not act as sink**

Added a new metric to differentiate the sink startup time from the last OP applied time.

The original behaviour was to always set the startup time to TimestampsOfLastAppliedOp and always show it in the "status 'replication'" command, regardless of whether the sink ever applied any OP.

This was confusing, especially for scenarios where the cluster was acting only as a source: the output could lead to wrong interpretations about the sink not applying edits or replication being stuck.

With the new metric, we now compare the two metric values, assuming that if both are the same, no OP has ever been shipped to the given sink, so the output reflects this more clearly, for example:

SINK: TimeStampStarted=Thu Dec 06 23:59:47 GMT 2018, Waiting for OPs...

---

* [HBASE-24132](https://issues.apache.org/jira/browse/HBASE-24132) | *Major* | **Upgrade to Apache ZooKeeper 3.5.7**

HBase now ships ZooKeeper 3.5.x; it previously shipped the EOL'd 3.4.x. A 3.5.x client can talk to a 3.4.x ensemble.

The ZooKeeper project has built a [FAQ](https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ) that documents known issues and work-arounds when upgrading existing deployments.

---

* [HBASE-22287](https://issues.apache.org/jira/browse/HBASE-22287) | *Major* | **inifinite retries on failed server in RSProcedureDispatcher**

Add backoff.
Avoid retrying every 100ms.

---

* [HBASE-24425](https://issues.apache.org/jira/browse/HBASE-24425) | *Major* | **Run hbck\_chore\_run and catalogjanitor\_run on draw of 'HBCK Report' page**

Runs 'catalogjanitor\_run' and 'hbck\_chore\_run' inline with the loading of the 'HBCK Report' page.

Pass '?cache=true' to skip the inline invocation of 'catalogjanitor\_run' and 'hbck\_chore\_run' when drawing the page.

---

* [HBASE-24408](https://issues.apache.org/jira/browse/HBASE-24408) | *Blocker* | **Introduce a general 'local region' to store data on master**

Introduced a general 'local region' at the master side to store the procedure data, etc.

The hfiles of this region are stored on the root fs, while the wal is stored on the wal fs. This issue supersedes part of the code for HBASE-23326, as we now store the data in the 'MasterData' directory instead of 'MasterProcs'.

The old hfiles are moved to the global hfile archive directory with the suffix $-masterlocalhfile-$. The wal files are moved to the global old wal directory with the suffix $masterlocalwal$. The TimeToLiveMasterLocalStoreHFileCleaner and TimeToLiveMasterLocalStoreWALCleaner are configured by default for cleaning the old hfiles and wal files, and the default TTLs are both 7 days.

---

* [HBASE-24115](https://issues.apache.org/jira/browse/HBASE-24115) | *Major* | **Relocate test-only REST "client" from src/ to test/ and mark Private**

Relocate the test-only REST RemoteHTable and RemoteAdmin from src/ to test/, and mark them as InterfaceAudience.Private.

---

* [HBASE-23938](https://issues.apache.org/jira/browse/HBASE-23938) | *Major* | **Replicate slow/large RPC calls to HDFS**

Config key: hbase.regionserver.slowlog.systable.enabled

Default value: false

This config can be enabled if hbase.regionserver.slowlog.buffer.enabled is already enabled.
While hbase.regionserver.slowlog.buffer.enabled ensures that any slow/large RPC logs with complete details are written to the ring buffer available at each RegionServer, hbase.regionserver.slowlog.systable.enabled ensures that all such logs are also persisted in the new system table hbase:slowlog.

Operators can scan hbase:slowlog with filters to retrieve records matching specific attributes. The table is useful for capturing the historical performance of slow RPC calls for detailed analysis.

hbase:slowlog consists of a single ColumnFamily, info. info consists of multiple qualifiers similar to the attributes available to query as part of the Admin API: get\_slowlog\_responses.

One example of a row from an hbase:slowlog scan result (a sample screenshot is attached in the Jira):

    \x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
    \x024\xC1\x06X\x81\xF6\xEC column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348
    \x024\xC1\x06X\x81\xF6\xEC column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan
    \x024\xC1\x06X\x81\xF6\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { attribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_metrics: false
    \x024\xC1\x06X\x81\xF6\xEC column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24
    \x024\xC1\x06X\x81\xF6\xEC column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0
    \x024\xC1\x06X\x81\xF6\xEC column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.
    \x024\xC1\x06X\x81\xF6\xEC column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227
    \x024\xC1\x06X\x81\xF6\xEC column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer
    \x024\xC1\x06X\x81\xF6\xEC column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932
    \x024\xC1\x06X\x81\xF6\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL
    \x024\xC1\x06X\x81\xF6\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=vjasani

---

* [HBASE-24271](https://issues.apache.org/jira/browse/HBASE-24271) | *Major* | **Set values in \`conf/hbase-site.xml\` that enable running on \`LocalFileSystem\` out of the box**

HBASE-24271 changes the default `conf/hbase-site.xml` such that `bin/hbase` will run directly out of the binary tarball or a compiled source tree without any configuration modifications vs. Hadoop 2.8+. This changes our long-standing history of shipping no configured values in `conf/hbase-site.xml`, so existing processes that assume this file is empty of configuration properties may require attention.
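
For tooling that used to assume an empty `conf/hbase-site.xml`, one option is to check which properties the file actually declares before overwriting or merging it. The following is a minimal sketch using only the JDK's XML parser. The class name is hypothetical, and `hbase.tmp.dir` is used purely as a sample property name, not as a statement of what the shipped file contains.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists the property names declared in a Hadoop-style site file. */
final class SiteXmlProperties {

    static List<String> propertyNames(String siteXml) {
        List<String> names = new ArrayList<>();
        try {
            NodeList properties = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                // Each <property> is expected to carry a <name> child.
                NodeList name = ((Element) properties.item(i)).getElementsByTagName("name");
                if (name.getLength() > 0) {
                    names.add(name.item(0).getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable site file", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String sample = "<configuration><property><name>hbase.tmp.dir</name>"
            + "<value>./tmp</value></property></configuration>";
        // An empty list would mean the file declares no properties.
        System.out.println(propertyNames(sample));
    }
}
```

A deployment script could call such a check and then merge its own settings into the file rather than replacing it wholesale.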

---

* [HBASE-24310](https://issues.apache.org/jira/browse/HBASE-24310) | *Major* | **Use Slf4jRequestLog for hbase-http**

Use Slf4jRequestLog instead of the log4j HttpRequestLogAppender in HttpServer.

The request log is disabled by default in conf/log4j.properties by the following lines:

    # Disable request log by default, you can enable this by changing the appender
    log4j.category.http.requests=INFO,NullAppender
    log4j.additivity.http.requests=false

Change 'NullAppender' to whatever appender you want in order to enable the request log.

Note that the logger name for the master status http server is 'http.requests.master', and for the region server it is 'http.requests.regionserver'.

---

* [HBASE-24335](https://issues.apache.org/jira/browse/HBASE-24335) | *Major* | **Support deleteall with ts but without column in shell mode**

Use an empty string to represent no column specified for deleteall in shell mode.
Usage:

    deleteall 'test','r1','',12345
    deleteall 'test', {ROWPREFIXFILTER => 'prefix'}, '', 12345

---

* [HBASE-24304](https://issues.apache.org/jira/browse/HBASE-24304) | *Major* | **Separate a hbase-asyncfs module**

Added a new hbase-asyncfs module to hold the asynchronous dfs output stream implementation for implementing WAL.

---

* [HBASE-22710](https://issues.apache.org/jira/browse/HBASE-22710) | *Major* | **Wrong result in one case of scan that use raw and versions and filter together**

Make the logic for choosing versions more reasonable for raw scans, to avoid losing results when using a filter.

---

* [HBASE-24285](https://issues.apache.org/jira/browse/HBASE-24285) | *Major* | **Move to hbase-thirdparty-3.3.0**

Moved to hbase-thirdparty 3.3.0.

---

* [HBASE-24252](https://issues.apache.org/jira/browse/HBASE-24252) | *Major* | **Implement proxyuser/doAs mechanism for hbase-http**

This feature enables the HBase Web UIs to accept a 'proxyuser' via the HTTP request's query string.
When the parameter \`hbase.security.authentication.spnego.kerberos.proxyuser.enable\` is set to \`true\` in hbase-site.xml (default is \`false\`), the HBase UI will attempt to impersonate the user specified by the query parameter "doAs". This query parameter is checked case-insensitively. When this option is not provided, the user who executed the request is the "real" user and there is no ability to execute impersonation against the WebUI.

For example, if the user "bob" with Kerberos credentials executes a request against the WebUI with this feature enabled and a query string which includes \`doAs=alice\`, the HBase UI will treat this request as executed as \`alice\`, not \`bob\`.

The standard Hadoop proxyuser configuration properties to limit users who may impersonate others apply to this change (e.g. to enable \`bob\` to impersonate \`alice\`). See the Hadoop documentation for more information on how to configure these proxyuser rules.

---

* [HBASE-24143](https://issues.apache.org/jira/browse/HBASE-24143) | *Major* | **[JDK11] Switch default garbage collector from CMS**

`bin/hbase` will now dynamically select a Garbage Collector implementation based on the detected JVM version. JDKs 8,9,10 use `-XX:+UseConcMarkSweepGC`, while JDK11+ use `-XX:+UseG1GC`.

Notice a slight compatibility change. Previously, the garbage collector choice would always be appended to a user-provided value for `HBASE_OPTS`. As of this change, this setting will only be applied when `HBASE_OPTS` is unset. That means that operators who provide a value for this variable will now need to also specify the collector. This is especially important for those on JDK8, where the vm default GC is not the recommended ConcMarkSweep.
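
The selection rule described for HBASE-24143 can be summarized in a few lines. The real logic lives in the `bin/hbase` shell script; the Java below is only an illustrative model of the documented behaviour, with hypothetical class and method names. It treats a `null` value as "`HBASE_OPTS` is unset" and does not model how the script handles an empty value or JDKs older than 8.

```java
/** Illustrative model of the documented rule; not the bin/hbase implementation. */
final class GcSelectionSketch {

    /** JDK 11 and later get G1; the earlier JDKs named in the note (8, 9, 10) get CMS. */
    static String collectorFlag(int jdkMajorVersion) {
        return jdkMajorVersion >= 11 ? "-XX:+UseG1GC" : "-XX:+UseConcMarkSweepGC";
    }

    /** The collector default is applied only when HBASE_OPTS is unset (modeled as null). */
    static String effectiveOpts(String hbaseOpts, int jdkMajorVersion) {
        if (hbaseOpts == null) {
            return collectorFlag(jdkMajorVersion);
        }
        // A user-provided value is used as-is; no collector flag is appended any more.
        return hbaseOpts;
    }

    public static void main(String[] args) {
        System.out.println(effectiveOpts(null, 8));      // -XX:+UseConcMarkSweepGC
        System.out.println(effectiveOpts(null, 11));     // -XX:+UseG1GC
        System.out.println(effectiveOpts("-Xmx4g", 8));  // -Xmx4g
    }
}
```

The last case is the one to watch: with `HBASE_OPTS` set to `-Xmx4g` on JDK8, no collector flag is added, so the operator has to include `-XX:+UseConcMarkSweepGC` explicitly to keep the recommended collector.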
--- * [HBASE-24024](https://issues.apache.org/jira/browse/HBASE-24024) | *Major* | **Optionally reject multi() requests with very high no of rows** New Config: hbase.rpc.rows.size.threshold.reject ----------------------------------------------------------------------- Default value: false Description: If value is true, RegionServer will abort batch requests of Put/Delete with number of rows in a batch operation exceeding threshold defined by value of config: hbase.rpc.rows.warning.threshold. --- * [HBASE-24139](https://issues.apache.org/jira/browse/HBASE-24139) | *Critical* | **Balancer should avoid leaving idle region servers** StochasticLoadBalancer functional improvement: StochasticLoadBalancer would rebalance the cluster if there are any idle RegionServers in the cluster (RegionServer having no region), while other RegionServers have at least 1 region available. --- * [HBASE-24196](https://issues.apache.org/jira/browse/HBASE-24196) | *Major* | **[Shell] Add rename rsgroup command in hbase shell** user or admin can now use hbase shell \> rename\_rsgroup 'oldname', 'newname' to rename rsgroup. --- * [HBASE-24218](https://issues.apache.org/jira/browse/HBASE-24218) | *Major* | **Add hadoop 3.2.x in hadoop check** Add hadoop-3.2.0 and hadoop-3.2.1 in hadoop check and when '--quick-hadoopcheck' we will only check hadoop-3.2.1. Notice that, for aligning the personality scripts across all the active branches, we will commit the patch to all active branches, but the hadoop-3.2.x support in hadoopcheck is only applied to branch-2.2+. --- * [HBASE-23829](https://issues.apache.org/jira/browse/HBASE-23829) | *Major* | **Get \`-PrunSmallTests\` passing on JDK11** \`-PrunSmallTests\` now pass on JDK11 when using \`-Phadoop.profile=3.0\`. 
--- * [HBASE-24185](https://issues.apache.org/jira/browse/HBASE-24185) | *Major* | **Junit tests do not behave well with System.exit or Runtime.halt or JVM exits in general.** Tests that fail because a process -- RegionServer or Master -- called System.exit will now instead throw an exception. --- * [HBASE-24072](https://issues.apache.org/jira/browse/HBASE-24072) | *Major* | **Nightlies reporting OutOfMemoryError: unable to create new native thread** Hadoop hosts have had their ulimit -u raised from 10000 to 30000 (per user, by INFRA). The Docker build container has had its limit raised from 10000 to 12500. --- * [HBASE-24112](https://issues.apache.org/jira/browse/HBASE-24112) | *Major* | **[RSGroup] Support renaming rsgroup** Support RSGroup renaming in core codebase. New API Admin#renameRSGroup(String, String) is introduced in 3.0.0. --- * [HBASE-23994](https://issues.apache.org/jira/browse/HBASE-23994) | *Trivial* | **Add WebUI to Canary** The Canary tool now offers a WebUI when run in `region` mode (the default mode). It is enabled by default, and by default, it binds to `0.0.0.0:16050`. This can be overridden by setting `hbase.canary.info.bindAddress` and `hbase.canary.info.port`. To disable entirely, set the port to `-1`. --- * [HBASE-23779](https://issues.apache.org/jira/browse/HBASE-23779) | *Major* | **Up the default fork count to make builds complete faster; make count relative to CPU count** Pass --threads=2 when building on jenkins. It shortens nightly build times by about 25%. It works by running module build/test in parallel when dependencies allow. Upping the forkcount beyond the pom default of 0.25C would have us exceed our CPU budget on jenkins when two modules are running in parallel (2 modules at 0.25C each makes 0.5C, and on jenkins, hadoop nodes run two jenkins executors per host). Higher forkcounts also seem to threaten build stability. To run tests faster locally, raise the fork count.
$ x="0.5C" ; mvn --threads=2 -Dsurefire.firstPartForkCount=$x -Dsurefire.secondPartForkCount=$x test -PrunAllTests You could up the x from 0.5C to 1.0C but YMMV (On overcommitted hardware, tests start bombing out pretty soon after startup). You could try upping thread count but on occasion are likely to overcommit hardware. --- * [HBASE-24126](https://issues.apache.org/jira/browse/HBASE-24126) | *Major* | **Up the container nproc uplimit from 10000 to 12500** Start docker with upped ulimit for nproc passing '--ulimit nproc=12500'. It was 10000, the default, but made it 12500. Then, set PROC\_LIMIT in hbase-personality so when yetus runs, it is w/ the new 12500 value. --- * [HBASE-24150](https://issues.apache.org/jira/browse/HBASE-24150) | *Major* | **Allow module tests run in parallel** Pass -T2 to mvn. Makes it so we do two modules-at-a-time dependencies willing. Helps speed build and testing. Doubles the resource usage when running modules in parallel. --- * [HBASE-24121](https://issues.apache.org/jira/browse/HBASE-24121) | *Major* | **[Authorization] ServiceAuthorizationManager isn't dynamically updatable. And it should be.** Master & RegionService now support refresh policy authorization defined in hbase-policy.xml without restarting service. To refresh policy, please execute hbase shell command: update\_config or update\_config\_all after policy file updated and synced on all nodes. --- * [HBASE-24099](https://issues.apache.org/jira/browse/HBASE-24099) | *Major* | **Use a fair ReentrantReadWriteLock for the region close lock** This change modifies the default acquisition policy for the region's close lock in order to prevent observed starvation of close requests. The new boolean configuration parameter 'hbase.regionserver.fair.region.close.lock' controls the lock acquisition policy: if true, the lock is created in fair mode (default); if false, the lock is created in nonfair mode (the old default). 
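To illustrate what fair versus nonfair mode means for the close lock in HBASE-24099, here is a minimal JDK-only sketch. It assumes the boolean would be read from `hbase.regionserver.fair.region.close.lock`; the class and method names are hypothetical and this is not the HRegion code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: shows the JDK fairness switch the note refers to.
final class FairCloseLockSketch {

    /**
     * Creates a read/write lock in fair or nonfair mode.
     * Assumption: 'fair' would come from the configuration parameter
     * 'hbase.regionserver.fair.region.close.lock' (true by default per the note).
     */
    static ReentrantReadWriteLock newCloseLock(boolean fair) {
        return new ReentrantReadWriteLock(fair);
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = newCloseLock(true);
        // Ordinary operations would hold the read lock; a close would take the write lock.
        lock.readLock().lock();
        try {
            System.out.println("fair=" + lock.isFair());
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In fair mode, waiting threads acquire the lock in approximately arrival order, so a waiting writer (here, a close request) is not indefinitely overtaken by newly arriving readers. The trade-off is typically lower throughput than nonfair mode.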
--- * [HBASE-23153](https://issues.apache.org/jira/browse/HBASE-23153) | *Major* | **PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded** The `PrimaryRegionCountSkewCostFunction` for the `StochasticLoadBalancer` is only needed when the read replicas feature is enabled. With this change, that function now properly indicates that it is not needed when the read replica feature is off. If this improvement is not available, operators with clusters that are not using the read replica feature should manually disable it by setting `hbase.master.balancer.stochastic.primaryRegionCountCost` to `0.0` in hbase-site.xml for all HBase Masters. --- * [HBASE-24055](https://issues.apache.org/jira/browse/HBASE-24055) | *Major* | **Make AsyncFSWAL can run on EC cluster** AsyncFSWAL can now also be used against a directory which has EC enabled. Make sure you also use the hadoop 3.x client, as the option is only available in hadoop 3.x. --- * [HBASE-24113](https://issues.apache.org/jira/browse/HBASE-24113) | *Major* | **Upgrade the maven we use from 3.5.4 to 3.6.3 in nightlies** Branches-2.3+ use maven 3.6.3 for building. Older branches still use 3.5.4. --- * [HBASE-24122](https://issues.apache.org/jira/browse/HBASE-24122) | *Major* | **Change machine ulimit-l to ulimit-a so dumps full ulimit rather than just 'max locked memory'** Our 'Build Artifacts' have a machine directory under which we emit vitals on the host the build was run on. We used to emit the result of 'ulimit -l' as a file named 'ulimit-l'. This has been repurposed to instead emit the result of running 'ulimit -a', which includes the 'ulimit -l' value.
--- * [HBASE-23678](https://issues.apache.org/jira/browse/HBASE-23678) | *Major* | **Literate builder API for version management in schema** ColumnFamilyDescriptor new builder API: /\*\* \* Retain all versions for a given TTL(retentionInterval), and then only a specific number \* of versions(versionAfterInterval) after that interval elapses. \* \* @param retentionInterval Retain all versions for this interval \* @param versionAfterInterval Number of versions to retain after retentionInterval \*/ public ModifyableColumnFamilyDescriptor setVersionsWithTimeToLive( final int retentionInterval, final int versionAfterInterval) --- * [HBASE-24050](https://issues.apache.org/jira/browse/HBASE-24050) | *Major* | **Deprecated PBType on all 2.x branches** org.apache.hadoop.hbase.types.PBType is marked as deprecated without any replacement. It will be moved to the hbase-example module and marked as IA.Private in 3.0.0. Exposing it was a mistake, as it should not be part of our public API. Users who depend on this class should copy the code into their own code base. --- * [HBASE-8868](https://issues.apache.org/jira/browse/HBASE-8868) | *Minor* | **add metric to report client shortcircuit reads** Expose file system level read metrics for RegionServer. If the HBase RS runs on top of HDFS, calculate the aggregation of ReadStatistics of each HdfsFileInputStream. These metrics include: (1) total number of bytes read from HDFS. (2) total number of bytes read from local DataNode. (3) total number of bytes read locally through short-circuit read. (4) total number of bytes read locally through zero-copy read. Because HDFS ReadStatistics is calculated per input stream, it is not feasible to update the aggregated number in real time. Instead, the metrics are updated when an input stream is closed.
--- * [HBASE-24032](https://issues.apache.org/jira/browse/HBASE-24032) | *Major* | **[RSGroup] Assign created tables to respective rsgroup automatically instead of manual operations** Admins can determine which tables go to which rsgroup with a script on the Master side (set hbase.rsgroup.table.mapping.script to a local filesystem path), which aims to lighten the burden of admin operations. Note that since HBase 3+, the rsgroup can also be specified in the TableDescriptor; if clients specify it there, the master skips the script-based determination. Here is a simple example of a script: {code} #!/bin/bash # Input consists of two strings: the 1st is the namespace of the table, the 2nd is the table name namespace=$1 tablename=$2 if [[ $namespace == test ]]; then echo test elif [[ $tablename == \*foo\* ]]; then echo other else echo default fi {code} --- * [HBASE-23993](https://issues.apache.org/jira/browse/HBASE-23993) | *Major* | **Use loopback for zk standalone server in minizkcluster** MiniZKCluster now puts up its standalone node listening on loopback/127.0.0.1 rather than "localhost". --- * [HBASE-23986](https://issues.apache.org/jira/browse/HBASE-23986) | *Major* | **Bump hadoop-two.version to 2.10.0 on master and branch-2** Bumped hadoop-two.version to 2.10.0, which means we will drop the support for hadoop-2.8.x and hadoop-2.9.x. --- * [HBASE-23930](https://issues.apache.org/jira/browse/HBASE-23930) | *Minor* | **Shell should attempt to format \`timestamp\` attributes as ISO-8601** Change timestamp display to be ISO8601 when toString on Cell and outputting in shell.... User used to see.... column=table:state, timestamp=1583967620343 ..... ... but now sees: column=table:state, timestamp=2020-03-11T23:00:20.343Z .... --- * [HBASE-22827](https://issues.apache.org/jira/browse/HBASE-22827) | *Major* | **Expose multi-region merge in shell and Admin API** merge\_region shell command can now be used to merge more than 2 regions as well.
It takes a list of regions as comma separated values or as an array of regions, and not just 2 regions. The full regionnames and encoded regionnames are continued to be accepted. --- * [HBASE-23767](https://issues.apache.org/jira/browse/HBASE-23767) | *Major* | **Add JDK11 compilation and unit test support to Github precommit** Rebuild our Dockerfile with support for multiple JDK versions. Use multiple stages in the Jenkinsfile instead of yetus's multijdk because of YETUS-953. Run those multiple stages in parallel to speed up results. Note that multiple stages means multiple Yetus invocations means multiple comments on the PreCommit. This should become more obvious to users once we can make use of GitHub Checks API, HBASE-23902. --- * [HBASE-22978](https://issues.apache.org/jira/browse/HBASE-22978) | *Minor* | **Online slow response log** get\_slowlog\_responses and clear\_slowlog\_responses are used to retrieve and clear slow RPC logs from RingBuffer maintained by RegionServers. New Admin APIs: 1. List\ getSlowLogResponses(final Set\ serverNames, final SlowLogQueryFilter slowLogQueryFilter) throws IOException; 2. List\ clearSlowLogResponses(final Set\ serverNames) throws IOException; Configs: 1. hbase.regionserver.slowlog.ringbuffer.size: Default size of ringbuffer to be maintained by each RegionServer in order to store online slowlog responses. This is an in-memory ring buffer of requests that were judged to be too slow in addition to the responseTooSlow logging. The in-memory representation would be complete. For more details, please look into Doc Section: Get Slow Response Log from shell Default 256 2. hbase.regionserver.slowlog.buffer.enabled: Indicates whether RegionServers have ring buffer running for storing Online Slow logs in FIFO manner with limited entries. The size of the ring buffer is indicated by config: hbase.regionserver.slowlog.ringbuffer.size The default value is false, turn this on and get latest slowlog responses with complete data. 
Default false For more details, please look into "Get Slow Response Log from shell" section from HBase book. --- * [HBASE-23926](https://issues.apache.org/jira/browse/HBASE-23926) | *Major* | **[Flakey Tests] Down the flakies re-run ferocity; it makes for too many fails.** Down the flakey re-run fork count from 1.0C -- i.e. a fork per CPU -- to 0.25C. On a recent run, the machine had 16 cores. 0.25C is 4 cores. We'd hardcoded fork count at 3 previous to changes made by parent. --- * [HBASE-23146](https://issues.apache.org/jira/browse/HBASE-23146) | *Major* | **Support CheckAndMutate with multiple conditions** Add a checkAndMutate(row, filter) method in the AsyncTable interface and the Table interface. This method atomically checks if the row matches the specified filter. If it does, it adds the Put/Delete/RowMutations. This is a fluent-style API; the code looks like this: For Table interface: {code} table.checkAndMutate(row, filter).thenPut(put); {code} For AsyncTable interface: {code} table.checkAndMutate(row, filter).thenPut(put) .thenAccept(succ -\> { if (succ) { System.out.println("Check and put succeeded"); } else { System.out.println("Check and put failed"); } }); {code} --- * [HBASE-23874](https://issues.apache.org/jira/browse/HBASE-23874) | *Minor* | **Move Jira-attached file precommit definition from script in Jenkins config to dev-support** The Jira Precommit job (https://builds.apache.org/job/PreCommit-HBASE-Build/) will now look for a file within the source tree (dev-support/jenkins\_precommit\_jira\_yetus.sh) instead of depending on a script section embedded in the job. --- * [HBASE-23865](https://issues.apache.org/jira/browse/HBASE-23865) | *Major* | **Up flakey history from 5 to 10** Changed flakey list reporting to show 10 rather than 5 items. Also changed the second and first part fork counts to be 1C rather than hardcoded 3.
--- * [HBASE-23554](https://issues.apache.org/jira/browse/HBASE-23554) | *Major* | **Encoded regionname to regionname utility** Adds shell command regioninfo: hbase(main):001:0\> regioninfo '0e6aa5c19ae2b2627649dc7708ce27d0' {ENCODED =\> 0e6aa5c19ae2b2627649dc7708ce27d0, NAME =\> 'TestTable,,1575941375972.0e6aa5c19ae2b2627649dc7708ce27d0.', STARTKEY =\> '', ENDKEY =\> '00000000000000000000299441'} Took 0.4737 seconds --- * [HBASE-23350](https://issues.apache.org/jira/browse/HBASE-23350) | *Major* | **Make compaction files cacheonWrite configurable based on threshold** This JIRA adds a new configuration - \`hbase.rs.cachecompactedblocksonwrite.threshold\`. This configuration is the maximum total size (in bytes) of the compacted files below which the configuration \`hbase.rs.cachecompactedblocksonwrite\` is honoured. If the total size of the compacted files exceeds this threshold, even when \`hbase.rs.cachecompactedblocksonwrite\` is enabled, the data blocks are not cached. Caching index and bloom blocks is not affected by this configuration (user configuration is always honoured). Default value of this configuration is Long.MAX\_VALUE. This means that, whatever the total size of the compacted files, the data blocks will be cached. --- * [HBASE-17115](https://issues.apache.org/jira/browse/HBASE-17115) | *Major* | **HMaster/HRegion Info Server does not honour admin.acl** Implements authorization for the HBase Web UI by limiting access to certain endpoints which could be used to extract sensitive information from HBase. Access to these restricted endpoints can be limited to a group of administrators, identified either by a list of users (hbase.security.authentication.spnego.admin.users) or by a list of groups (hbase.security.authentication.spnego.admin.groups). By default, neither of these values is set, which preserves backwards compatibility (allowing all authenticated users to access all endpoints).
Further, users who have sensitive information in the HBase service configuration can set hbase.security.authentication.ui.config.protected to true which will treat the configuration endpoint as a protected, admin-only resource. By default, all authenticated users may access the configuration endpoint. --- * [HBASE-23647](https://issues.apache.org/jira/browse/HBASE-23647) | *Major* | **Make MasterRegistry the default registry impl** Enables master based registry as the default registry used by clients to fetch connection metadata. Refer to the section "Master Registry" in the client documentation for more details and advantages of this implementation over the default Zookeeper based registry. Configuration parameter that controls the registry in use: `hbase.client.registry.impl` Where to set this: HBase client configuration (hbase-site.xml) Possible values: - `org.apache.hadoop.hbase.client.ZKConnectionRegistry` (For ZK based registry implementation) - `org.apache.hadoop.hbase.client.MasterRegistry` (New, for master based registry implementation) Notes on defaults: - For v3.0.0 and later, MasterRegistry is the default registry - For all releases in 2.x line, ZK based registry is the default. This feature has been back ported to 2.3.0 and later releases. MasterRegistry can be enabled by setting the following client configuration. 
``` <property> <name>hbase.client.registry.impl</name> <value>org.apache.hadoop.hbase.client.MasterRegistry</value> </property> ``` --- * [HBASE-23069](https://issues.apache.org/jira/browse/HBASE-23069) | *Critical* | **periodic dependency bump for Sep 2019** caffeine: 2.6.2 =\> 2.8.1 commons-codec: 1.10 =\> 1.13 commons-io: 2.5 =\> 2.6 disruptor: 3.3.6 =\> 3.4.2 httpcore: 4.4.6 =\> 4.4.13 jackson: 2.9.10 =\> 2.10.1 jackson.databind: 2.9.10.1 =\> 2.10.1 jetty: 9.3.27.v20190418 =\> 9.3.28.v20191105 protobuf.plugin: 0.5.0 =\> 0.6.1 zookeeper: 3.4.10 =\> 3.4.14 slf4j: 1.7.25 =\> 1.7.30 rat: 0.12 =\> 0.13 asciidoctor: 1.5.5 =\> 1.5.8 asciidoctor.pdf: 1.5.0-alpha.15 =\> 1.5.0-rc.2 error-prone: 2.3.3 =\> 2.3.4 --- * [HBASE-23686](https://issues.apache.org/jira/browse/HBASE-23686) | *Major* | **Revert binary incompatible change and remove reflection** - Reverts a binary incompatible change for ByteRangeUtils - Usage of reflection inside CommonFSUtils removed --- * [HBASE-23347](https://issues.apache.org/jira/browse/HBASE-23347) | *Major* | **Pluggable RPC authentication** This change introduces an internal abstraction layer which allows for new SASL-based authentication mechanisms to be used inside HBase services. All existing SASL-based authentication mechanisms were ported to the new abstraction, making no external change in runtime semantics, client API, or RPC serialization format. Developers familiar with extending HBase can implement authentication mechanisms beyond simple Kerberos and DelegationTokens which authenticate HBase users against some other user database. HBase service authentication (Master to/from RegionServer) continues to operate solely over Kerberos. --- * [HBASE-23156](https://issues.apache.org/jira/browse/HBASE-23156) | *Major* | **start-hbase.sh failed with ClassNotFoundException when build with hadoop3** Introduce a new hbase-assembly/src/main/assembly/hadoop-three-compat.xml for building with hadoop 3.x.
--- * [HBASE-23680](https://issues.apache.org/jira/browse/HBASE-23680) | *Major* | **RegionProcedureStore missing cleaning of hfile archive** Add a new config to hbase-default.xml \<property\> \<name\>hbase.procedure.store.region.hfilecleaner.plugins\</name\> \<value\>org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner\</value\> \<description\>A comma-separated list of BaseHFileCleanerDelegate invoked by the RegionProcedureStore HFileCleaner service. These HFiles cleaners are called in order, so put the cleaner that prunes the most files in front. To implement your own BaseHFileCleanerDelegate, just put it in HBase's classpath and add the fully qualified class name here. Always add the above default hfile cleaners in the list as they will be overwritten in hbase-site.xml.\</description\> \</property\> It will share the same TTL with other HFileCleaners. And you can also implement your own cleaner and change this property to enable it. --- * [HBASE-23675](https://issues.apache.org/jira/browse/HBASE-23675) | *Minor* | **Move to Apache parent POM version 22** Updated parent pom to Apache version 22. --- * [HBASE-23679](https://issues.apache.org/jira/browse/HBASE-23679) | *Critical* | **FileSystem instance leaks due to bulk loads with Kerberos enabled** This change fixes an issue with Bulk Loading on installations with Kerberos enabled and more than a single RegionServer. When multiple RegionServers are involved in hosting a table's regions which are being bulk-loaded into, all but the RegionServer hosting the table's first Region will "leak" one DistributedFileSystem object onto the heap, never freeing that memory. Eventually, with enough bulk loads, this will create a situation for RegionServers where they have no free heap space and will either spend all time in JVM GC, lose their ZK session, or crash with an OutOfMemoryError. The only mitigation for this issue is to periodically restart RegionServers.
All earlier versions of HBase 2.x are subject to this issue (2.0.x, \<=2.1.8, \<=2.2.3) --- * [HBASE-23286](https://issues.apache.org/jira/browse/HBASE-23286) | *Major* | **Improve MTTR: Split WAL to HFile** Add a new feature to improve MTTR, which performs failover in 3 steps: 1. Read WAL and write HFile to region’s column family’s recovered.hfiles directory. 2. Open region. 3. Bulkload the recovered.hfiles for every column family. Compared to DLS (distributed log split), this feature will reduce region open time significantly. Set hbase.wal.split.to.hfile to true to enable this feature. --- * [HBASE-23619](https://issues.apache.org/jira/browse/HBASE-23619) | *Trivial* | **Use built-in formatting for logging in hbase-zookeeper** Changed the logging in hbase-zookeeper to use built-in formatting --- * [HBASE-23628](https://issues.apache.org/jira/browse/HBASE-23628) | *Minor* | **Replace Apache Commons Digest Base64 with JDK8 Base64** From the PR: "Yes. The two create the same output... I just wrote a small test suite to increase my confidence on that. I generated many tens of millions of random byte patterns and compared the output of the two algorithms. They came back identical every time. "Just in case any inquiring minds would like to know, there is no longer an encoding required when generating the strings. The JDK implementation specifically specifies that strings returned are StandardCharsets.ISO\_8859\_1. This does not change anything because UTF8 and ISO\_8859 overlap for the limited character set (64 characters) the encoding uses." --- * [HBASE-23651](https://issues.apache.org/jira/browse/HBASE-23651) | *Major* | **Region balance throttling can be disabled** Setting hbase.balancer.max.balancing to an int value \<=0 disables region balance throttling.
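The charset remark quoted in the HBASE-23628 note above can be checked with a small JDK-only sketch. It does not compare against Apache Commons Codec (that would need the extra dependency); it only shows that a string produced by `java.util.Base64` yields identical bytes under ISO-8859-1 and UTF-8. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch: illustrates why the choice between ISO-8859-1 and UTF-8
// does not matter for Base64 output, which only uses ASCII characters.
final class Base64CharsetSketch {

    /** Encodes raw bytes with the JDK's basic Base64 encoder. */
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** True when the encoded text has the same byte form in both charsets. */
    static boolean sameBytesInBothCharsets(String encoded) {
        return Arrays.equals(
            encoded.getBytes(StandardCharsets.ISO_8859_1),
            encoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] raw = {0, (byte) 0xff, 0x10, (byte) 0x80, 0x7f};
        String encoded = encode(raw);
        System.out.println(encoded + " sameBytes=" + sameBytesInBothCharsets(encoded));
    }
}
```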
--- * [HBASE-23588](https://issues.apache.org/jira/browse/HBASE-23588) | *Major* | **Cache index blocks and bloom blocks on write if CacheCompactedBlocksOnWrite is enabled** If cacheOnWrite is enabled during flush or compaction, index and bloom blocks(with data blocks) would be automatically cached during write. --- * [HBASE-23369](https://issues.apache.org/jira/browse/HBASE-23369) | *Major* | **Auto-close 'unknown' Regions reported as OPEN on RegionServers** If a RegionServer reports a Region as OPEN in disagreement with Master's status on the Region, the Master now tells the RegionServer to silently close the Region. --- * [HBASE-23596](https://issues.apache.org/jira/browse/HBASE-23596) | *Major* | **HBCKServerCrashProcedure can double assign** Makes it so the recently added HBCKServerCrashProcedure -- the SCP that gets invoked when an operator schedules an SCP via hbck2 scheduleRecoveries command -- now works the same as SCP EXCEPT if master knows nothing of the scheduled servername. In this latter case, HBCKSCP will do a full scan of hbase:meta looking for instances of the passed servername. If any found it will attempt cleanup of hbase:meta references by reassigning any found OPEN or OPENING and by closing any in CLOSING state. Used to fix instances of what the 'HBCK Report' page shows as 'Unknown Servers'. --- * [HBASE-23624](https://issues.apache.org/jira/browse/HBASE-23624) | *Major* | **Add a tool to dump the procedure info in HFile** Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.HFileProcedurePrettyPrinter to run the tool. --- * [HBASE-23590](https://issues.apache.org/jira/browse/HBASE-23590) | *Major* | **Update maxStoreFileRefCount to maxCompactedStoreFileRefCount** RegionsRecoveryChore introduced as part of HBASE-22460 tries to reopen regions based on config: hbase.regions.recovery.store.file.ref.count. 
Region reopen needs to take into consideration all compacted away store files that belong to the region and not store files(non-compacted). Fixed this bug as part of this Jira. Updated description for corresponding configs: 1. hbase.master.regions.recovery.check.interval : Regions Recovery Chore interval in milliseconds. This chore keeps running at this interval to find all regions with configurable max store file ref count and reopens them. Defaults to 20 mins 2. hbase.regions.recovery.store.file.ref.count : Very large number of ref count on a compacted store file indicates that it is a ref leak on that object(compacted store file). Such files can not be removed after it is invalidated via compaction. Only way to recover in such scenario is to reopen the region which can release all resources, like the refcount, leases, etc. This config represents Store files Ref Count threshold value considered for reopening regions. Any region with compacted store files ref count \> this value would be eligible for reopening by master. Here, we get the max refCount among all refCounts on all compacted away store files that belong to a particular region. Default value -1 indicates this feature is turned off. Only positive integer value should be provided to enable this feature. --- * [HBASE-23618](https://issues.apache.org/jira/browse/HBASE-23618) | *Major* | **Add a tool to dump procedure info in the WAL file** Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.WALProcedurePrettyPrinter to run the tool. --- * [HBASE-23617](https://issues.apache.org/jira/browse/HBASE-23617) | *Major* | **Add a stress test tool for region based procedure store** Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStorePerformanceEvaluation to run the tool. 
--- * [HBASE-23326](https://issues.apache.org/jira/browse/HBASE-23326) | *Critical* | **Implement a ProcedureStore which stores procedures in a HRegion** Use a region based procedure store to replace the old customized WAL based procedure store. The procedure data migration is done automatically during upgrading. After upgrading, the MasterProcWALs directory will be deleted and a new MasterProc directory will be created. And notice that a region will still write WAL so we still have WAL files and they will be moved to the oldWALs directory. The file name is mostly like a normal WAL file, and the only difference is that it is ended with "$masterproc$". --- * [HBASE-23320](https://issues.apache.org/jira/browse/HBASE-23320) | *Major* | **Upgrade surefire plugin to 3.0.0-M4** Bumped surefire plugin to 3.0.0-M4 --- * [HBASE-20461](https://issues.apache.org/jira/browse/HBASE-20461) | *Major* | **Implement fsync for AsyncFSWAL** Now AsyncFSWAL also supports Durability.FSYNC\_WAL. --- * [HBASE-23066](https://issues.apache.org/jira/browse/HBASE-23066) | *Minor* | **Create a config that forces to cache blocks on compaction** The configuration 'hbase.rs.cacheblocksonwrite' was used to enable caching the blocks on write. But purposefully we were not caching the blocks when we do compaction (since it may be very aggressive) as the caching happens as and when the writer completes a block. In cloud environments since they have bigger sized caches - though they try to enable 'hbase.rs.prefetchblocksonopen' (non - aggressive way of caching the blocks proactively on reader creation) it does not help them because it takes time to cache the compacted blocks. This feature creates a new configuration 'hbase.rs.cachecompactedblocksonwrite' which when set to 'true' will enable the blocks created out of compaction. Remember that since it is aggressive caching the user should be having enough cache space - if not it may lead to other active blocks getting evicted. 
From the shell this can be enabled by using the option per Column Family also by using the below format {code} create 't1', 'f1', {NUMREGIONS =\> 15, SPLITALGO =\> 'HexStringSplit', CONFIGURATION =\> {'hbase.rs.cachecompactedblocksonwrite' =\> 'true'}} {code} --- * [HBASE-23239](https://issues.apache.org/jira/browse/HBASE-23239) | *Major* | **Reporting on status of backing MOB files from client-facing cells** Users of the MOB feature can now use the `mobrefs` utility to get statistics about data in the MOB system and verify the health of backing files on HDFS. ``` HADOOP_CLASSPATH=/etc/hbase/conf:$(hbase mapredcp) yarn jar \ /some/path/to/hbase-shaded-mapreduce.jar mobrefs mobrefs-report-output some_table foo ``` See javadocs of the class `MobRefReporter` for more details. the reference guide has added some information about MOB internals and troubleshooting. --- * [HBASE-23549](https://issues.apache.org/jira/browse/HBASE-23549) | *Minor* | **Document steps to disable MOB for a column family** The reference guide now includes a walk through of disabling the MOB feature if needed while maintaining availability. --- * [HBASE-23582](https://issues.apache.org/jira/browse/HBASE-23582) | *Minor* | **Unbalanced braces in string representation of table descriptor** Fixed unbalanced braces in string representation within HBase shell --- * [HBASE-23293](https://issues.apache.org/jira/browse/HBASE-23293) | *Minor* | **[REPLICATION] make ship edits timeout configurable** The default rpc timeout for ReplicationSourceShipper#shipEdits is 60s, when bulkload replication enabled, timeout exception may be occurred. Now we can conf the timeout value through replication.source.shipedits.timeout, and it’s adaptive. --- * [HBASE-23312](https://issues.apache.org/jira/browse/HBASE-23312) | *Major* | **HBase Thrift SPNEGO configs (HBASE-19852) should be backwards compatible** The newer HBase Thrift SPNEGO configs should not be required. 
The hbase.thrift.spnego.keytab.file and hbase.thrift.spnego.principal configs will fall back to the hbase.thrift.keytab.file and hbase.thrift.kerberos.principal original configs. The older configs will log a deprecation warning. It is preferred to use the newer SPNEGO configurations. --- * [HBASE-22969](https://issues.apache.org/jira/browse/HBASE-22969) | *Minor* | **A new binary component comparator(BinaryComponentComparator) to perform comparison of arbitrary length and position** With BinaryComponentComparator, applications will be able to design a diverse and powerful set of filters for rows and columns. See https://issues.apache.org/jira/browse/HBASE-22969 for an example. In general, the comparator can be used with any filter taking ByteArrayComparable. As of now, the following filters take ByteArrayComparable: 1. RowFilter 2. ValueFilter 3. QualifierFilter 4. FamilyFilter 5. ColumnValueFilter --- * [HBASE-23234](https://issues.apache.org/jira/browse/HBASE-23234) | *Major* | **Provide .editorconfig based on checkstyle configuration** Adds a .editorconfig file with configurations populated by IntelliJ, based on our checkstyle configuration. There are lots of IntelliJ-specific configs in here that I assume are not replicated to Eclipse or Netbeans users. Any devs using those tools should push whatever updates they see fit, but please start with the checkstyle configs as the origin of truth. --- * [HBASE-23322](https://issues.apache.org/jira/browse/HBASE-23322) | *Minor* | **[hbck2] Simplification on HBCKSCP scheduling** An hbck2 scheduleRecoveries will run a subclass of ServerCrashProcedure which asks Master what Regions were on the dead Server but it will also do a hbase:meta table scan to see if any vestiges of the old Server remain (for the case where an SCP failed mid-point leaving references in place or where Master and hbase:meta deviated in accounting).
--- * [HBASE-23321](https://issues.apache.org/jira/browse/HBASE-23321) | *Minor* | **[hbck2] fixHoles of fixMeta doesn't update in-memory state** If holes in hbase:meta, hbck2 fixMeta now will update Master in-memory state so you do not need to restart master just so you can assign the new hole-bridging regions. --- * [HBASE-23282](https://issues.apache.org/jira/browse/HBASE-23282) | *Major* | **HBCKServerCrashProcedure for 'Unknown Servers'** hbck2 scheduleRecoveries will now run a SCP that also looks in hbase:meta for any references to the scheduled server -- not just consult Master in-memory state -- just in case vestiges of the server are leftover in hbase:meta --- * [HBASE-19450](https://issues.apache.org/jira/browse/HBASE-19450) | *Minor* | **Add log about average execution time for ScheduledChore** HBase internal chores now log a moving average of how long execution of each chore takes at `INFO` level for the logger `org.apache.hadoop.hbase.ScheduledChore`. Such messages will happen at most once per five minutes. --- * [HBASE-23250](https://issues.apache.org/jira/browse/HBASE-23250) | *Minor* | **Log message about CleanerChore delegate initialization should be at INFO** CleanerChore delegate initialization is now logged at INFO level instead of DEBUG --- * [HBASE-23243](https://issues.apache.org/jira/browse/HBASE-23243) | *Major* | **[pv2] Filter out SUCCESS procedures; on decent-sized cluster, plethora overwhelms problems** The 'Procedures & Locks' tab in Master UI only displays problematic Procedures now (RUNNABLE, WAITING-TIMEOUT, etc.). It no longer notes procedures whose state is SUCCESS. 
--- * [HBASE-23227](https://issues.apache.org/jira/browse/HBASE-23227) | *Blocker* | **Upgrade jackson-databind to 2.9.10.1 to avoid recent CVEs** the Apache HBase REST Proxy now uses Jackson Databind version 2.9.10.1 to address the following CVEs - CVE-2019-16942 - CVE-2019-16943 Users of prior releases with Jackson Databind 2.9.10 are advised to either upgrade to this release or to upgrade their local Jackson Databind jar directly. --- * [HBASE-23222](https://issues.apache.org/jira/browse/HBASE-23222) | *Critical* | **Better logging and mitigation for MOB compaction failures** The MOB compaction process in the HBase Master now logs more about its activity. In the event that you run into the problems described in HBASE-22075, there is a new HFileCleanerDelegate that will stop all removal of MOB hfiles from the archive area. It can be configured by adding `org.apache.hadoop.hbase.mob.ManualMobMaintHFileCleaner` to the list configured for `hbase.master.hfilecleaner.plugins`. This new cleaner delegate will cause your archive area to grow unbounded; you will have to manually prune files which may be prohibitively complex. Consider if your use case will allow you to mitigate by disabling mob compactions instead. Caveats: * Be sure the list of cleaner delegates still includes the default cleaners you will likely need: ttl, snapshot, and hlink. * Be mindful that if you enable this cleaner delegate then there will be *no* automated process for removing these mob hfiles. You should see a single region per table in `%hbase_root%/archive` that accumulates files over time. You will have to determine which of these files are safe or not to remove. * You should list this cleaner delegate after the snapshot and hlink delegates so that you can enable sufficient logging to determine when an archived mob hfile is needed by those subsystems. When set to `TRACE` logging, the CleanerChore logger will include archive retention decision justifications. 
* If your use case creates a large number of uniquely named tables, this new delegate will cause memory pressure on the master. --- * [HBASE-15519](https://issues.apache.org/jira/browse/HBASE-15519) | *Major* | **Add per-user metrics** Adds per-user metrics for reads/writes to each RegionServer. These metrics are exported by default. hbase.regionserver.user.metrics.enabled can be used to disable the feature if desired for any reason. --- * [HBASE-22460](https://issues.apache.org/jira/browse/HBASE-22460) | *Minor* | **Reopen a region if store reader references may have leaked** Leaked store files can not be removed even after it is invalidated via compaction. A reasonable mitigation for a reader reference leak would be a fast reopen of the region on the same server. Configs: 1. hbase.master.regions.recovery.check.interval : Regions Recovery Chore interval in milliseconds. This chore keeps running at this interval to find all regions with configurable max store file ref count and reopens them. Defaults to 20 mins 2. hbase.regions.recovery.store.file.ref.count : This config represents Store files Ref Count threshold value considered for reopening regions. Any region with store files ref count \> this value would be eligible for reopening by master. Default value -1 indicates this feature is turned off. Only positive integer value should be provided to enable this feature. --- * [HBASE-23172](https://issues.apache.org/jira/browse/HBASE-23172) | *Minor* | **HBase Canary region success count metrics reflect column family successes, not region successes** Added a comment to make clear that read/write success counts are tallying column family success counts, not region success counts. Additionally, the region read and write latencies previously only stored the latencies of the last column family of the region reads/writes. This has been fixed by using a map of each region to a list of read and write latency values. 
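
The HBASE-23172 fix described above, a map from each region to a list of latency values, can be pictured with a minimal sketch. The class and method names below are hypothetical and are not the Canary's actual code; the sketch only shows why appending to a per-region list keeps every column family's sample, where a single per-region value would keep only the last one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the Canary's actual classes or field names. */
final class RegionLatencySketch {
  // Region name -> one read latency sample (ms) per column family that was read.
  private final Map<String, List<Long>> readLatencies = new HashMap<>();

  /** Appends a sample instead of overwriting the previous column family's value. */
  void recordRead(String regionName, long latencyMs) {
    readLatencies.computeIfAbsent(regionName, k -> new ArrayList<>()).add(latencyMs);
  }

  /** Returns every sample recorded for the region, in insertion order. */
  List<Long> readLatenciesFor(String regionName) {
    return Collections.unmodifiableList(
        readLatencies.getOrDefault(regionName, Collections.emptyList()));
  }

  public static void main(String[] args) {
    RegionLatencySketch sketch = new RegionLatencySketch();
    sketch.recordRead("region-a", 12L); // first column family
    sketch.recordRead("region-a", 30L); // second column family
    System.out.println(sketch.readLatenciesFor("region-a"));
  }
}
```

Write latencies would presumably follow the same pattern with a second map; that part is an assumption, since the note does not spell out the layout.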
--- * [HBASE-23177](https://issues.apache.org/jira/browse/HBASE-23177) | *Major* | **If fail to open reference because FNFE, make it plain it is a Reference** Changes the message on the FNFE exception thrown when the file a Reference points to is missing; the message now includes detail on Reference as well as pointed-to file so can connect how FNFE relates to region open. --- * [HBASE-20626](https://issues.apache.org/jira/browse/HBASE-20626) | *Major* | **Change the value of "Requests Per Second" on WEBUI** Use 'totalRowActionRequestCount' to calculate QPS on web UI. --- * [HBASE-22874](https://issues.apache.org/jira/browse/HBASE-22874) | *Critical* | **Define a public interface for Canary and move existing implementation to LimitedPrivate** Downstream users who wish to programmatically check the health of their HBase cluster may now rely on a public interface derived from the previously private implementation of the canary cli tool. The interface is named `Canary` and can be found in the user facing javadocs. Downstream users who previously relied on the invoking the canary via the Java classname (either on the command line or programmatically) will need to change how they do so because the non-public implementation has moved. --- * [HBASE-23035](https://issues.apache.org/jira/browse/HBASE-23035) | *Major* | **Retain region to the last RegionServer make the failover slower** Since 2.0.0,when one regionserver crashed and back online again, AssignmentManager will retain the region locations and try assign the regions to this regionserver(same host:port with the crashed one) again. But for 1.x.x, the behavior is round-robin assignment for the regions belong to the crashed regionserver. This jira change the "retain" assignment to round-robin assignment, which is same with 1.x.x version. This change will make the failover faster and improve availability. 
--- * [HBASE-23046](https://issues.apache.org/jira/browse/HBASE-23046) | *Minor* | **Remove compatibility case from truncate command**

Remove backward compatibility from \`truncate\` and \`truncate\_preserve\` shell commands. This means that these commands from HBase Clients are not compatible with pre-0.99 HBase clusters.

---

* [HBASE-23040](https://issues.apache.org/jira/browse/HBASE-23040) | *Minor* | **region mover gives NullPointerException instead of saying a host isn't in the cluster**

Giving the region mover "unload" command a region server name that isn't recognized by the cluster now results in a "I don't know about that host" message instead of an NPE.

Set the log level to DEBUG if you'd like the region mover to log the set of region server names it got back from the cluster.

---

* [HBASE-21874](https://issues.apache.org/jira/browse/HBASE-21874) | *Major* | **Bucket cache on Persistent memory**

Added a new IOEngine type for the bucket cache: persistent memory. To use the bucket cache over pmem, set the property `hbase.bucketcache.ioengine` to the value `pmem:///path in persistent memory`.

---

* [HBASE-22760](https://issues.apache.org/jira/browse/HBASE-22760) | *Major* | **Stop/Resume Snapshot Auto-Cleanup activity with shell command**

By default, snapshot auto cleanup based on TTL is enabled for any new cluster. If snapshot cleanup needs to be stopped at any point, because of snapshot restore activity or for any other reason, it is advisable to disable it using the shell command:

hbase\> snapshot\_cleanup\_switch false

We can re-enable it using:

hbase\> snapshot\_cleanup\_switch true

We can query whether snapshot auto cleanup is enabled for the cluster using:

hbase\> snapshot\_cleanup\_enabled

---

* [HBASE-22796](https://issues.apache.org/jira/browse/HBASE-22796) | *Major* | **[HBCK2] Add fix of overlaps to fixMeta hbck Service**

Adds fix of overlaps to the fixMeta hbck service method. Uses the bulk-merge facility. Merges a max of 10 at a time.
Set hbase.master.metafixer.max.merge.count to higher if you want to do more than 10 in the one go. --- * [HBASE-21745](https://issues.apache.org/jira/browse/HBASE-21745) | *Critical* | **Make HBCK2 be able to fix issues other than region assignment** This issue adds via its subtasks: \* An 'HBCK Report' page to the Master UI added by HBASE-22527+HBASE-22709+HBASE-22723+ (since 2.1.6, 2.2.1, 2.3.0). Lists consistency or anomalies found via new hbase:meta consistency checking extensions added to CatalogJanitor (holes, overlaps, bad servers) and by a new 'HBCK chore' that runs at a lesser periodicity that will note filesystem orphans and overlaps as well as the following conditions: \*\* Master thought this region opened, but no regionserver reported it. \*\* Master thought this region opened on Server1, but regionserver reported Server2 \*\* More than one regionservers reported opened this region Both chores can be triggered from the shell to regenerate ‘new’ reports. \* Means of scheduling a ServerCrashProcedure (HBASE-21393). \* An ‘offline’ hbase:meta rebuild (HBASE-22680). \* Offline replace of hbase.version and hbase.id \* Documentation on how to use completebulkload tool to ‘adopt’ orphaned data found by new HBCK2 ‘filesystem’ check (see below) and ‘HBCK chore’ (HBASE-22859) \* A ‘holes’ and ‘overlaps’ fix that runs in the master that uses new bulk-merge facility to collapse many overlaps in the one go. \* hbase-operator-tools HBCK2 client tool got a bunch of additions: \*\* A specialized 'fix' for the case where operators ran old hbck 'offlinemeta' repair and destroyed their hbase:meta; it ties together holes in meta with orphaned data in the fs (HBASE-22567) \*\* A ‘filesystem’ command that reports on orphan data as well as bad references and hlinks with a ‘fix’ for the latter two options (based on hbck1 facility updated). 
\*\* Adds back the ‘replication’ fix facility from hbck1 (HBASE-22717) The compound result is that hbck2 is now in excess of hbck1 abilities. The provided functionality is disaggregated as per the hbck2 philosophy of providing 'plumbing' rather than 'porcelain' so there is work to do still adding fix-it playbooks, scripting across outages, and automation. --- * [HBASE-22802](https://issues.apache.org/jira/browse/HBASE-22802) | *Major* | **Avoid temp ByteBuffer allocation in FileIOEngine#read** HBASE-21879 introduces a utility class (org.apache.hadoop.hbase.io.ByteBuffAllocator) used for allocating/freeing ByteBuffers from/to NIO ByteBuffer pool, when BucketCache enabled with file or mmap engine, we will use this ByteBuffer pool to avoid temp ByteBuffer allocation a lot. --- * [HBASE-11062](https://issues.apache.org/jira/browse/HBASE-11062) | *Major* | **hbtop** Introduces hbtop that's a real-time monitoring tool for HBase like Unix's top command. See the ref guide for the details: https://hbase.apache.org/book.html#hbtop --- * [HBASE-21879](https://issues.apache.org/jira/browse/HBASE-21879) | *Major* | **Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose** Before this issue, read path was 100% offheap when block is in the BucketCache. But if a cache miss, then the RS needs to read the block via an on-heap API which causes high young-GC pressure. This issue adds reading the block via offheap even if reading the block from filesystem directly. It requires hadoop version(\>=2.9.3) but can also work with older hadoop versions (all works but we continue to read block onheap). It also requires HBASE-21946 which is not yet in place as of this writing/hbase-2.3.0. 
We have written a careful doc about the implementation, performance and practice here: https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI\_E/edit#heading=h.nch5d72p27ex --- * [HBASE-22618](https://issues.apache.org/jira/browse/HBASE-22618) | *Major* | **added the possibility to load custom cost functions** Extends `StochasticLoadBalancer` to support user-provided cost function. These are loaded in addition to the default set of cost functions. Custom function implementations must extend `StochasticLoadBalancer$CostFunction`. Enable any additional functions by placing them on the master class path and configuring `hbase.master.balancer.stochastic.additionalCostFunctions` with a comma-separated list of fully-qualified class names. --- * [HBASE-22867](https://issues.apache.org/jira/browse/HBASE-22867) | *Critical* | **The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table** Replace the ForkJoinPool in CleanerChore by ThreadPoolExecutor which can limit the spawn thread size and avoid the master GC frequently. The replacement is an internal implementation in CleanerChore, so no config key change, the upstream users can just upgrade the hbase master without any other change. --- * [HBASE-22810](https://issues.apache.org/jira/browse/HBASE-22810) | *Major* | **Initialize an separate ThreadPoolExecutor for taking/restoring snapshot** Introduced a new config key for the snapshot taking/restoring operations at master side: hbase.master.executor.snapshot.threads, its default value is 3. means we can have 3 snapshot operations running at the same time. --- * [HBASE-22863](https://issues.apache.org/jira/browse/HBASE-22863) | *Major* | **Avoid Jackson versions and dependencies with known CVEs** 1. Stopped exposing vulnerable Jackson1 dependencies so that downstreamers would not pull it in from HBase. 2. 
However, since Hadoop requires some Jackson1 dependencies, the vulnerable Jackson mapper was put at test scope in some HBase modules; hence, the HBase tarball created by hbase-assembly contains the Jackson1 mapper jar in lib. Still, downstream applications can't pull in Jackson1 from HBase.

---

* [HBASE-22841](https://issues.apache.org/jira/browse/HBASE-22841) | *Major* | **TimeRange's factory functions do not support ranges, only \`allTime\` and \`at\`**

Adds several APIs to the TimeRange class so that callers can avoid the deprecated TimeRange constructor:
\* TimeRange#from: Represents the time interval [minStamp, Long.MAX\_VALUE)
\* TimeRange#until: Represents the time interval [0, maxStamp)
\* TimeRange#between: Represents the time interval [minStamp, maxStamp)

---

* [HBASE-22833](https://issues.apache.org/jira/browse/HBASE-22833) | *Minor* | **MultiRowRangeFilter should provide a method for creating a filter which is functionally equivalent to multiple prefix filters**

Provides a public constructor in the MultiRowRangeFilter class to speed up filtering with multiple row prefixes. It expands the row prefixes into multiple rowkey ranges handled by MultiRowRangeFilter, which is more efficient.
{code}
public MultiRowRangeFilter(byte[][] rowKeyPrefixes);
{code}

---

* [HBASE-22856](https://issues.apache.org/jira/browse/HBASE-22856) | *Major* | **HBASE-Find-Flaky-Tests fails with pip error**

Update the base docker image to ubuntu 18.04 for the find flaky tests jenkins job.

---

* [HBASE-22771](https://issues.apache.org/jira/browse/HBASE-22771) | *Major* | **[HBCK2] fixMeta method and server-side support**

Adds a fixMeta method to hbck Service. Fixes holes in hbase:meta. Follow-up to fix overlaps. See HBASE-22567 also.
Follow-on is adding a client-side to hbase-operator-tools that can exploit this new addition (HBASE-22825) --- * [HBASE-22777](https://issues.apache.org/jira/browse/HBASE-22777) | *Major* | **Add a multi-region merge (for fixing overlaps, etc.)** Changes merge so you can merge more than two regions at a time. Currently only available inside HBase. HBASE-22827, a follow-on, is about exposing the facility in the Admin API (and then via the shell). --- * [HBASE-15666](https://issues.apache.org/jira/browse/HBASE-15666) | *Critical* | **shaded dependencies for hbase-testing-util** New shaded artifact for testing: hbase-shaded-testing-util. --- * [HBASE-22776](https://issues.apache.org/jira/browse/HBASE-22776) | *Major* | **Rename config names in user scan snapshot feature** After HBASE-22776, the steps to config user scan snapshot feature is as followings: 1. Check HDFS configuration 2. Add master coprocessor: hbase.coprocessor.master.classes= “org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController” 3. Enable this feature: hbase.acl.sync.to.hdfs.enable=true 4. Modify table scheme to enable this feature for a table: alter 't1', CONFIGURATION =\> {'hbase.acl.sync.to.hdfs.enable' =\> 'true'} --- * [HBASE-22539](https://issues.apache.org/jira/browse/HBASE-22539) | *Blocker* | **WAL corruption due to early DBBs re-use when Durability.ASYNC\_WAL is used** We found a critical bug which can lead to WAL corruption when Durability.ASYNC\_WAL is used. The reason is that we release a ByteBuffer before actually persist the content into WAL file. The problem maybe lead to several errors, for example, ArrayIndexOfOutBounds when replaying WAL. This is because that the ByteBuffer is reused by others. 
ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event RS\_LOG\_REPLAY
java.lang.ArrayIndexOutOfBoundsException: 18056
        at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
        at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
        at org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
        at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816)
        at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:143)
        at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:148)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:195)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:100)

It may even cause a segmentation fault and crash the JVM directly. You will see a hs\_err\_pidXXX.log file, and usually the problem is SIGSEGV. This is usually because the ByteBuffer has already been returned to the OS and used for another purpose.

The problem has been reported several times in the past, and this time Wellington Ramos Chevreuil provided the full logs and deeply analyzed them so we could find the root cause. Lijin Bin figured out that the problem may only happen when Durability.ASYNC\_WAL is used. Thanks to them.

The problem only affects the 2.x releases. All users are strongly recommended to upgrade to a release which has this fix in, especially if you use Durability.ASYNC\_WAL.

---

* [HBASE-22737](https://issues.apache.org/jira/browse/HBASE-22737) | *Major* | **Add a new admin method and shell cmd to trigger the hbck chore to run**

Add a new method runHbckChore in the Hbck interface and a new shell cmd hbck\_chore\_run to request the HBCK chore to run at master side.
--- * [HBASE-22741](https://issues.apache.org/jira/browse/HBASE-22741) | *Major* | **Show catalogjanitor consistency complaints in new 'HBCK Report' page** Adds a "CatalogJanitor hbase:meta Consistency Issues" section to the new 'HBCK Report' page added by HBASE-22709. This section is empty unless the most recent CatalogJanitor scan turned up problems. If so, will show table of issues found. --- * [HBASE-22723](https://issues.apache.org/jira/browse/HBASE-22723) | *Major* | **Have CatalogJanitor report holes and overlaps; i.e. problems it sees when doing its regular scan of hbase:meta** When CatalogJanitor runs, it now checks for holes, overlaps, empty info:regioninfo columns and bad servers. Dumps findings into log. Follow-up adds report to new 'HBCK Report' linked off the Master UI. NOTE: All features but the badserver check made it into branch-2.1 and branch-2.0 backports. --- * [HBASE-22714](https://issues.apache.org/jira/browse/HBASE-22714) | *Trivial* | **BuffferedMutatorParams opertationTimeOut() is misspelt** The misspelled BufferedMutatorParams.opertationTimeout method has been marked as deprecated, and will be removed in 4.0.0. Please use the BufferedMutatorParams.operationTimeout method instead. --- * [HBASE-22580](https://issues.apache.org/jira/browse/HBASE-22580) | *Major* | **Add a table attribute to make user scan snapshot feature configurable for table** If a table user scan snapshots of the table, please config the following table scheme attribute to make granted users' ACLs are added to hfiles: alter 't1', CONFIGURATION =\> {'hbase.user.scan.snapshot.enable' =\> 'true'} --- * [HBASE-22709](https://issues.apache.org/jira/browse/HBASE-22709) | *Major* | **Add a chore thread in master to do hbck checking and display results in 'HBCK Report' page** 1. Add a new chore thread in master to do hbck checking 2. Add a new web ui "HBCK Report" page to display checking results. This feature is enabled by default. 
The hbck chore runs every 60 minutes by default. You can set "hbase.master.hbck.checker.interval" to a value less than or equal to 0 to disable the chore.

Notice: the config "hbase.master.hbck.checker.interval" was renamed to "hbase.master.hbck.chore.interval" in HBASE-22737.

---

* [HBASE-22578](https://issues.apache.org/jira/browse/HBASE-22578) | *Major* | **HFileCleaner should not delete empty ns/table directories used for user san snapshot feature**

The HFileCleaner will clean the empty directories under archive, but if the user scan snapshot feature is enabled, the user ACLs are set on these directories. Please configure the following cleaner so that the directories with user ACLs are not cleaned:
hbase.master.hfilecleaner.plugins=org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclCleaner

---

* [HBASE-22722](https://issues.apache.org/jira/browse/HBASE-22722) | *Blocker* | **Upgrade jackson databind dependencies to 2.9.9.1**

Upgrade jackson databind dependency to 2.9.9.1 due to CVEs

https://nvd.nist.gov/vuln/detail/CVE-2019-12814

https://nvd.nist.gov/vuln/detail/CVE-2019-12384

---

* [HBASE-22527](https://issues.apache.org/jira/browse/HBASE-22527) | *Major* | **[hbck2] Add a master web ui to show the problematic regions**

Add a new master web UI to show the potentially problematic opened regions. There are three cases:
1. Master thought this region opened, but no regionserver reported it.
2. Master thought this region opened on Server1, but regionserver reported Server2
3. More than one regionserver reported this region as opened

---

* [HBASE-22648](https://issues.apache.org/jira/browse/HBASE-22648) | *Minor* | **Snapshot TTL**

Feature: Take a Snapshot With TTL for auto-cleanup

Attribute:
1. TTL
 - Specify TTL in sec while creating snapshot. e.g. snapshot 'mytable', 'snapshot1234', {TTL =\> 86400} (snapshot to be auto-cleaned after 24 hr)

Configs:
1.
Default Snapshot TTL: - FOREVER by default - User specified Default TTL(sec) with config: hbase.master.snapshot.ttl 2. If Snapshot cleanup is supposed to be stopped due to some snapshot restore activity, disable it with config: - hbase.master.cleaner.snapshot.disable: "true" With this config, HMaster needs restart just like any other hbase-site config. For more details, see the section "Take a Snapshot With TTL" in the HBase Reference Guide. --- * [HBASE-22610](https://issues.apache.org/jira/browse/HBASE-22610) | *Trivial* | **[BucketCache] Rename "hbase.offheapcache.minblocksize"** The config point "hbase.offheapcache.minblocksize" was wrong and is now deprecated. The new config point is "hbase.blockcache.minblocksize". --- * [HBASE-22690](https://issues.apache.org/jira/browse/HBASE-22690) | *Major* | **Deprecate / Remove OfflineMetaRepair in hbase-2+** OfflineMetaRepair is no longer supported in HBase-2+. Please refer to https://hbase.apache.org/book.html#HBCK2 This tool is deprecated in 2.x and will be removed in 3.0. --- * [HBASE-22673](https://issues.apache.org/jira/browse/HBASE-22673) | *Major* | **Avoid to expose protobuf stuff in Hbck interface** Mark the Hbck#scheduleServerCrashProcedure(List\ serverNames) as deprecated. Use Hbck#scheduleServerCrashProcedures(List\ serverNames) instead. --- * [HBASE-22617](https://issues.apache.org/jira/browse/HBASE-22617) | *Blocker* | **Recovered WAL directories not getting cleaned up** In HBASE-20734 we moved the recovered.edits onto the wal file system but when constructing the directory we missed the BASE\_NAMESPACE\_DIR('data'). So when using the default config, you will find that there are lots of new directories at the same level with the 'data' directory. In this issue, we add the BASE\_NAMESPACE\_DIR back, and also try our best to clean up the wrong directories. 
But we can only clean up the region level directories, so if you want a clean fs layout on HDFS you still need to manually delete the empty directories at the same level as 'data'.

The affected versions are 2.2.0, 2.1.[1-5], 1.4.[8-10], 1.3.[3-5].

---

* [HBASE-21995](https://issues.apache.org/jira/browse/HBASE-21995) | *Major* | **Add a coprocessor to set HDFS ACL for hbase granted user**

Add a coprocessor to set HDFS ACLs so that hbase granted users with READ permission have access to scan snapshots.

To use this feature, please make sure the HDFS config is set:
dfs.namenode.acls.enabled=true
fs.permissions.umask-mode=027

and set the HBase config:
hbase.coprocessor.master.classes="org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController"
hbase.user.scan.snapshot.enable=true

---

* [HBASE-22596](https://issues.apache.org/jira/browse/HBASE-22596) | *Minor* | **[Chore] Separate the execution period between CompactionChecker and PeriodicMemStoreFlusher**

hbase.regionserver.compaction.check.period is used for controlling how often the compaction checker runs. If unset, will use hbase.server.thread.wakefrequency as default value.

hbase.regionserver.flush.check.period is used for controlling how often the flush checker runs. If unset, will use hbase.server.thread.wakefrequency as default value.

---

* [HBASE-22588](https://issues.apache.org/jira/browse/HBASE-22588) | *Major* | **Upgrade jaxws-ri dependency to 2.3.2**

When run with JDK11, HBase now uses a more recent version of the jaxws reference implementation (v2.3.2).

---

* [HBASE-21536](https://issues.apache.org/jira/browse/HBASE-21536) | *Trivial* | **Fix completebulkload usage instructions**

Added completebulkload short name for BulkLoadHFilesTool to bin/hbase.

---

* [HBASE-22500](https://issues.apache.org/jira/browse/HBASE-22500) | *Blocker* | **Modify pom and jenkins jobs for hadoop versions**

Change the default hadoop-3 version to 3.1.2.
Drop the support for the releases which are affected by CVE-2018-8029, see this email https://lists.apache.org/thread.html/3d6831c3893cd27b6850aea2feff7d536888286d588e703c6ffd2e82@%3Cuser.hadoop.apache.org%3E

---

* [HBASE-22459](https://issues.apache.org/jira/browse/HBASE-22459) | *Minor* | **Expose store reader reference count**

This change exposes the aggregate count of store reader references for a given store as 'storeRefCount' in region metrics and ClusterStatus.

---

* [HBASE-22469](https://issues.apache.org/jira/browse/HBASE-22469) | *Minor* | **replace md5 checksum in saveVersion script with sha512 for hbase version information**

The HBase "source checksum" now uses SHA512 instead of MD5.

---

* [HBASE-22148](https://issues.apache.org/jira/browse/HBASE-22148) | *Blocker* | **Provide an alternative to CellUtil.setTimestamp**

The `CellUtil.setTimestamp` method changes to be an API with audience `LimitedPrivate(COPROC)` in HBase 3.0. With that designation the API should remain stable within a given minor release line, but may change between minor releases.

Previously, this method was deprecated in HBase 2.0 for removal in HBase 3.0. Deprecation messages in HBase 2.y releases have been updated to indicate the expected API audience change.

---

* [HBASE-20782](https://issues.apache.org/jira/browse/HBASE-20782) | *Minor* | **Fix duplication of TestServletFilter.access**

The access method was moved to the HttpServerFunctionalTest class as a common place.

---

* [HBASE-21991](https://issues.apache.org/jira/browse/HBASE-21991) | *Major* | **Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements**

The class LossyCounting was unintentionally marked Public but was never intended to be part of our public API. This oversight has been corrected and LossyCounting is now marked as Private and going forward may be subject to additional breaking changes or removal without notice.
If you have taken a dependency on this class we recommend cloning it locally into your project before upgrading to this release. --- * [HBASE-22226](https://issues.apache.org/jira/browse/HBASE-22226) | *Trivial* | **Incorrect level for headings in asciidoc** Warnings for level headings are corrected in the book for the HBase Incompatibilities section. --- * [HBASE-20970](https://issues.apache.org/jira/browse/HBASE-20970) | *Major* | **Update hadoop check versions for hadoop3 in hbase-personality** Add hadoop 3.0.3, 3.1.1 3.1.2 in our hadoop check jobs. --- * [HBASE-21784](https://issues.apache.org/jira/browse/HBASE-21784) | *Major* | **Dump replication queue should show list of wal files ordered chronologically** The DumpReplicationQueues tool will now list replication queues sorted in chronological order. --- * [HBASE-21048](https://issues.apache.org/jira/browse/HBASE-21048) | *Major* | **Get LogLevel is not working from console in secure environment** Support get\|set LogLevel in secure(kerberized) environment. --- * [HBASE-22384](https://issues.apache.org/jira/browse/HBASE-22384) | *Minor* | **Formatting issues in administration section of book** Fixes a formatting issue in the administration section of the book, where listing indentation were a little bit off. --- * [HBASE-22377](https://issues.apache.org/jira/browse/HBASE-22377) | *Major* | **Provide API to check the existence of a namespace which does not require ADMIN permissions** This change adds the new method listNamespaces to the Admin interface, which can be used to retrieve a list of the namespaces present in the schema as an unprivileged operation. Formerly the only available method for accomplishing this was listNamespaceDescriptors, which requires GLOBAL CREATE or ADMIN permissions. 
--- * [HBASE-22399](https://issues.apache.org/jira/browse/HBASE-22399) | *Major* | **Change default hadoop-two.version to 2.8.x and remove the 2.7.x hadoop checks** Now the default hadoop-two.version has been changed to 2.8.5, and all hadoop versions before 2.8.2(exclude) will not be supported any more. --- * [HBASE-22392](https://issues.apache.org/jira/browse/HBASE-22392) | *Trivial* | **Remove extra/useless +** Removed extra + in HRegion, HStore and LoadIncrementalHFiles for branch-2 and HRegion and HStore for branch-1. --- * [HBASE-20494](https://issues.apache.org/jira/browse/HBASE-20494) | *Major* | **Upgrade com.yammer.metrics dependency** Updated metrics core from 3.2.1 to 3.2.6. --- * [HBASE-22358](https://issues.apache.org/jira/browse/HBASE-22358) | *Minor* | **Change rubocop configuration for method length** The rubocop definition for the maximum method length was set to 75. --- * [HBASE-22379](https://issues.apache.org/jira/browse/HBASE-22379) | *Minor* | **Fix Markdown for "Voting on Release Candidates" in book** Fixes the formatting of the "Voting on Release Candidates" to actually show the quote and code formatting of the RAT check. --- * [HBASE-20851](https://issues.apache.org/jira/browse/HBASE-20851) | *Minor* | **Change rubocop config for max line length of 100** The rubocop configuration in the hbase-shell module now allows a line length with 100 characters, instead of 80 as before. For everything before 2.1.5 this change introduces rubocop itself. --- * [HBASE-22301](https://issues.apache.org/jira/browse/HBASE-22301) | *Minor* | **Consider rolling the WAL if the HDFS write pipeline is slow** This change adds new conditions for rolling the WAL for when syncs on the HDFS writer pipeline are perceived to be slow. As before the configuration parameter hbase.regionserver.wal.slowsync.ms sets the slow sync warning threshold. 
If we encounter hbase.regionserver.wal.slowsync.roll.threshold number of slow syncs (default 100) within the interval defined by hbase.regionserver.wal.slowsync.roll.interval.ms (default 1 minute), we will request a WAL roll. Or, if the time for any sync exceeds the threshold set by hbase.regionserver.wal.roll.on.sync.ms (default 10 seconds), we will request a WAL roll immediately. Operators can monitor how often these new thresholds result in a WAL roll by looking at the newly added metrics in the WAL-related metric group: \* slowSyncRollRequest - How many times a roll was requested because syncs on the write pipeline were too slow. Additionally, as a part of this change there are also new metrics for existing reasons for a WAL roll: \* errorRollRequest - How many times a roll was requested due to I/O or other errors. \* sizeRollRequest - How many times a roll was requested due to the file size roll threshold. --- * [HBASE-21883](https://issues.apache.org/jira/browse/HBASE-21883) | *Minor* | **Enhancements to Major Compaction tool** The MajorCompactorTTL tool compacts all regions in a table that have been TTLed out. This saves space on DFS and is useful for tables which are similar to time series data. It is typically scheduled to run frequently (say via cron) to clean up old data on an ongoing basis. The RSGroupMajorCompactionTTL tool is similar to MajorCompactorTTL but runs at a region server group level. If multiple tables in an rsgroup are similar to time-series data, a single command cleans them all up. As tables are added to or removed from the rsgroup, that single command continues to take care of all of them. --- * [HBASE-22054](https://issues.apache.org/jira/browse/HBASE-22054) | *Minor* | **Space Quota: Compaction is not working for super user in case of NO\_WRITES\_COMPACTIONS** This change allows the system and superusers to initiate compactions, even when a space quota violation policy disallows compactions from happening.
The original intent behind disallowing compactions was to prevent end-user compactions from creating undue I/O load, not to disallow \*any\* compaction in the system. --- * [HBASE-22083](https://issues.apache.org/jira/browse/HBASE-22083) | *Minor* | **move eclipse specific configs into a profile** Maven project integration for Eclipse has been isolated into a maven profile to ensure it is only active when in an Eclipse project. Things should continue to behave the same for Eclipse users. If something should go wrong, folks should manually activate the `eclipse-specific` profile. --- * [HBASE-22307](https://issues.apache.org/jira/browse/HBASE-22307) | *Major* | **Deprecated Preemptive Fail Fast** Deprecated the Preemptive Fail Fast related constants in HConstants. Support for this feature will be removed in 3.0.0, so using these constants will have no effect in 3.0.0+ releases. The constants will be kept until 4.0.0. Users can use 'hbase.client.perserver.requests.threshold' to control the number of concurrent requests to the same region server. Please see the release note of HBASE-16388 for more details. --- * [HBASE-22292](https://issues.apache.org/jira/browse/HBASE-22292) | *Blocker* | **PreemptiveFastFailInterceptor clean repeatedFailuresMap issue** Adds new configuration hbase.client.failure.map.cleanup.interval which defaults to ten minutes. --- * [HBASE-19222](https://issues.apache.org/jira/browse/HBASE-19222) | *Major* | **update jruby to 9.1.17.0** The default version of JRuby shipped with HBase has been updated to the JRuby 9.1.17.0 release.
For details on changes see [the release notes for JRuby 9.1.17.0](https://www.jruby.org/2018/04/23/jruby-9-1-17-0) --- * [HBASE-22279](https://issues.apache.org/jira/browse/HBASE-22279) | *Major* | **Add a getRegionLocator method in Table/AsyncTable interface** Adds the following method to the Table interface: RegionLocator getRegionLocator() throws IOException; Adds the following methods to the AsyncTable interface: AsyncTableRegionLocator getRegionLocator(); CompletableFuture\<TableDescriptor\> getDescriptor(); --- * [HBASE-15560](https://issues.apache.org/jira/browse/HBASE-15560) | *Major* | **TinyLFU-based BlockCache** LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O(n) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However, the policy does not age the frequencies and may not be resilient to various workload patterns. This change introduces a new L1 policy, TinyLfuBlockCache, which records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, through the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had near optimal hit rates. New configuration variable hfile.block.cache.policy sets the eviction policy for the L1 block cache. The default is "LRU" (LruBlockCache). Set to "TinyLFU" to use TinyLfuBlockCache instead.
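The admission idea described for TinyLfuBlockCache can be illustrated with a small sketch. This is not the HBase implementation: the class `FrequencySketch`, the function `admit`, and the parameters `width`, `depth`, and `sample_size` are hypothetical names chosen for illustration, and the hashing scheme is an assumption. It only shows the three ingredients the note names: a counting sketch, periodic aging by halving, and a frequency comparison between the new arrival and the eviction victim.

```python
import hashlib


class FrequencySketch:
    """Count-min style frequency sketch with periodic aging (illustrative only)."""

    def __init__(self, width=256, depth=4, sample_size=1024):
        self.width = width
        self.depth = depth
        # Number of increments allowed before all counters are halved.
        self.sample_size = sample_size
        self.table = [[0] * width for _ in range(depth)]
        self.increments = 0

    def _indexes(self, key):
        # Derive one column per row from a stable hash of the key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        for row in range(self.depth):
            chunk = digest[row * 4:(row + 1) * 4]
            yield row, int.from_bytes(chunk, "big") % self.width

    def increment(self, key):
        for row, col in self._indexes(key):
            self.table[row][col] += 1
        self.increments += 1
        if self.increments >= self.sample_size:
            self._age()

    def estimate(self, key):
        # Taking the minimum across rows limits overcounting from collisions.
        return min(self.table[row][col] for row, col in self._indexes(key))

    def _age(self):
        # Halve every counter so that old popularity fades over time.
        for row in self.table:
            for i in range(len(row)):
                row[i] //= 2
        self.increments = 0


def admit(sketch, candidate, victim):
    """Return True if the new arrival should replace the eviction victim."""
    return sketch.estimate(candidate) > sketch.estimate(victim)
```

Because the sketch stores only small counters rather than the entries themselves, it can remember access frequencies for many more keys than the cache actually holds, which is the "larger history" the note refers to.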
--- * [HBASE-22178](https://issues.apache.org/jira/browse/HBASE-22178) | *Major* | **Introduce a createTableAsync with TableDescriptor method in Admin** Introduced Future\<Void\> createTableAsync(TableDescriptor); --- * [HBASE-22108](https://issues.apache.org/jira/browse/HBASE-22108) | *Major* | **Avoid passing null in Admin methods** Introduced these methods: void move(byte[]); void move(byte[], ServerName); Future\<Void\> splitRegionAsync(byte[]); These methods are deprecated: void move(byte[], byte[]) --- * [HBASE-22152](https://issues.apache.org/jira/browse/HBASE-22152) | *Major* | **Create a jenkins file for yetus to processing GitHub PR** Add a new jenkins file for running the pre-commit check for GitHub PRs. --- * [HBASE-22007](https://issues.apache.org/jira/browse/HBASE-22007) | *Major* | **Add restoreSnapshot and cloneSnapshot with acl methods in AsyncAdmin** Add cloneSnapshot/restoreSnapshot with acl methods in AsyncAdmin. --- * [HBASE-22123](https://issues.apache.org/jira/browse/HBASE-22123) | *Minor* | **REST gateway reports Insufficient permissions exceptions as 404 Not Found** When permissions are insufficient, you now get: HTTP/1.1 403 Forbidden on the HTTP side, and in the message Forbidden org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'myuser',action: get, tableName:mytable, family:cf. at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.authorizeAccess(RangerAuthorizationCoprocessor.java:547) and the rest of the ADE stack --- * [HBASE-22100](https://issues.apache.org/jira/browse/HBASE-22100) | *Minor* | **False positive for error prone warnings in pre commit job** We now sort the javac WARNING/ERROR output before generating the diff in pre-commit, so we get stable output for error prone. The downside is that the output is sorted lexicographically, so the line numbers are also sorted lexicographically, which looks a bit odd to a human reader.
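The lexicographic ordering mentioned for HBASE-22100 is easy to see on a small example. The warning strings and the helper `line_number` below are hypothetical and only mimic the general shape of javac output; the point is how plain string sorting orders the embedded line numbers.

```python
import re

# Hypothetical javac-style warnings; only the line numbers matter here.
warnings = [
    "Foo.java:9: warning: [UnusedVariable]",
    "Foo.java:10: warning: [MissingOverride]",
    "Foo.java:100: warning: [DefaultCharset]",
]


def line_number(entry):
    """Extract the numeric line number from a 'File.java:N: ...' entry."""
    match = re.search(r":(\d+):", entry)
    return int(match.group(1))


# Plain string sort: stable across runs, but "100" and "10" sort before "9".
lexicographic = sorted(warnings)

# Numeric sort, shown only for comparison with what a human would expect.
numeric = sorted(warnings, key=line_number)

print([line_number(w) for w in lexicographic])
print([line_number(w) for w in numeric])
```

The string sort is still deterministic, which is all the diff step needs; the numeric variant is included only to show why the result can look out of order.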
--- * [HBASE-22057](https://issues.apache.org/jira/browse/HBASE-22057) | *Major* | **Impose upper-bound on size of ZK ops sent in a single multi()** Exposes a new configuration property "zookeeper.multi.max.size" which dictates the maximum size of deletes that HBase will make to ZooKeeper in a single RPC. This property defaults to 1MB, which should fall beneath the default ZooKeeper limit of 2MB, controlled by "jute.maxbuffer". --- * [HBASE-22052](https://issues.apache.org/jira/browse/HBASE-22052) | *Major* | **pom cleaning; filter out jersey-core in hadoop2 to match hadoop3 and remove redunant version specifications** Fixed awkward dependency issue that prevented site building. #### note specific to HBase 2.1.4 HBase 2.1.4 shipped with an early version of this fix that incorrectly altered the libraries included in our binary assembly for using Apache Hadoop 2.7 (the current build default Hadoop version for 2.1.z). For folks running out of the box against a Hadoop 2.7 cluster (or folks who skip the installation step of [replacing the bundled Hadoop libraries](http://hbase.apache.org/book.html#hadoop)) this will result in a failure at Region Server startup due to a missing class definition. 
e.g.: ``` 2019-03-27 09:02:05,779 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:644) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:628) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2701) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2683) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:171) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:356) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:362) at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:411) at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:387) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:704) at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:613) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:3029) at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:63) at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87) at 
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149) at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3047) Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 26 more ``` Workaround via any _one_ of the following: * If you are running against a Hadoop cluster that is 2.8+, ensure you replace the Hadoop libraries in the default binary assembly with those for your version. * If you are running against a Hadoop cluster that is 2.8+, build the binary assembly from the source release while specifying your Hadoop version. * If you are running against a Hadoop cluster that is a supported 2.7 release, ensure the `hadoop` executable is in the `PATH` seen at Region Server startup and that you are not using the `HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP` bypass. * For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers via the HBASE_CLASSPATH environment variable. * For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers by copying it into the directory `${HBASE_HOME}/lib/client-facing-thirdparty/`. --- * [HBASE-22065](https://issues.apache.org/jira/browse/HBASE-22065) | *Major* | **Add listTableDescriptors(List\<TableName\>) method in AsyncAdmin** Add a listTableDescriptors(List\<TableName\>) method in the AsyncAdmin interface, to align with the Admin interface.
--- * [HBASE-22063](https://issues.apache.org/jira/browse/HBASE-22063) | *Major* | **Deprecated Admin.deleteSnapshot(byte[])** Deprecated Admin.deleteSnapshot(byte[]); please use the String version instead. --- * [HBASE-22040](https://issues.apache.org/jira/browse/HBASE-22040) | *Major* | **Add mergeRegionsAsync with a List of region names method in AsyncAdmin** Add a mergeRegionsAsync(byte[][], boolean) method in the AsyncAdmin interface. Instead of using assert, we now throw IllegalArgumentException on the client side when you try to merge fewer than 2 regions. On the master side, instead of using assert, we now throw DoNotRetryIOException if you try to merge more than 2 regions, since we only support merging two regions at once for now. --- * [HBASE-22039](https://issues.apache.org/jira/browse/HBASE-22039) | *Major* | **Should add the synchronous parameter for the XXXSwitch method in AsyncAdmin** Add a drainXXX parameter to the balancerSwitch/splitSwitch/mergeSwitch methods in the AsyncAdmin interface, which has the same meaning as the synchronous parameter for these methods in the Admin interface. --- * [HBASE-22044](https://issues.apache.org/jira/browse/HBASE-22044) | *Major* | **ByteBufferUtils should not be IA.Public API** As of HBase 3.0, the ByteBufferUtils class is marked as a Private API for internal project use only. Downstream users are advised that it no longer has any compatibility promises across releases. In earlier HBase release lines the class is now marked as deprecated to call attention to this planned transition.
--- * [HBASE-21810](https://issues.apache.org/jira/browse/HBASE-21810) | *Major* | **bulkload support set hfile compression on client** Bulk load (HFileOutputFormat2) now supports configuring the compression on the client. Set the job configuration "hbase.mapreduce.hfileoutputformat.compression" to override the auto-detection of the target table's compression. --- * [HBASE-22001](https://issues.apache.org/jira/browse/HBASE-22001) | *Major* | **Polish the Admin interface** Added a cloneSnapshotAsync method with a restoreAcl parameter. Deprecated the restoreSnapshotAsync method as it just ignores the failsafe configuration. Made the snapshotAsync method return a Future\<Void\>. Deprecated the snapshot related methods which take a 'byte[]' as the snapshot name. Used default methods to reduce the code base for implementation classes. --- * [HBASE-22000](https://issues.apache.org/jira/browse/HBASE-22000) | *Major* | **Deprecated isTableAvailable with splitKeys** Deprecated AsyncTable.isTableAvailable(TableName, byte[][]). --- * [HBASE-21871](https://issues.apache.org/jira/browse/HBASE-21871) | *Major* | **Support to specify a peer table name in VerifyReplication tool** After HBASE-21871, we can specify a peer table name with --peerTableName in the VerifyReplication tool like the following: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable 5 TestTable In addition, we can compare any 2 tables in any remote clusters by specifying both peerId and --peerTableName.
For example: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable zk1,zk2,zk3:2181/hbase TestTable --- * [HBASE-15728](https://issues.apache.org/jira/browse/HBASE-15728) | *Major* | **Add remaining per-table region / store / flush / compaction related metrics** Adds the following split, flush, and compaction metrics: + // split related metrics + private MutableFastCounter splitRequest; + private MutableFastCounter splitSuccess; + private MetricHistogram splitTimeHisto; + + // flush related metrics + private MetricHistogram flushTimeHisto; + private MetricHistogram flushMemstoreSizeHisto; + private MetricHistogram flushOutputSizeHisto; + private MutableFastCounter flushedMemstoreBytes; + private MutableFastCounter flushedOutputBytes; + + // compaction related metrics + private MetricHistogram compactionTimeHisto; + private MetricHistogram compactionInputFileCountHisto; + private MetricHistogram compactionInputSizeHisto; + private MetricHistogram compactionOutputFileCountHisto; + private MetricHistogram compactionOutputSizeHisto; + private MutableFastCounter compactedInputBytes; + private MutableFastCounter compactedOutputBytes; + + private MetricHistogram majorCompactionTimeHisto; + private MetricHistogram majorCompactionInputFileCountHisto; + private MetricHistogram majorCompactionInputSizeHisto; + private MetricHistogram majorCompactionOutputFileCountHisto; + private MetricHistogram majorCompactionOutputSizeHisto; + private MutableFastCounter majorCompactedInputBytes; + private MutableFastCounter majorCompactedOutputBytes; --- * [HBASE-21481](https://issues.apache.org/jira/browse/HBASE-21481) | *Major* | **[acl] Superuser's permissions should not be granted or revoked by any non-su global admin** HBASE-21481 improves the quality of access control by strengthening the protection of superusers' privileges.
--- * [HBASE-21082](https://issues.apache.org/jira/browse/HBASE-21082) | *Critical* | **Reimplement assign/unassign related procedure metrics** We now have four types of RIT procedure metrics: assign, unassign, move, and reopen. The meaning of assign/unassign has changed, as we no longer increase the unassign metric and then the assign metric when moving a region. Also introduced two new procedure metrics, open and close, which are used to track the open/close region calls to the region server. We may send open/close multiple times to finish a RIT since we may retry multiple times. --- * [HBASE-20724](https://issues.apache.org/jira/browse/HBASE-20724) | *Critical* | **Sometimes some compacted storefiles are still opened after region failover** Problem: This is an old problem dating back to HBASE-2231. The compaction event marker was only written to the WAL. But after a flush, the WAL may be archived, which means a useful compaction event marker may be deleted too. So the compacted store files cannot be archived when the region opens and replays the WAL. Solution: After this jira, the compaction event tracker is written to the HFile. When a region opens and loads store files, it reads the compaction event tracker from the HFile and archives the compacted store files which still exist. --- * [HBASE-21820](https://issues.apache.org/jira/browse/HBASE-21820) | *Major* | **Implement CLUSTER quota scope** HBase contains two quota scopes: MACHINE and CLUSTER. Before this patch, set quota operations did not expose the scope option to the client API and used MACHINE as the default, so CLUSTER scope could not be set or used. Shell commands are as follows: set\_quota, TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec' This issue implements CLUSTER scope in a simple way: For user, namespace, and user over namespace quotas, use [ClusterLimit / RSNum] as the machine limit. For table and user over table quotas, use [ClusterLimit / TotalTableRegionNum \* MachineTableRegionNum] as the machine limit.
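The two CLUSTER-scope formulas above can be worked through with concrete numbers. The function and parameter names below are hypothetical and chosen to mirror the bracketed expressions in the note; the use of floating-point division with no rounding is an assumption, since the note does not say how HBase rounds the result.

```python
def machine_limit_for_user_or_namespace(cluster_limit, rs_num):
    """[ClusterLimit / RSNum]: split the cluster limit evenly across region servers."""
    return cluster_limit / rs_num


def machine_limit_for_table(cluster_limit, total_table_region_num,
                            machine_table_region_num):
    """[ClusterLimit / TotalTableRegionNum * MachineTableRegionNum]:
    each server gets a share proportional to the table regions it hosts."""
    return cluster_limit / total_table_region_num * machine_table_region_num


# A cluster limit of 100 req/sec on a 5-server cluster.
per_server = machine_limit_for_user_or_namespace(100, 5)

# The same limit on a table with 10 regions, 3 of which live on this server.
per_server_for_table = machine_limit_for_table(100, 10, 3)

print(per_server, per_server_for_table)
```

Under these assumptions each server would enforce 20 req/sec for the user or namespace quota, while the server hosting 3 of the table's 10 regions would enforce 30 req/sec for the table quota.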
After this patch, users can set a CLUSTER scope quota, but MACHINE is still the default if the user omits the scope. Shell commands are as follows: set\_quota, TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec' set\_quota, TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec', SCOPE =\> MACHINE set\_quota, TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec', SCOPE =\> CLUSTER --- * [HBASE-21057](https://issues.apache.org/jira/browse/HBASE-21057) | *Minor* | **upgrade to latest spotbugs** Change spotbugs version to 3.1.11. --- * [HBASE-21505](https://issues.apache.org/jira/browse/HBASE-21505) | *Major* | **Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.** This modifies "status 'replication'" output, fixing inconsistencies in the reported times and ages of last shipped edits, as well as the wrong calculation of replication lags. It also introduces additional info for each recovery queue, which was not accounted for by this command before. The new output for the "status 'replication'" command is explained in detail below: a) Source started, target stopped, no edits arrived on source yet: ... SOURCE: PeerID=1 Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0 ... b) Source started, target stopped, add edit on source: ... Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:21:00 GMT 2018, Replication Lag=2459 ... c) Source started, target stopped, edit added on source, restart source: ... SOURCE: PeerID=1 Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0 Recovered Queue: 1-hbase01.home,16020,1542784524057 No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:23:00 GMT 2018, Replication Lag=201495 ... 
d) Source started, target stopped, add edit on source, restart source, add another edit on source: ... SOURCE: PeerID=1 Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:02:28 GMT 2018, Replication Lag=6349 Recovered Queue: 1-hbase01.home,16020,1542782758742 No Ops shipped since last restart, SizeOfLogQueue=0, TimeStampOfLastArrivedInSource=Wed Nov 21 06:53:05 GMT 2018, Replication Lag=569394 ... e) Source started, target stopped, add edit on source, restart source, add another edit on source, start target: ... SOURCE: PeerID=1 Normal Queue: 1 AgeOfLastShippedOp=30000, TimeStampOfLastShippedOp=Wed Nov 21 07:07:58 GMT 2018, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:02:28 GMT 2018, Replication Lag=0 ... f) Source started, target stopped, add edit on source, restart source, restart target: ... SOURCE: PeerID=1 Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0 ... --- * [HBASE-21922](https://issues.apache.org/jira/browse/HBASE-21922) | *Major* | **BloomContext#sanityCheck may failed when use ROWPREFIX\_DELIMITED bloom filter** Removed bloom filter type ROWPREFIX\_DELIMITED. It may be added back when a better solution is found. --- * [HBASE-21783](https://issues.apache.org/jira/browse/HBASE-21783) | *Major* | **Support exceed user/table/ns throttle quota if region server has available quota** Supports enabling or disabling exceed throttle quota. Exceed throttle quota means a user can over-consume user/namespace/table quota if the region server has additional available quota because other users are not consuming it at the same time. Use the following shell commands to enable/disable exceed throttle quota: enable\_exceed\_throttle\_quota disable\_exceed\_throttle\_quota There are two limits when enabling exceed throttle quota: 1. Must set at least one read and one write region server throttle quota; 2. 
All region server throttle quotas must be in seconds time unit. Because once previous requests exceed their quota and consume region server quota, quota in other time units may take a long time to refill, which may affect later requests. --- * [HBASE-20587](https://issues.apache.org/jira/browse/HBASE-20587) | *Major* | **Replace Jackson with shaded thirdparty gson** Remove jackson dependencies from most hbase modules except hbase-rest, and use shaded gson instead. The output json will be a bit different since jackson can use getters/setters, but gson will always use the fields. --- * [HBASE-21928](https://issues.apache.org/jira/browse/HBASE-21928) | *Major* | **Deprecated HConstants.META\_QOS** Mark HConstants.META\_QOS as deprecated. It is the highest priority and is for internal use only. You should not try to set a priority greater than or equal to this value; doing so is harmless but has no effect. --- * [HBASE-17942](https://issues.apache.org/jira/browse/HBASE-17942) | *Major* | **Disable region splits and merges per table** This patch adds the ability to disable split and/or merge for a table (by default, split and merge are enabled for a table). --- * [HBASE-21636](https://issues.apache.org/jira/browse/HBASE-21636) | *Major* | **Enhance the shell scan command to support missing scanner specifications like ReadType, IsolationLevel etc.** Allows the shell to set Scan options previously not exposed. See the additions in the scan help by typing the following in the hbase shell: hbase\> help 'scan' --- * [HBASE-21201](https://issues.apache.org/jira/browse/HBASE-21201) | *Major* | **Support to run VerifyReplication MR tool without peerid** We can specify peerQuorumAddress instead of peerId in the VerifyReplication tool, so it no longer requires a peerId to be set up when using this tool.
For example: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication zk1,zk2,zk3:2181/hbase testTable --- * [HBASE-21838](https://issues.apache.org/jira/browse/HBASE-21838) | *Major* | **Create a special ReplicationEndpoint just for verifying the WAL entries are fine** Introduce a VerifyWALEntriesReplicationEndpoint which replicates nothing and only verifies that all the cells are valid. It can be used to catch bugs in WAL writing, since most of the time we do not read the WALs again after writing them unless a region server crashes. --- * [HBASE-21764](https://issues.apache.org/jira/browse/HBASE-21764) | *Major* | **Size of in-memory compaction thread pool should be configurable** Introduced a new config key in this issue: hbase.regionserver.inmemory.compaction.pool.size. The default value is 10. Configure it to set the size of the in-memory compaction thread pool. Note that all memstores in one region server share the same pool, so if you have many regions in one region server, you need to set this larger to compact faster for better read performance. --- * [HBASE-21684](https://issues.apache.org/jira/browse/HBASE-21684) | *Major* | **Throw DNRIOE when connection or rpc client is closed** Make StoppedRpcClientException extend DoNotRetryIOException. --- * [HBASE-21739](https://issues.apache.org/jira/browse/HBASE-21739) | *Major* | **Move grant/revoke from regionserver to master** To implement user permission control in Procedure V2, move the grant and revoke methods from AccessController to the master first. AccessController#grant and AccessController#revoke are marked as deprecated; please use Admin#grant and Admin#revoke instead. --- * [HBASE-21791](https://issues.apache.org/jira/browse/HBASE-21791) | *Blocker* | **Upgrade thrift dependency to 0.12.0** IMPORTANT: Due to security issues, all users who use hbase thrift should avoid using releases which do not have this fix.
The affected releases are: 2.1.x: 2.1.2 and below 2.0.x: 2.0.4 and below 1.x: 1.4.x and below If you are using one of the affected releases above, please consider upgrading to a newer release ASAP. --- * [HBASE-20894](https://issues.apache.org/jira/browse/HBASE-20894) | *Major* | **Move BucketCache from java serialization to protobuf** For users who have configured hbase.bucketcache.ioengine with either the file:, files:, or mmap: prefix, and configured it to be persistent via the hbase.bucketcache.persistent.path property, the serialization format of the bucket cache has changed between versions. The old state will not be read during startup, and there is currently no migration path. The impact is expected to be minimal, however, since the cache will rebuild over time as access patterns dictate. # HBASE 2.3.0 Release Notes These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements. --- * [HBASE-24545](https://issues.apache.org/jira/browse/HBASE-24545) | *Major* | **Add backoff to SCP check on WAL split completion** Adds backoff to the ServerCrashProcedure wait for WAL split completion when there is a large backlog of files to split. (It's possible to avoid SCP blocking while waiting on WALs to split if you use procedure-based splitting: set 'hbase.split.wal.zk.coordinated' to false to enable procedure-based WAL splitting.) --- * [HBASE-24524](https://issues.apache.org/jira/browse/HBASE-24524) | *Minor* | **SyncTable logging improvements** Note that this changes the log level for mismatching row keys: originally they were logged at INFO level, and now they are logged at DEBUG level. This is consistent with the logging of mismatching cells. Also, for missing row keys, it now logs row key values in human readable format, making it more meaningful for operators troubleshooting mismatches.
--- * [HBASE-24359](https://issues.apache.org/jira/browse/HBASE-24359) | *Major* | **Optionally ignore edits for deleted CFs for replication.** Introduce a new config hbase.replication.drop.on.deleted.columnfamily, which defaults to false. When set to true, replication will drop the edits for a column family that has been deleted from the replication source and target. --- * [HBASE-24418](https://issues.apache.org/jira/browse/HBASE-24418) | *Major* | **Consolidate Normalizer implementations** This change extends the Normalizer with a handful of new configurations. The configuration points supported are: * `hbase.normalizer.split.enabled` Whether to split a region as part of normalization. Default: `true`. * `hbase.normalizer.merge.enabled` Whether to merge a region as part of normalization. Default: `true`. * `hbase.normalizer.min.region.count` The minimum number of regions in a table to consider it for merge normalization. Default: 3. * `hbase.normalizer.merge.min_region_age.days` The minimum age for a region to be considered for a merge, in days. Default: 3. * `hbase.normalizer.merge.min_region_size.mb` The minimum size for a region to be considered for a merge, in whole MBs. Default: 1. --- * [HBASE-24309](https://issues.apache.org/jira/browse/HBASE-24309) | *Major* | **Avoid introducing log4j and slf4j-log4j dependencies for modules other than hbase-assembly** Add a hbase-logging module, and put the log4j related code in this module only so other modules do not need to depend on log4j at compile scope. See the comments of Log4jUtils and InternalLog4jUtils for more details. Add a log4j.properties to the test jar of the hbase-logging module, so other sub modules just need to depend on the test jar of the hbase-logging module at test scope to output the log to console, without placing a log4j.properties in the test resources as they all (almost) have the same content.
This test module will not be included in the assembly tarball, so it will not mess up the binary distribution. Ban direct commons-logging dependency, and ban commons-logging and log4j imports in non-test code, to avoid messing up downstream users' logging frameworks. In the hbase-logging module we do need to use log4j classes, and the trick is to use the full class name. Add jcl-over-slf4j and jul-to-slf4j dependencies: as some of our dependencies use jcl or jul as their logging framework, we should also redirect their log messages to slf4j. --- * [HBASE-21406](https://issues.apache.org/jira/browse/HBASE-21406) | *Minor* | **"status 'replication'" should not show SINK if the cluster does not act as sink** Added a new metric to differentiate sink startup time from last OP applied time. The original behaviour was to always set startup time to TimestampsOfLastAppliedOp, and always show it in the "status 'replication'" command, regardless of whether the sink ever applied any OP. This was confusing, especially for scenarios where the cluster was just acting as a source, as the output could lead to wrong interpretations about the sink not applying edits or replication being stuck. With the new metric, we now compare the two metric values, assuming that if both are the same, no OP has ever been shipped to the given sink, so the output reflects this more clearly, for example: SINK: TimeStampStarted=Thu Dec 06 23:59:47 GMT 2018, Waiting for OPs... --- * [HBASE-24132](https://issues.apache.org/jira/browse/HBASE-24132) | *Major* | **Upgrade to Apache ZooKeeper 3.5.7** HBase now ships ZooKeeper 3.5.x; previously it shipped the EOL'd 3.4.x. A 3.5.x client can talk to a 3.4.x ensemble. The ZooKeeper project has built a [FAQ](https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ) that documents known issues and work-arounds when upgrading existing deployments. --- * [HBASE-22287](https://issues.apache.org/jira/browse/HBASE-22287) | *Major* | **inifinite retries on failed server in RSProcedureDispatcher** Add backoff.
Avoid retrying every 100ms. --- * [HBASE-24425](https://issues.apache.org/jira/browse/HBASE-24425) | *Major* | **Run hbck\_chore\_run and catalogjanitor\_run on draw of 'HBCK Report' page** Runs 'catalogjanitor\_run' and 'hbck\_chore\_run' inline with the loading of the 'HBCK Report' page. Pass '?cache=true' to skip inline invocation of 'catalogjanitor\_run' and 'hbck\_chore\_run' when drawing the page. --- * [HBASE-24408](https://issues.apache.org/jira/browse/HBASE-24408) | *Blocker* | **Introduce a general 'local region' to store data on master** Introduced a general 'local region' on the master side to store the procedure data, etc. The hfiles of this region are stored on the root fs, while the wal is stored on the wal fs. This issue supersedes part of the code for HBASE-23326, as we now store the data in the 'MasterData' directory instead of 'MasterProcs'. The old hfiles will be moved to the global hfile archived directory with the suffix $-masterlocalhfile-$. The wal files will be moved to the global old wal directory with the suffix $masterlocalwal$. The TimeToLiveMasterLocalStoreHFileCleaner and TimeToLiveMasterLocalStoreWALCleaner are configured by default for cleaning the old hfiles and wal files, and the default TTLs are both 7 days. --- * [HBASE-24115](https://issues.apache.org/jira/browse/HBASE-24115) | *Major* | **Relocate test-only REST "client" from src/ to test/ and mark Private** Relocate test-only REST RemoteHTable and RemoteAdmin from src/ to test/, and mark them as InterfaceAudience.Private. --- * [HBASE-23938](https://issues.apache.org/jira/browse/HBASE-23938) | *Major* | **Replicate slow/large RPC calls to HDFS** Config key: hbase.regionserver.slowlog.systable.enabled Default value: false This config can be enabled if hbase.regionserver.slowlog.buffer.enabled is already enabled.
While hbase.regionserver.slowlog.buffer.enabled ensures that any slow/large RPC logs with complete details are written to ring buffer available at each RegionServer, hbase.regionserver.slowlog.systable.enabled would ensure that all such logs are also persisted in new system table hbase:slowlog. Operator can scan hbase:slowlog with filters to retrieve specific attribute matching records and this table would be useful to capture historical performance of slowness of RPC calls with detailed analysis. hbase:slowlog consists of single ColumnFamily info. info consists of multiple qualifiers similar to the attributes available to query as part of Admin API: get\_slowlog\_responses. One example of a row from hbase:slowlog scan result (Attached a sample screenshot in the Jira) : \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:call\_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest) \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:client\_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:method\_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION\_NAME value: "cluster\_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." 
} scan { attribute { name: "\_isolationlevel\_" value: "\\x5C000" } start\_row: "cccccccc" time\_range { from: 0 to: 9223372036854775807 } max\_versions: 1 cache\_blocks: true max\_result\_size: 2097152 caching: 2147483647 include\_stop\_row: false } number\_of\_rows: 2147483647 close\_scanner: false client\_handles\_partials: true client\_handles\_heartbeats: true track\_scan\_metrics: false \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:processing\_time, timestamp=2020-05-16T14:59:58.764Z, value=24 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:queue\_time, timestamp=2020-05-16T14:59:58.764Z, value=0 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:region\_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster\_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:response\_size, timestamp=2020-05-16T14:59:58.764Z, value=211227 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:server\_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:start\_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932 \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL \\x024\\xC1\\x06X\\x81\\xF6\\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=vjasani --- * [HBASE-24271](https://issues.apache.org/jira/browse/HBASE-24271) | *Major* | **Set values in \`conf/hbase-site.xml\` that enable running on \`LocalFileSystem\` out of the box** HBASE-24271 changes the default `conf/hbase-site.xml` such that `bin/hbase` will run directly out of the binary tarball or a compiled source tree without any configuration modifications vs. Hadoop 2.8+. This changes our long-standing history of shipping no configured values in `conf/hbase-site.xml`, so existing processes that assume this file is empty of configuration properties may require attention.
--- * [HBASE-24310](https://issues.apache.org/jira/browse/HBASE-24310) | *Major* | **Use Slf4jRequestLog for hbase-http** Use Slf4jRequestLog instead of the log4j HttpRequestLogAppender in HttpServer. The request log is disabled by default in conf/log4j.properties by the following lines: # Disable request log by default, you can enable this by changing the appender log4j.category.http.requests=INFO,NullAppender log4j.additivity.http.requests=false Change 'NullAppender' to whatever appender you want in order to enable the request log. Note that the logger name for the master status http server is 'http.requests.master', and for the region server it is 'http.requests.regionserver'. --- * [HBASE-24335](https://issues.apache.org/jira/browse/HBASE-24335) | *Major* | **Support deleteall with ts but without column in shell mode** Use an empty string to represent no column specified for deleteall in shell mode. Usage: deleteall 'test','r1','',12345 deleteall 'test', {ROWPREFIXFILTER =\> 'prefix'}, '', 12345 --- * [HBASE-24304](https://issues.apache.org/jira/browse/HBASE-24304) | *Major* | **Separate a hbase-asyncfs module** Added a new hbase-asyncfs module to hold the asynchronous dfs output stream implementation used for implementing the WAL. --- * [HBASE-22710](https://issues.apache.org/jira/browse/HBASE-22710) | *Major* | **Wrong result in one case of scan that use raw and versions and filter together** Make the logic for choosing versions more reasonable for raw scans, to avoid losing results when using a filter. --- * [HBASE-24285](https://issues.apache.org/jira/browse/HBASE-24285) | *Major* | **Move to hbase-thirdparty-3.3.0** Moved to hbase-thirdparty 3.3.0. --- * [HBASE-24252](https://issues.apache.org/jira/browse/HBASE-24252) | *Major* | **Implement proxyuser/doAs mechanism for hbase-http** This feature enables the HBase Web UIs to accept a 'proxyuser' via the HTTP request's query string.
When the parameter \`hbase.security.authentication.spnego.kerberos.proxyuser.enable\` is set to \`true\` in hbase-site.xml (default is \`false\`), the HBase UI will attempt to impersonate the user specified by the query parameter "doAs". This query parameter is checked case-insensitively. When this option is not provided, the user who executed the request is the "real" user and there is no ability to execute impersonation against the WebUI. For example, if the user "bob" with Kerberos credentials executes a request against the WebUI with this feature enabled and a query string which includes \`doAs=alice\`, the HBase UI will treat this request as executed as \`alice\`, not \`bob\`. The standard Hadoop proxyuser configuration properties to limit users who may impersonate others apply to this change (e.g. to enable \`bob\` to impersonate \`alice\`). See the Hadoop documentation for more information on how to configure these proxyuser rules. --- * [HBASE-24143](https://issues.apache.org/jira/browse/HBASE-24143) | *Major* | **[JDK11] Switch default garbage collector from CMS** `bin/hbase` will now dynamically select a Garbage Collector implementation based on the detected JVM version. JDKs 8,9,10 use `-XX:+UseConcMarkSweepGC`, while JDK11+ use `-XX:+UseG1GC`. Notice a slight compatibility change. Previously, the garbage collector choice would always be appended to a user-provided value for `HBASE_OPTS`. As of this change, this setting will only be applied when `HBASE_OPTS` is unset. That means that operators who provide a value for this variable will now need to also specify the collector. This is especially important for those on JDK8, where the vm default GC is not the recommended ConcMarkSweep. 
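The collector-selection rule described for HBASE-24143 can be sketched as a small Java method. This is only an illustration of the rule as stated in the note, not the actual `bin/hbase` shell logic; the class and method names (`GcDefaults`, `defaultGcFlag`) are hypothetical, and treating a `null` value as "`HBASE_OPTS` is unset" is an assumption of the sketch.

```java
import java.util.Optional;

/** Hypothetical illustration of the default-collector rule; not part of HBase. */
class GcDefaults {

    /**
     * Returns the GC flag that would be applied by default, or empty when the
     * operator supplied HBASE_OPTS (assumption: null means "unset"), in which
     * case the operator is responsible for naming the collector.
     */
    static Optional<String> defaultGcFlag(int jdkMajorVersion, String hbaseOpts) {
        if (hbaseOpts != null) {
            return Optional.empty();
        }
        // JDK 8, 9 and 10 use CMS; JDK 11 and later use G1.
        return Optional.of(jdkMajorVersion >= 11
            ? "-XX:+UseG1GC"
            : "-XX:+UseConcMarkSweepGC");
    }

    public static void main(String[] args) {
        System.out.println(defaultGcFlag(8, null));
        System.out.println(defaultGcFlag(11, null));
        System.out.println(defaultGcFlag(8, "-Xmx4g"));
    }
}
```

The third call models the compatibility change: an operator who sets `HBASE_OPTS` without a collector flag no longer gets one appended automatically.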
--- * [HBASE-24024](https://issues.apache.org/jira/browse/HBASE-24024) | *Major* | **Optionally reject multi() requests with very high no of rows** New Config: hbase.rpc.rows.size.threshold.reject ----------------------------------------------------------------------- Default value: false Description: If value is true, RegionServer will abort batch requests of Put/Delete with number of rows in a batch operation exceeding threshold defined by value of config: hbase.rpc.rows.warning.threshold. --- * [HBASE-24139](https://issues.apache.org/jira/browse/HBASE-24139) | *Critical* | **Balancer should avoid leaving idle region servers** StochasticLoadBalancer functional improvement: StochasticLoadBalancer would rebalance the cluster if there are any idle RegionServers in the cluster (RegionServer having no region), while other RegionServers have at least 1 region available. --- * [HBASE-24196](https://issues.apache.org/jira/browse/HBASE-24196) | *Major* | **[Shell] Add rename rsgroup command in hbase shell** user or admin can now use hbase shell \> rename\_rsgroup 'oldname', 'newname' to rename rsgroup. --- * [HBASE-24218](https://issues.apache.org/jira/browse/HBASE-24218) | *Major* | **Add hadoop 3.2.x in hadoop check** Add hadoop-3.2.0 and hadoop-3.2.1 in hadoop check and when '--quick-hadoopcheck' we will only check hadoop-3.2.1. Notice that, for aligning the personality scripts across all the active branches, we will commit the patch to all active branches, but the hadoop-3.2.x support in hadoopcheck is only applied to branch-2.2+. --- * [HBASE-23829](https://issues.apache.org/jira/browse/HBASE-23829) | *Major* | **Get \`-PrunSmallTests\` passing on JDK11** \`-PrunSmallTests\` now pass on JDK11 when using \`-Phadoop.profile=3.0\`. 
--- * [HBASE-24185](https://issues.apache.org/jira/browse/HBASE-24185) | *Major* | **Junit tests do not behave well with System.exit or Runtime.halt or JVM exits in general.** Tests that fail because a process -- RegionServer or Master -- called System.exit will now instead throw an exception. --- * [HBASE-24072](https://issues.apache.org/jira/browse/HBASE-24072) | *Major* | **Nightlies reporting OutOfMemoryError: unable to create new native thread** Hadoop hosts have had their ulimit -u raised from 10000 to 30000 (per user, by INFRA). The Docker build container has had its limit raised from 10000 to 12500. --- * [HBASE-24112](https://issues.apache.org/jira/browse/HBASE-24112) | *Major* | **[RSGroup] Support renaming rsgroup** Support RSGroup renaming in core codebase. New API Admin#renameRSGroup(String, String) is introduced in 3.0.0. --- * [HBASE-23994](https://issues.apache.org/jira/browse/HBASE-23994) | *Trivial* | **Add WebUI to Canary** The Canary tool now offers a WebUI when run in `region` mode (the default mode). It is enabled by default, and by default, it binds to `0.0.0.0:16050`. This can be overridden by setting `hbase.canary.info.bindAddress` and `hbase.canary.info.port`. To disable entirely, set the port to `-1`. --- * [HBASE-23779](https://issues.apache.org/jira/browse/HBASE-23779) | *Major* | **Up the default fork count to make builds complete faster; make count relative to CPU count** Pass --threads=2 when building on jenkins. It shortens nightly build times by about 25%. It works by running module build/test in parallel when dependencies allow. Upping the forkcount beyond the pom default of 0.25C would have us exceed our CPU budget on jenkins when two modules are running in parallel (2 modules at 0.25C each makes 0.5C, and on jenkins, hadoop nodes run two jenkins executors per host). Higher forkcounts also seem to threaten build stability. For running tests locally, to go faster, up the fork count.
$ x="0.5C" ; mvn --threads=2 -Dsurefire.firstPartForkCount=$x -Dsurefire.secondPartForkCount=$x test -PrunAllTests You could up the x from 0.5C to 1.0C but YMMV (on overcommitted hardware, tests start bombing out pretty soon after startup). You could try upping the thread count, but on occasion you are likely to overcommit hardware. --- * [HBASE-24126](https://issues.apache.org/jira/browse/HBASE-24126) | *Major* | **Up the container nproc uplimit from 10000 to 12500** Start docker with an upped ulimit for nproc by passing '--ulimit nproc=12500'. It was 10000, the default, and is now 12500. Then, set PROC\_LIMIT in hbase-personality so that when yetus runs, it uses the new 12500 value. --- * [HBASE-24150](https://issues.apache.org/jira/browse/HBASE-24150) | *Major* | **Allow module tests run in parallel** Pass -T2 to mvn. This makes it so we build two modules at a time, dependencies willing. It helps speed up build and testing, and doubles the resource usage when running modules in parallel. --- * [HBASE-24121](https://issues.apache.org/jira/browse/HBASE-24121) | *Major* | **[Authorization] ServiceAuthorizationManager isn't dynamically updatable. And it should be.** Master and RegionServer now support refreshing the authorization policy defined in hbase-policy.xml without restarting the service. To refresh the policy, execute the hbase shell command update\_config or update\_config\_all after the policy file has been updated and synced on all nodes. --- * [HBASE-24099](https://issues.apache.org/jira/browse/HBASE-24099) | *Major* | **Use a fair ReentrantReadWriteLock for the region close lock** This change modifies the default acquisition policy for the region's close lock in order to prevent observed starvation of close requests. The new boolean configuration parameter 'hbase.regionserver.fair.region.close.lock' controls the lock acquisition policy: if true, the lock is created in fair mode (default); if false, the lock is created in nonfair mode (the old default).
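The difference between the two acquisition policies in HBASE-24099 comes down to the boolean passed to the JDK's `ReentrantReadWriteLock` constructor. The sketch below is illustrative only: the `CloseLockFactory` class is hypothetical, and reading the flag from a system property stands in for HBase's configuration lookup, which is not shown in the note. Only the key name `hbase.regionserver.fair.region.close.lock` and its default of `true` come from the text above.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical helper showing fair vs. nonfair lock construction; not HBase code. */
class CloseLockFactory {
    static final String FAIR_KEY = "hbase.regionserver.fair.region.close.lock";

    /** Creates the lock in fair mode when {@code fair} is true, nonfair otherwise. */
    static ReentrantReadWriteLock create(boolean fair) {
        return new ReentrantReadWriteLock(fair);
    }

    public static void main(String[] args) {
        // Assumption: a system property stands in for the real configuration source.
        boolean fair = Boolean.parseBoolean(System.getProperty(FAIR_KEY, "true"));
        ReentrantReadWriteLock lock = create(fair);
        System.out.println("close lock fair=" + lock.isFair());
    }
}
```

In fair mode, waiting threads acquire the lock in approximately arrival order, which is what keeps a waiting writer (such as a close request) from being starved by a steady stream of readers; the trade-off is typically lower throughput than nonfair mode.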
--- * [HBASE-23153](https://issues.apache.org/jira/browse/HBASE-23153) | *Major* | **PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded** The `PrimaryRegionCountSkewCostFunction` for the `StochasticLoadBalancer` is only needed when the read replicas feature is enabled. With this change, that function now properly indicates that it is not needed when the read replica feature is off. If this improvement is not available, operators with clusters that are not using the read replica feature should manually disable it by setting `hbase.master.balancer.stochastic.primaryRegionCountCost` to `0.0` in hbase-site.xml for all HBase Masters. --- * [HBASE-24055](https://issues.apache.org/jira/browse/HBASE-24055) | *Major* | **Make AsyncFSWAL can run on EC cluster** Now AsyncFSWAL can also be used against a directory which has EC enabled. Make sure you also use the hadoop 3.x client, as the option is only available in hadoop 3.x. --- * [HBASE-24113](https://issues.apache.org/jira/browse/HBASE-24113) | *Major* | **Upgrade the maven we use from 3.5.4 to 3.6.3 in nightlies** Branches-2.3+ use maven 3.6.3 for building. Older branches still use 3.5.4. --- * [HBASE-24122](https://issues.apache.org/jira/browse/HBASE-24122) | *Major* | **Change machine ulimit-l to ulimit-a so dumps full ulimit rather than just 'max locked memory'** Our 'Build Artifacts' have a machine directory under which we emit vitals on the host the build was run on. We used to emit the result of 'ulimit -l' as a file named 'ulimit-l'. This has been hijacked to instead emit the result of running 'ulimit -a', which includes the stat from 'ulimit -l'.
--- * [HBASE-23678](https://issues.apache.org/jira/browse/HBASE-23678) | *Major* | **Literate builder API for version management in schema** ColumnFamilyDescriptor new builder API: /\*\* \* Retain all versions for a given TTL(retentionInterval), and then only a specific number \* of versions(versionAfterInterval) after that interval elapses. \* \* @param retentionInterval Retain all versions for this interval \* @param versionAfterInterval Number of versions to retain after retentionInterval \*/ public ModifyableColumnFamilyDescriptor setVersionsWithTimeToLive( final int retentionInterval, final int versionAfterInterval) --- * [HBASE-24050](https://issues.apache.org/jira/browse/HBASE-24050) | *Major* | **Deprecated PBType on all 2.x branches** org.apache.hadoop.hbase.types.PBType is marked as deprecated without any replacement. It will be moved to hbase-example module and marked as IA.Private in 3.0.0. Exposing it was a mistake, as it should not be part of our public API. Users who depend on this class should copy the code into their own code base. --- * [HBASE-8868](https://issues.apache.org/jira/browse/HBASE-8868) | *Minor* | **add metric to report client shortcircuit reads** Expose file system level read metrics for RegionServer. If the HBase RS runs on top of HDFS, calculate the aggregation of ReadStatistics of each HdfsFileInputStream. These metrics include: (1) total number of bytes read from HDFS. (2) total number of bytes read from local DataNode. (3) total number of bytes read locally through short-circuit read. (4) total number of bytes read locally through zero-copy read. Because HDFS ReadStatistics is calculated per input stream, it is not feasible to update the aggregated number in real time. Instead, the metrics are updated when an input stream is closed.
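The "update on close" aggregation described for HBASE-8868 can be sketched as follows. This is a minimal illustration under stated assumptions, not the RegionServer's implementation: `ReadMetricsAggregator`, `StreamStats`, and `onStreamClosed` are hypothetical names, and `StreamStats` merely stands in for the per-stream HDFS ReadStatistics.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical aggregator of per-stream read counters; not HBase code. */
class ReadMetricsAggregator {

    /** Stand-in for the per-input-stream statistics kept by HDFS. */
    static final class StreamStats {
        final long total, local, shortCircuit, zeroCopy;

        StreamStats(long total, long local, long shortCircuit, long zeroCopy) {
            this.total = total;
            this.local = local;
            this.shortCircuit = shortCircuit;
            this.zeroCopy = zeroCopy;
        }
    }

    private final LongAdder totalBytes = new LongAdder();
    private final LongAdder localBytes = new LongAdder();
    private final LongAdder shortCircuitBytes = new LongAdder();
    private final LongAdder zeroCopyBytes = new LongAdder();

    /** Folds a stream's final counters into the totals; called once, at close. */
    void onStreamClosed(StreamStats stats) {
        totalBytes.add(stats.total);
        localBytes.add(stats.local);
        shortCircuitBytes.add(stats.shortCircuit);
        zeroCopyBytes.add(stats.zeroCopy);
    }

    long getTotalBytes() { return totalBytes.sum(); }
    long getLocalBytes() { return localBytes.sum(); }
    long getShortCircuitBytes() { return shortCircuitBytes.sum(); }
    long getZeroCopyBytes() { return zeroCopyBytes.sum(); }
}
```

Because counters are only folded in at close, bytes read through streams that are still open are not yet reflected in the totals, which matches the note's caveat that the aggregate is not updated in real time.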
--- * [HBASE-24032](https://issues.apache.org/jira/browse/HBASE-24032) | *Major* | **[RSGroup] Assign created tables to respective rsgroup automatically instead of manual operations** Admin can determine which tables go to which rsgroup by script (setting hbase.rsgroup.table.mapping.script to a local filesystem path) on the Master side, which aims to lighten the burden of admin operations. Note that since HBase 3+, rsgroup can be specified in TableDescriptor as well; if clients specify this, the master will skip the determination from the script. Here is a simple example of a script: {code} #!/bin/bash # Input consists of two strings: the 1st is the namespace of the table, the 2nd is the table name namespace=$1 tablename=$2 if [[ $namespace == test ]]; then echo test elif [[ $tablename == \*foo\* ]]; then echo other else echo default fi {code} --- * [HBASE-23993](https://issues.apache.org/jira/browse/HBASE-23993) | *Major* | **Use loopback for zk standalone server in minizkcluster** MiniZKCluster now puts up its standalone node listening on loopback/127.0.0.1 rather than "localhost". --- * [HBASE-23986](https://issues.apache.org/jira/browse/HBASE-23986) | *Major* | **Bump hadoop-two.version to 2.10.0 on master and branch-2** Bumped hadoop-two.version to 2.10.0, which means we will drop the support for hadoop-2.8.x and hadoop-2.9.x. --- * [HBASE-23930](https://issues.apache.org/jira/browse/HBASE-23930) | *Minor* | **Shell should attempt to format \`timestamp\` attributes as ISO-8601** Changed the timestamp display to ISO-8601 when calling toString on a Cell and when outputting in the shell. Users used to see: column=table:state, timestamp=1583967620343 but now see: column=table:state, timestamp=2020-03-11T23:00:20.343Z --- * [HBASE-22827](https://issues.apache.org/jira/browse/HBASE-22827) | *Major* | **Expose multi-region merge in shell and Admin API** merge\_region shell command can now be used to merge more than 2 regions as well.
It takes a list of regions as comma separated values or as an array of regions, and not just 2 regions. The full regionnames and encoded regionnames are continued to be accepted. --- * [HBASE-23767](https://issues.apache.org/jira/browse/HBASE-23767) | *Major* | **Add JDK11 compilation and unit test support to Github precommit** Rebuild our Dockerfile with support for multiple JDK versions. Use multiple stages in the Jenkinsfile instead of yetus's multijdk because of YETUS-953. Run those multiple stages in parallel to speed up results. Note that multiple stages means multiple Yetus invocations means multiple comments on the PreCommit. This should become more obvious to users once we can make use of GitHub Checks API, HBASE-23902. --- * [HBASE-22978](https://issues.apache.org/jira/browse/HBASE-22978) | *Minor* | **Online slow response log** get\_slowlog\_responses and clear\_slowlog\_responses are used to retrieve and clear slow RPC logs from RingBuffer maintained by RegionServers. New Admin APIs: 1. List\ getSlowLogResponses(final Set\ serverNames, final SlowLogQueryFilter slowLogQueryFilter) throws IOException; 2. List\ clearSlowLogResponses(final Set\ serverNames) throws IOException; Configs: 1. hbase.regionserver.slowlog.ringbuffer.size: Default size of ringbuffer to be maintained by each RegionServer in order to store online slowlog responses. This is an in-memory ring buffer of requests that were judged to be too slow in addition to the responseTooSlow logging. The in-memory representation would be complete. For more details, please look into Doc Section: Get Slow Response Log from shell Default 256 2. hbase.regionserver.slowlog.buffer.enabled: Indicates whether RegionServers have ring buffer running for storing Online Slow logs in FIFO manner with limited entries. The size of the ring buffer is indicated by config: hbase.regionserver.slowlog.ringbuffer.size The default value is false, turn this on and get latest slowlog responses with complete data. 
Default false For more details, see the "Get Slow Response Log from shell" section of the HBase book. --- * [HBASE-23926](https://issues.apache.org/jira/browse/HBASE-23926) | *Major* | **[Flakey Tests] Down the flakies re-run ferocity; it makes for too many fails.** Down the flakey re-run fork count from 1.0C -- i.e. a fork per CPU -- to 0.25C. On a recent run, the machine had 16 cores, so 0.25C is 4 forks. We'd hardcoded the fork count at 3 previous to the changes made by the parent. --- * [HBASE-23146](https://issues.apache.org/jira/browse/HBASE-23146) | *Major* | **Support CheckAndMutate with multiple conditions** Add a checkAndMutate(row, filter) method in the AsyncTable interface and the Table interface. This method atomically checks if the row matches the specified filter. If it does, it adds the Put/Delete/RowMutations. This is a fluent style API; the code looks like this: For Table interface: {code} table.checkAndMutate(row, filter).thenPut(put); {code} For AsyncTable interface: {code} table.checkAndMutate(row, filter).thenPut(put) .thenAccept(succ -\> { if (succ) { System.out.println("Check and put succeeded"); } else { System.out.println("Check and put failed"); } }); {code} --- * [HBASE-23874](https://issues.apache.org/jira/browse/HBASE-23874) | *Minor* | **Move Jira-attached file precommit definition from script in Jenkins config to dev-support** The Jira Precommit job (https://builds.apache.org/job/PreCommit-HBASE-Build/) will now look for a file within the source tree (dev-support/jenkins\_precommit\_jira\_yetus.sh) instead of depending on a script section embedded in the job. --- * [HBASE-23865](https://issues.apache.org/jira/browse/HBASE-23865) | *Major* | **Up flakey history from 5 to 10** Changed flakey list reporting to show 10 rather than 5 items. Also changed the second and first part fork counts to be 1C rather than hardcoded 3.
--- * [HBASE-23554](https://issues.apache.org/jira/browse/HBASE-23554) | *Major* | **Encoded regionname to regionname utility** Adds shell command regioninfo: hbase(main):001:0\> regioninfo '0e6aa5c19ae2b2627649dc7708ce27d0' {ENCODED =\> 0e6aa5c19ae2b2627649dc7708ce27d0, NAME =\> 'TestTable,,1575941375972.0e6aa5c19ae2b2627649dc7708ce27d0.', STARTKEY =\> '', ENDKEY =\> '00000000000000000000299441'} Took 0.4737 seconds --- * [HBASE-23350](https://issues.apache.org/jira/browse/HBASE-23350) | *Major* | **Make compaction files cacheonWrite configurable based on threshold** This JIRA adds a new configuration - \`hbase.rs.cachecompactedblocksonwrite.threshold\`. This configuration is the maximum total size (in bytes) of the compacted files below which the configuration \`hbase.rs.cachecompactedblocksonwrite\` is honoured. If the total size of the compacted files exceeds this threshold, even when \`hbase.rs.cachecompactedblocksonwrite\` is enabled, the data blocks are not cached. Caching index and bloom blocks is not affected by this configuration (user configuration is always honoured). Default value of this configuration is Long.MAX\_VALUE. This means the data blocks will be cached whatever the total size of the compacted files. --- * [HBASE-17115](https://issues.apache.org/jira/browse/HBASE-17115) | *Major* | **HMaster/HRegion Info Server does not honour admin.acl** Implements authorization for the HBase Web UI by limiting access to certain endpoints which could be used to extract sensitive information from HBase. Access to these restricted endpoints can be limited to a group of administrators, identified either by a list of users (hbase.security.authentication.spnego.admin.users) or by a list of groups (hbase.security.authentication.spnego.admin.groups). By default, neither of these values is set, which preserves backwards compatibility (allowing all authenticated users to access all endpoints).
Further, users who have sensitive information in the HBase service configuration can set hbase.security.authentication.ui.config.protected to true which will treat the configuration endpoint as a protected, admin-only resource. By default, all authenticated users may access the configuration endpoint. --- * [HBASE-23647](https://issues.apache.org/jira/browse/HBASE-23647) | *Major* | **Make MasterRegistry the default registry impl** Enables master based registry as the default registry used by clients to fetch connection metadata. Refer to the section "Master Registry" in the client documentation for more details and advantages of this implementation over the default Zookeeper based registry. Configuration parameter that controls the registry in use: `hbase.client.registry.impl` Where to set this: HBase client configuration (hbase-site.xml) Possible values: - `org.apache.hadoop.hbase.client.ZKConnectionRegistry` (For ZK based registry implementation) - `org.apache.hadoop.hbase.client.MasterRegistry` (New, for master based registry implementation) Notes on defaults: - For v3.0.0 and later, MasterRegistry is the default registry - For all releases in 2.x line, ZK based registry is the default. This feature has been back ported to 2.3.0 and later releases. MasterRegistry can be enabled by setting the following client configuration. 
```
<property>
  <name>hbase.client.registry.impl</name>
  <value>org.apache.hadoop.hbase.client.MasterRegistry</value>
</property>
```
--- * [HBASE-23069](https://issues.apache.org/jira/browse/HBASE-23069) | *Critical* | **periodic dependency bump for Sep 2019** caffeine: 2.6.2 =\> 2.8.1 commons-codec: 1.10 =\> 1.13 commons-io: 2.5 =\> 2.6 disrupter: 3.3.6 =\> 3.4.2 httpcore: 4.4.6 =\> 4.4.13 jackson: 2.9.10 =\> 2.10.1 jackson.databind: 2.9.10.1 =\> 2.10.1 jetty: 9.3.27.v20190418 =\> 9.3.28.v20191105 protobuf.plugin: 0.5.0 =\> 0.6.1 zookeeper: 3.4.10 =\> 3.4.14 slf4j: 1.7.25 =\> 1.7.30 rat: 0.12 =\> 0.13 asciidoctor: 1.5.5 =\> 1.5.8 asciidoctor.pdf: 1.5.0-alpha.15 =\> 1.5.0-rc.2 error-prone: 2.3.3 =\> 2.3.4 --- * [HBASE-23686](https://issues.apache.org/jira/browse/HBASE-23686) | *Major* | **Revert binary incompatible change and remove reflection** - Reverts a binary incompatible change for ByteRangeUtils - Usage of reflection inside CommonFSUtils removed --- * [HBASE-23055](https://issues.apache.org/jira/browse/HBASE-23055) | *Major* | **Alter hbase:meta** Adds the ability to edit the hbase:meta table schema. For example, hbase(main):006:0\> alter 'hbase:meta', {NAME =\> 'info', DATA\_BLOCK\_ENCODING =\> 'ROW\_INDEX\_V1'} Updating all regions with the new schema... All regions updated. Done. Took 1.2138 seconds You can even add column families. However, you cannot delete any of the core hbase:meta column families such as 'info' and 'table'. --- * [HBASE-23347](https://issues.apache.org/jira/browse/HBASE-23347) | *Major* | **Pluggable RPC authentication** This change introduces an internal abstraction layer which allows for new SASL-based authentication mechanisms to be used inside HBase services. All existing SASL-based authentication mechanisms were ported to the new abstraction, making no external change in runtime semantics, client API, or RPC serialization format.
Developers familiar with extending HBase can implement authentication mechanisms beyond simple Kerberos and DelegationTokens which authenticate HBase users against some other user database. HBase service authentication (Master to/from RegionServer) continues to operate solely over Kerberos. --- * [HBASE-23156](https://issues.apache.org/jira/browse/HBASE-23156) | *Major* | **start-hbase.sh failed with ClassNotFoundException when build with hadoop3** Introduce a new hbase-assembly/src/main/assembly/hadoop-three-compat.xml for building with hadoop 3.x. --- * [HBASE-23680](https://issues.apache.org/jira/browse/HBASE-23680) | *Major* | **RegionProcedureStore missing cleaning of hfile archive** Add a new config to hbase-default.xml \<property\> \<name\>hbase.procedure.store.region.hfilecleaner.plugins\</name\> \<value\>org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner\</value\> \<description\>A comma-separated list of BaseHFileCleanerDelegate invoked by the RegionProcedureStore HFileCleaner service. These HFiles cleaners are called in order, so put the cleaner that prunes the most files in front. To implement your own BaseHFileCleanerDelegate, just put it in HBase's classpath and add the fully qualified class name here. Always add the above default hfile cleaners in the list as they will be overwritten in hbase-site.xml.\</description\> \</property\> It will share the same TTL with other HFileCleaners. You can also implement your own cleaner and change this property to enable it. --- * [HBASE-23675](https://issues.apache.org/jira/browse/HBASE-23675) | *Minor* | **Move to Apache parent POM version 22** Updated parent pom to Apache version 22. --- * [HBASE-23679](https://issues.apache.org/jira/browse/HBASE-23679) | *Critical* | **FileSystem instance leaks due to bulk loads with Kerberos enabled** This change fixes an issue with Bulk Loading on installations with Kerberos enabled and more than a single RegionServer.
When multiple RegionServers are involved in hosting a table's regions which are being bulk-loaded into, all but the RegionServer hosting the table's first Region will "leak" one DistributedFileSystem object onto the heap, never freeing that memory. Eventually, with enough bulk loads, this will create a situation for RegionServers where they have no free heap space and will either spend all time in JVM GC, lose their ZK session, or crash with an OutOfMemoryError. The only mitigation for this issue is to periodically restart RegionServers. All earlier versions of HBase 2.x are subject to this issue (2.0.x, \<=2.1.8, \<=2.2.3) --- * [HBASE-23286](https://issues.apache.org/jira/browse/HBASE-23286) | *Major* | **Improve MTTR: Split WAL to HFile** Add a new feature to improve MTTR, which has 3 steps to failover: 1. Read WAL and write HFile to region’s column family’s recovered.hfiles directory. 2. Open region. 3. Bulkload the recovered.hfiles for every column family. Compared to DLS (distributed log split), this feature will reduce region open time significantly. Set hbase.wal.split.to.hfile to true to enable this feature. --- * [HBASE-23619](https://issues.apache.org/jira/browse/HBASE-23619) | *Trivial* | **Use built-in formatting for logging in hbase-zookeeper** Changed the logging in hbase-zookeeper to use built-in formatting. --- * [HBASE-23628](https://issues.apache.org/jira/browse/HBASE-23628) | *Minor* | **Replace Apache Commons Digest Base64 with JDK8 Base64** From the PR: "Yes. The two create the same output... I just wrote a small test suite to increase my confidence on that. I generated many tens of millions of random byte patterns and compared the output of the two algorithms. They came back identical every time. "Just in case any inquiring minds would like to know, there is no longer an encoding required when generating the strings. The JDK implementation specifically specifies that strings returned are StandardCharsets.ISO\_8859\_1.
This does not change anything because UTF8 and ISO\_8859 overlap for the limited character set (64 characters) the encoding uses."

---

* [HBASE-23651](https://issues.apache.org/jira/browse/HBASE-23651) | *Major* | **Region balance throttling can be disabled**

Set hbase.balancer.max.balancing to an int value \<=0 to disable region balance throttling.

---

* [HBASE-23588](https://issues.apache.org/jira/browse/HBASE-23588) | *Major* | **Cache index blocks and bloom blocks on write if CacheCompactedBlocksOnWrite is enabled**

If cacheOnWrite is enabled during flush or compaction, index and bloom blocks (along with data blocks) are automatically cached during write.

---

* [HBASE-23369](https://issues.apache.org/jira/browse/HBASE-23369) | *Major* | **Auto-close 'unknown' Regions reported as OPEN on RegionServers**

If a RegionServer reports a Region as OPEN in disagreement with Master's status on the Region, the Master now tells the RegionServer to silently close the Region.

---

* [HBASE-23596](https://issues.apache.org/jira/browse/HBASE-23596) | *Major* | **HBCKServerCrashProcedure can double assign**

Makes it so the recently added HBCKServerCrashProcedure -- the SCP that gets invoked when an operator schedules an SCP via hbck2 scheduleRecoveries command -- now works the same as SCP EXCEPT if master knows nothing of the scheduled servername. In this latter case, HBCKSCP will do a full scan of hbase:meta looking for instances of the passed servername. If any are found, it will attempt cleanup of hbase:meta references by reassigning any found OPEN or OPENING and by closing any in CLOSING state.

Used to fix instances of what the 'HBCK Report' page shows as 'Unknown Servers'.

---

* [HBASE-23624](https://issues.apache.org/jira/browse/HBASE-23624) | *Major* | **Add a tool to dump the procedure info in HFile**

Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.HFileProcedurePrettyPrinter to run the tool.
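The charset remark in the HBASE-23628 note above can be checked with a few lines of plain JDK code. The sketch below is illustrative only: the class and method names are made up for this example, and it does not exercise the Commons Codec implementation. It encodes bytes with `java.util.Base64` and confirms that the resulting text maps to the same bytes under ISO-8859-1 and UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Illustrative sketch; class and method names are hypothetical, not HBase code.
class Base64CharsetCheck {

  // Encode raw bytes with the JDK 8+ Base64 encoder.
  public static String encode(byte[] raw) {
    return Base64.getEncoder().encodeToString(raw);
  }

  // True when the encoded text maps to identical bytes in ISO-8859-1 and UTF-8,
  // which holds because the Base64 alphabet is plain ASCII.
  public static boolean sameBytesInBothCharsets(byte[] raw) {
    String text = encode(raw);
    return Arrays.equals(
        text.getBytes(StandardCharsets.ISO_8859_1),
        text.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    byte[] raw = "hbase".getBytes(StandardCharsets.UTF_8);
    System.out.println(encode(raw));                  // aGJhc2U=
    System.out.println(sameBytesInBothCharsets(raw)); // true
  }
}
```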

---

* [HBASE-23590](https://issues.apache.org/jira/browse/HBASE-23590) | *Major* | **Update maxStoreFileRefCount to maxCompactedStoreFileRefCount**

RegionsRecoveryChore, introduced as part of HBASE-22460, tries to reopen regions based on the config hbase.regions.recovery.store.file.ref.count. Region reopen needs to take into consideration all compacted away store files that belong to the region, and not the (non-compacted) store files.

Fixed this bug as part of this Jira. Updated description for corresponding configs:

1. hbase.master.regions.recovery.check.interval : Regions Recovery Chore interval in milliseconds. This chore keeps running at this interval to find all regions with configurable max store file ref count and reopens them. Defaults to 20 mins.
2. hbase.regions.recovery.store.file.ref.count : A very large ref count on a compacted store file indicates a ref leak on that object (compacted store file). Such a file cannot be removed after it is invalidated via compaction. The only way to recover in such a scenario is to reopen the region, which can release all resources, like the refcount, leases, etc. This config represents the store files ref count threshold value considered for reopening regions. Any region with compacted store files ref count \> this value would be eligible for reopening by master. Here, we get the max refCount among all refCounts on all compacted away store files that belong to a particular region. Default value -1 indicates this feature is turned off. Only a positive integer value should be provided to enable this feature.

---

* [HBASE-23618](https://issues.apache.org/jira/browse/HBASE-23618) | *Major* | **Add a tool to dump procedure info in the WAL file**

Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.WALProcedurePrettyPrinter to run the tool.
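The eligibility rule described for hbase.regions.recovery.store.file.ref.count in HBASE-23590 above can be sketched in plain Java. This is not the RegionsRecoveryChore code: the class and method names are hypothetical, and treating every non-positive threshold as "feature off" is an assumption drawn from the note's statement that only positive values enable the feature.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; names are hypothetical and this is not the RegionsRecoveryChore code.
class ReopenEligibility {

  // threshold mirrors hbase.regions.recovery.store.file.ref.count.
  // Assumption: the default of -1, like any non-positive value, leaves the feature off.
  public static boolean shouldReopen(List<Integer> compactedFileRefCounts, int threshold) {
    if (threshold <= 0) {
      return false;
    }
    // Take the max ref count across all compacted-away store files of the region.
    int maxRefCount = 0;
    for (int refCount : compactedFileRefCounts) {
      maxRefCount = Math.max(maxRefCount, refCount);
    }
    // The region is eligible only when the max strictly exceeds the threshold.
    return maxRefCount > threshold;
  }

  public static void main(String[] args) {
    System.out.println(shouldReopen(Arrays.asList(2, 300, 5), 256)); // true
    System.out.println(shouldReopen(Arrays.asList(2, 300, 5), -1));  // false
  }
}
```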

---

* [HBASE-23617](https://issues.apache.org/jira/browse/HBASE-23617) | *Major* | **Add a stress test tool for region based procedure store**

Use ./hbase org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStorePerformanceEvaluation to run the tool.

---

* [HBASE-23326](https://issues.apache.org/jira/browse/HBASE-23326) | *Critical* | **Implement a ProcedureStore which stores procedures in a HRegion**

Use a region based procedure store to replace the old customized WAL based procedure store. The procedure data migration is done automatically during upgrading. After upgrading, the MasterProcWALs directory will be deleted and a new MasterProc directory will be created. Note that a region still writes WAL, so we still have WAL files, and they will be moved to the oldWALs directory. The file name is mostly like a normal WAL file; the only difference is that it ends with "$masterproc$".

---

* [HBASE-23320](https://issues.apache.org/jira/browse/HBASE-23320) | *Major* | **Upgrade surefire plugin to 3.0.0-M4**

Bumped surefire plugin to 3.0.0-M4.

---

* [HBASE-20461](https://issues.apache.org/jira/browse/HBASE-20461) | *Major* | **Implement fsync for AsyncFSWAL**

Now AsyncFSWAL also supports Durability.FSYNC\_WAL.

---

* [HBASE-23066](https://issues.apache.org/jira/browse/HBASE-23066) | *Minor* | **Create a config that forces to cache blocks on compaction**

The configuration 'hbase.rs.cacheblocksonwrite' was used to enable caching of blocks on write. However, blocks were purposely not cached during compaction (since that may be very aggressive), because the caching happens as and when the writer completes a block. Cloud environments tend to have larger caches; even when they enable 'hbase.rs.prefetchblocksonopen' (a non-aggressive way of caching blocks proactively on reader creation), it does not help because it takes time to cache the compacted blocks.
This feature creates a new configuration 'hbase.rs.cachecompactedblocksonwrite' which, when set to 'true', will enable caching of the blocks created by compaction. Remember that since this is aggressive caching, the user should have enough cache space; if not, it may lead to other active blocks getting evicted. From the shell, this can also be enabled per Column Family using the format below:

```
create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', CONFIGURATION => {'hbase.rs.cachecompactedblocksonwrite' => 'true'}}
```

---

* [HBASE-23239](https://issues.apache.org/jira/browse/HBASE-23239) | *Major* | **Reporting on status of backing MOB files from client-facing cells**

Users of the MOB feature can now use the `mobrefs` utility to get statistics about data in the MOB system and verify the health of backing files on HDFS.

```
HADOOP_CLASSPATH=/etc/hbase/conf:$(hbase mapredcp) yarn jar \
    /some/path/to/hbase-shaded-mapreduce.jar mobrefs mobrefs-report-output some_table foo
```

See javadocs of the class `MobRefReporter` for more details.

The reference guide has added some information about MOB internals and troubleshooting.

---

* [HBASE-23549](https://issues.apache.org/jira/browse/HBASE-23549) | *Minor* | **Document steps to disable MOB for a column family**

The reference guide now includes a walk through of disabling the MOB feature if needed while maintaining availability.

---

* [HBASE-23582](https://issues.apache.org/jira/browse/HBASE-23582) | *Minor* | **Unbalanced braces in string representation of table descriptor**

Fixed unbalanced braces in string representation within HBase shell.

---

* [HBASE-23293](https://issues.apache.org/jira/browse/HBASE-23293) | *Minor* | **[REPLICATION] make ship edits timeout configurable**

The default rpc timeout for ReplicationSourceShipper#shipEdits is 60s; when bulkload replication is enabled, a timeout exception may occur.
The timeout value can now be configured through replication.source.shipedits.timeout, and it’s adaptive.

---

* [HBASE-23312](https://issues.apache.org/jira/browse/HBASE-23312) | *Major* | **HBase Thrift SPNEGO configs (HBASE-19852) should be backwards compatible**

The newer HBase Thrift SPNEGO configs should not be required. The hbase.thrift.spnego.keytab.file and hbase.thrift.spnego.principal configs will fall back to the original hbase.thrift.keytab.file and hbase.thrift.kerberos.principal configs. The older configs will log a deprecation warning. It is preferred to use the newer SPNEGO configurations.

---

* [HBASE-22969](https://issues.apache.org/jira/browse/HBASE-22969) | *Minor* | **A new binary component comparator(BinaryComponentComparator) to perform comparison of arbitrary length and position**

With BinaryComponentComparator, applications will be able to design a diverse and powerful set of filters for rows and columns. See https://issues.apache.org/jira/browse/HBASE-22969 for an example. In general, the comparator can be used with any filter taking ByteArrayComparable. As of now, the following filters take ByteArrayComparable:

1. RowFilter
2. ValueFilter
3. QualifierFilter
4. FamilyFilter
5. ColumnValueFilter

---

* [HBASE-23234](https://issues.apache.org/jira/browse/HBASE-23234) | *Major* | **Provide .editorconfig based on checkstyle configuration**

Adds a .editorconfig file with configurations populated by IntelliJ, based on our checkstyle configuration. There are lots of IntelliJ-specific configs in here that I assume are not replicated to Eclipse or Netbeans users. Any devs using those tools should push whatever updates they see fit, but please start with the checkstyle configs as the origin of truth.
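The fallback behavior described in HBASE-23312 above (prefer the newer SPNEGO key, otherwise use the original key and warn) can be sketched as follows. This is not the HBase Thrift server code: a plain `Map` stands in for the configuration, the class and method names are hypothetical, the keytab path is invented, and the warning is written to stderr rather than through a logger.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; not the HBase Thrift server code.
// A plain Map stands in for the configuration object.
class SpnegoConfigFallback {

  // Return the value of newKey if set; otherwise fall back to oldKey and warn.
  public static String resolve(Map<String, String> conf, String newKey, String oldKey) {
    String value = conf.get(newKey);
    if (value != null) {
      return value;
    }
    value = conf.get(oldKey);
    if (value != null) {
      System.err.println("WARN: " + oldKey + " is deprecated, please use " + newKey);
    }
    return value;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Hypothetical keytab path, for illustration only.
    conf.put("hbase.thrift.keytab.file", "/etc/security/thrift.keytab");
    System.out.println(
        resolve(conf, "hbase.thrift.spnego.keytab.file", "hbase.thrift.keytab.file"));
  }
}
```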

---

* [HBASE-23322](https://issues.apache.org/jira/browse/HBASE-23322) | *Minor* | **[hbck2] Simplification on HBCKSCP scheduling**

An hbck2 scheduleRecoveries will run a subclass of ServerCrashProcedure which asks Master what Regions were on the dead Server, but it will also do a hbase:meta table scan to see if any vestiges of the old Server remain (for the case where an SCP failed mid-point leaving references in place, or where Master and hbase:meta deviated in accounting).

---

* [HBASE-23321](https://issues.apache.org/jira/browse/HBASE-23321) | *Minor* | **[hbck2] fixHoles of fixMeta doesn't update in-memory state**

If there are holes in hbase:meta, hbck2 fixMeta will now update Master in-memory state so you do not need to restart master just so you can assign the new hole-bridging regions.

---

* [HBASE-23282](https://issues.apache.org/jira/browse/HBASE-23282) | *Major* | **HBCKServerCrashProcedure for 'Unknown Servers'**

hbck2 scheduleRecoveries will now run a SCP that also looks in hbase:meta for any references to the scheduled server -- not just consult Master in-memory state -- just in case vestiges of the server are left over in hbase:meta.

---

* [HBASE-19450](https://issues.apache.org/jira/browse/HBASE-19450) | *Minor* | **Add log about average execution time for ScheduledChore**

HBase internal chores now log a moving average of how long execution of each chore takes at `INFO` level for the logger `org.apache.hadoop.hbase.ScheduledChore`.

Such messages will happen at most once per five minutes.

---

* [HBASE-23250](https://issues.apache.org/jira/browse/HBASE-23250) | *Minor* | **Log message about CleanerChore delegate initialization should be at INFO**

CleanerChore delegate initialization is now logged at INFO level instead of DEBUG.

---

* [HBASE-23243](https://issues.apache.org/jira/browse/HBASE-23243) | *Major* | **[pv2] Filter out SUCCESS procedures; on decent-sized cluster, plethora overwhelms problems**

The 'Procedures & Locks' tab in Master UI only displays problematic Procedures now (RUNNABLE, WAITING-TIMEOUT, etc.). It no longer notes procedures whose state is SUCCESS.

---

* [HBASE-23227](https://issues.apache.org/jira/browse/HBASE-23227) | *Blocker* | **Upgrade jackson-databind to 2.9.10.1 to avoid recent CVEs**

The Apache HBase REST Proxy now uses Jackson Databind version 2.9.10.1 to address the following CVEs:

- CVE-2019-16942
- CVE-2019-16943

Users of prior releases with Jackson Databind 2.9.10 are advised to either upgrade to this release or to upgrade their local Jackson Databind jar directly.

---

* [HBASE-23222](https://issues.apache.org/jira/browse/HBASE-23222) | *Critical* | **Better logging and mitigation for MOB compaction failures**

The MOB compaction process in the HBase Master now logs more about its activity.

In the event that you run into the problems described in HBASE-22075, there is a new HFileCleanerDelegate that will stop all removal of MOB hfiles from the archive area. It can be configured by adding `org.apache.hadoop.hbase.mob.ManualMobMaintHFileCleaner` to the list configured for `hbase.master.hfilecleaner.plugins`. This new cleaner delegate will cause your archive area to grow unbounded; you will have to manually prune files, which may be prohibitively complex. Consider if your use case will allow you to mitigate by disabling mob compactions instead.

Caveats:

* Be sure the list of cleaner delegates still includes the default cleaners you will likely need: ttl, snapshot, and hlink.
* Be mindful that if you enable this cleaner delegate then there will be *no* automated process for removing these mob hfiles. You should see a single region per table in `%hbase_root%/archive` that accumulates files over time. You will have to determine which of these files are safe or not to remove.
* You should list this cleaner delegate after the snapshot and hlink delegates so that you can enable sufficient logging to determine when an archived mob hfile is needed by those subsystems. When set to `TRACE` logging, the CleanerChore logger will include archive retention decision justifications.
* If your use case creates a large number of uniquely named tables, this new delegate will cause memory pressure on the master.

---

* [HBASE-15519](https://issues.apache.org/jira/browse/HBASE-15519) | *Major* | **Add per-user metrics**

Adds per-user metrics for reads/writes to each RegionServer. These metrics are exported by default. hbase.regionserver.user.metrics.enabled can be used to disable the feature if desired for any reason.

---

* [HBASE-22460](https://issues.apache.org/jira/browse/HBASE-22460) | *Minor* | **Reopen a region if store reader references may have leaked**

Leaked store files cannot be removed even after they are invalidated via compaction. A reasonable mitigation for a reader reference leak would be a fast reopen of the region on the same server.

Configs:

1. hbase.master.regions.recovery.check.interval : Regions Recovery Chore interval in milliseconds. This chore keeps running at this interval to find all regions with configurable max store file ref count and reopens them. Defaults to 20 mins.
2. hbase.regions.recovery.store.file.ref.count : This config represents the store files ref count threshold value considered for reopening regions. Any region with store files ref count \> this value would be eligible for reopening by master. Default value -1 indicates this feature is turned off. Only a positive integer value should be provided to enable this feature.

---

* [HBASE-23172](https://issues.apache.org/jira/browse/HBASE-23172) | *Minor* | **HBase Canary region success count metrics reflect column family successes, not region successes**

Added a comment to make clear that read/write success counts are tallying column family success counts, not region success counts.

Additionally, the region read and write latencies previously only stored the latencies of the last column family of the region reads/writes. This has been fixed by using a map of each region to a list of read and write latency values.

---

* [HBASE-23177](https://issues.apache.org/jira/browse/HBASE-23177) | *Major* | **If fail to open reference because FNFE, make it plain it is a Reference**

Changes the message on the FNFE exception thrown when the file a Reference points to is missing; the message now includes detail on the Reference as well as the pointed-to file, so one can connect how the FNFE relates to region open.

---

* [HBASE-20626](https://issues.apache.org/jira/browse/HBASE-20626) | *Major* | **Change the value of "Requests Per Second" on WEBUI**

Use 'totalRowActionRequestCount' to calculate QPS on web UI.

---

* [HBASE-22874](https://issues.apache.org/jira/browse/HBASE-22874) | *Critical* | **Define a public interface for Canary and move existing implementation to LimitedPrivate**

Downstream users who wish to programmatically check the health of their HBase cluster may now rely on a public interface derived from the previously private implementation of the canary cli tool. The interface is named `Canary` and can be found in the user facing javadocs.

Downstream users who previously relied on invoking the canary via the Java classname (either on the command line or programmatically) will need to change how they do so because the non-public implementation has moved.
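The latency fix described in HBASE-23172 above (keeping every column family's latency per region instead of overwriting with the last one) comes down to a map from region to a list of values. The sketch below illustrates only that data structure; the class and method names are hypothetical and it is not the Canary implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; names are hypothetical and this is not the Canary code.
class RegionLatencies {

  // One list of read latencies per region, so no column family's value is lost.
  private final Map<String, List<Long>> readLatencies = new HashMap<>();

  // Append a latency for the region rather than replacing the previous one.
  public void recordRead(String regionName, long latencyMs) {
    readLatencies.computeIfAbsent(regionName, k -> new ArrayList<>()).add(latencyMs);
  }

  // All recorded read latencies for the region, or an empty list if none.
  public List<Long> readsFor(String regionName) {
    return readLatencies.getOrDefault(regionName, Collections.emptyList());
  }

  public static void main(String[] args) {
    RegionLatencies latencies = new RegionLatencies();
    latencies.recordRead("region-a", 12L); // first column family
    latencies.recordRead("region-a", 30L); // second column family
    System.out.println(latencies.readsFor("region-a")); // [12, 30]
  }
}
```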

---

* [HBASE-23035](https://issues.apache.org/jira/browse/HBASE-23035) | *Major* | **Retain region to the last RegionServer make the failover slower**

Since 2.0.0, when a regionserver crashes and comes back online, AssignmentManager retains the region locations and tries to assign the regions to this regionserver (same host:port as the crashed one) again. In 1.x.x, the behavior is round-robin assignment of the regions that belonged to the crashed regionserver. This jira changes the "retain" assignment to round-robin assignment, which is the same as in 1.x.x. This change makes failover faster and improves availability.

---

* [HBASE-23046](https://issues.apache.org/jira/browse/HBASE-23046) | *Minor* | **Remove compatibility case from truncate command**

Remove backward compatibility from \`truncate\` and \`truncate\_preserve\` shell commands. This means that these commands from HBase Clients are not compatible with pre-0.99 HBase clusters.

---

* [HBASE-23040](https://issues.apache.org/jira/browse/HBASE-23040) | *Minor* | **region mover gives NullPointerException instead of saying a host isn't in the cluster**

Giving the region mover "unload" command a region server name that isn't recognized by the cluster now results in an "I don't know about that host" message instead of an NPE.

Set the log level to DEBUG if you'd like the region mover to log the set of region server names it got back from the cluster.

---

* [HBASE-21874](https://issues.apache.org/jira/browse/HBASE-21874) | *Major* | **Bucket cache on Persistent memory**

Added a new IOEngine type for Bucket cache, i.e. persistent memory. In order to use BC over pmem, configure the IOEngine as:

    <property>
      <name>hbase.bucketcache.ioengine</name>
      <value> pmem:///path in persistent memory </value>
    </property>

---

* [HBASE-22760](https://issues.apache.org/jira/browse/HBASE-22760) | *Major* | **Stop/Resume Snapshot Auto-Cleanup activity with shell command**

By default, snapshot auto cleanup based on TTL would be enabled for any new cluster.
At any point in time, if snapshot cleanup is supposed to be stopped due to some snapshot restore activity or any other reason, it is advisable to disable it using the shell command:

    hbase> snapshot_cleanup_switch false

We can re-enable it using:

    hbase> snapshot_cleanup_switch true

We can query whether snapshot auto cleanup is enabled for the cluster using:

    hbase> snapshot_cleanup_enabled

---

* [HBASE-22796](https://issues.apache.org/jira/browse/HBASE-22796) | *Major* | **[HBCK2] Add fix of overlaps to fixMeta hbck Service**

Adds fix of overlaps to the fixMeta hbck service method. Uses the bulk-merge facility. Merges a max of 10 at a time. Set hbase.master.metafixer.max.merge.count higher if you want to do more than 10 in one go.

---

* [HBASE-21745](https://issues.apache.org/jira/browse/HBASE-21745) | *Critical* | **Make HBCK2 be able to fix issues other than region assignment**

This issue adds via its subtasks:

* An 'HBCK Report' page to the Master UI added by HBASE-22527+HBASE-22709+HBASE-22723+ (since 2.1.6, 2.2.1, 2.3.0). Lists consistency or anomalies found via new hbase:meta consistency checking extensions added to CatalogJanitor (holes, overlaps, bad servers) and by a new 'HBCK chore' that runs at a lesser periodicity that will note filesystem orphans and overlaps as well as the following conditions:
  * Master thought this region opened, but no regionserver reported it.
  * Master thought this region opened on Server1, but regionserver reported Server2.
  * More than one regionserver reported this region as opened.

  Both chores can be triggered from the shell to regenerate ‘new’ reports.
* Means of scheduling a ServerCrashProcedure (HBASE-21393).
* An ‘offline’ hbase:meta rebuild (HBASE-22680).
* Offline replace of hbase.version and hbase.id
* Documentation on how to use completebulkload tool to ‘adopt’ orphaned data found by new HBCK2 ‘filesystem’ check (see below) and ‘HBCK chore’ (HBASE-22859)
* A ‘holes’ and ‘overlaps’ fix that runs in the master that uses new bulk-merge facility to collapse many overlaps in the one go.
* hbase-operator-tools HBCK2 client tool got a bunch of additions:
  * A specialized 'fix' for the case where operators ran old hbck 'offlinemeta' repair and destroyed their hbase:meta; it ties together holes in meta with orphaned data in the fs (HBASE-22567)
  * A ‘filesystem’ command that reports on orphan data as well as bad references and hlinks with a ‘fix’ for the latter two options (based on hbck1 facility updated).
  * Adds back the ‘replication’ fix facility from hbck1 (HBASE-22717)

The compound result is that hbck2 is now in excess of hbck1 abilities. The provided functionality is disaggregated as per the hbck2 philosophy of providing 'plumbing' rather than 'porcelain' so there is work to do still adding fix-it playbooks, scripting across outages, and automation.

---

* [HBASE-22802](https://issues.apache.org/jira/browse/HBASE-22802) | *Major* | **Avoid temp ByteBuffer allocation in FileIOEngine#read**

HBASE-21879 introduces a utility class (org.apache.hadoop.hbase.io.ByteBuffAllocator) used for allocating/freeing ByteBuffers from/to an NIO ByteBuffer pool. When BucketCache is enabled with the file or mmap engine, this ByteBuffer pool is used to avoid many temporary ByteBuffer allocations.

---

* [HBASE-11062](https://issues.apache.org/jira/browse/HBASE-11062) | *Major* | **hbtop**

Introduces hbtop, a real-time monitoring tool for HBase like Unix's top command.
See the ref guide for the details: https://hbase.apache.org/book.html#hbtop

---

* [HBASE-21879](https://issues.apache.org/jira/browse/HBASE-21879) | *Major* | **Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose**

Before this issue, we had made the read path 100% offheap when blocks hit the BucketCache 100% of the time, but on a cache miss the RS needed to read the block through the on-heap API, which would cause high young GC pressure.

With this issue, blocks are read offheap even when they are read directly from the filesystem. This has a Hadoop version requirement (\>=2.9.3) but also works with older Hadoop versions (it still works fine but will read blocks onheap).

We have written a careful doc about the implementation, performance and practice here: https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI\_E/edit#heading=h.nch5d72p27ex. Please read it for more details.

---

* [HBASE-22618](https://issues.apache.org/jira/browse/HBASE-22618) | *Major* | **added the possibility to load custom cost functions**

Extends `StochasticLoadBalancer` to support user-provided cost functions. These are loaded in addition to the default set of cost functions. Custom function implementations must extend `StochasticLoadBalancer$CostFunction`. Enable any additional functions by placing them on the master class path and configuring `hbase.master.balancer.stochastic.additionalCostFunctions` with a comma-separated list of fully-qualified class names.

---

* [HBASE-22867](https://issues.apache.org/jira/browse/HBASE-22867) | *Critical* | **The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table**

Replace the ForkJoinPool in CleanerChore with a ThreadPoolExecutor, which can limit the number of spawned threads and avoid frequent master GC.
The replacement is internal to CleanerChore, so no config key changes; upstream users can just upgrade the HBase master without any other change.

---

* [HBASE-22810](https://issues.apache.org/jira/browse/HBASE-22810) | *Major* | **Initialize an separate ThreadPoolExecutor for taking/restoring snapshot**

Introduced a new config key for the snapshot taking/restoring operations on the master side: hbase.master.executor.snapshot.threads. Its default value is 3, which means 3 snapshot operations can run at the same time.

---

* [HBASE-22863](https://issues.apache.org/jira/browse/HBASE-22863) | *Major* | **Avoid Jackson versions and dependencies with known CVEs**

1. Stopped exposing vulnerable Jackson1 dependencies so that downstreamers would not pull them in from HBase.
2. However, since Hadoop requires some Jackson1 dependencies, the vulnerable Jackson mapper is kept at test scope in some HBase modules; hence, the HBase tarball created by hbase-assembly contains the Jackson1 mapper jar in lib. Still, downstream applications can't pull in Jackson1 from HBase.

---

* [HBASE-22841](https://issues.apache.org/jira/browse/HBASE-22841) | *Major* | **TimeRange's factory functions do not support ranges, only \`allTime\` and \`at\`**

Add several APIs to the TimeRange class to avoid using the deprecated TimeRange constructor:

* TimeRange#from: Represents the time interval [minStamp, Long.MAX\_VALUE)
* TimeRange#until: Represents the time interval [0, maxStamp)
* TimeRange#between: Represents the time interval [minStamp, maxStamp)

---

* [HBASE-22833](https://issues.apache.org/jira/browse/HBASE-22833) | *Minor* | **MultiRowRangeFilter should provide a method for creating a filter which is functionally equivalent to multiple prefix filters**

Provide a public constructor in the MultiRowRangeFilter class to speed up filtering with multiple row prefixes. It expands the row prefixes into multiple rowkey ranges handled by MultiRowRangeFilter, which is more efficient.
```
public MultiRowRangeFilter(byte[][] rowKeyPrefixes);
```

---

* [HBASE-22856](https://issues.apache.org/jira/browse/HBASE-22856) | *Major* | **HBASE-Find-Flaky-Tests fails with pip error**

Update the base docker image to ubuntu 18.04 for the find flaky tests jenkins job.

---

* [HBASE-22771](https://issues.apache.org/jira/browse/HBASE-22771) | *Major* | **[HBCK2] fixMeta method and server-side support**

Adds a fixMeta method to hbck Service. Fixes holes in hbase:meta. Follow-up to fix overlaps. See HBASE-22567 also.

Follow-on is adding a client-side to hbase-operator-tools that can exploit this new addition (HBASE-22825).

---

* [HBASE-22777](https://issues.apache.org/jira/browse/HBASE-22777) | *Major* | **Add a multi-region merge (for fixing overlaps, etc.)**

Changes merge so you can merge more than two regions at a time. Currently only available inside HBase. HBASE-22827, a follow-on, is about exposing the facility in the Admin API (and then via the shell).

---

* [HBASE-15666](https://issues.apache.org/jira/browse/HBASE-15666) | *Critical* | **shaded dependencies for hbase-testing-util**

New shaded artifact for testing: hbase-shaded-testing-util.

---

* [HBASE-22776](https://issues.apache.org/jira/browse/HBASE-22776) | *Major* | **Rename config names in user scan snapshot feature**

After HBASE-22776, the steps to configure the user scan snapshot feature are as follows:

1. Check HDFS configuration
2. Add master coprocessor: hbase.coprocessor.master.classes="org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController"
3. Enable this feature: hbase.acl.sync.to.hdfs.enable=true
4. Modify the table schema to enable this feature for a table: alter 't1', CONFIGURATION =\> {'hbase.acl.sync.to.hdfs.enable' =\> 'true'}

---

* [HBASE-22539](https://issues.apache.org/jira/browse/HBASE-22539) | *Blocker* | **WAL corruption due to early DBBs re-use when Durability.ASYNC\_WAL is used**

We found a critical bug which can lead to WAL corruption when Durability.ASYNC\_WAL is used. The reason is that we release a ByteBuffer before actually persisting the content into the WAL file.

The problem may lead to several errors, for example, ArrayIndexOutOfBoundsException when replaying WAL. This is because the ByteBuffer is reused by others.

    ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
    java.lang.ArrayIndexOutOfBoundsException: 18056
        at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
        at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
        at org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
        at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816)
        at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:143)
        at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:148)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:195)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:100)

It may even cause a segmentation fault and crash the JVM directly. You will see a hs\_err\_pidXXX.log file, and usually the problem is SIGSEGV. This is usually because the ByteBuffer has already been returned to the OS and used for another purpose.

The problem has been reported several times in the past, and this time Wellington Ramos Chevreuil provided the full logs and deeply analyzed the logs so we can find the root cause.
And Lijin Bin figured out that the problem may only happen when Durability.ASYNC\_WAL is used. Thanks to them.

The problem only affects the 2.x releases. All users are strongly recommended to upgrade to a release which has this fix in, especially if you use Durability.ASYNC\_WAL.

---

* [HBASE-22737](https://issues.apache.org/jira/browse/HBASE-22737) | *Major* | **Add a new admin method and shell cmd to trigger the hbck chore to run**

Add a new method runHbckChore in the Hbck interface and a new shell cmd hbck\_chore\_run to request the HBCK chore to run on the master side.

---

* [HBASE-22741](https://issues.apache.org/jira/browse/HBASE-22741) | *Major* | **Show catalogjanitor consistency complaints in new 'HBCK Report' page**

Adds a "CatalogJanitor hbase:meta Consistency Issues" section to the new 'HBCK Report' page added by HBASE-22709. This section is empty unless the most recent CatalogJanitor scan turned up problems. If so, it will show a table of the issues found.

---

* [HBASE-22723](https://issues.apache.org/jira/browse/HBASE-22723) | *Major* | **Have CatalogJanitor report holes and overlaps; i.e. problems it sees when doing its regular scan of hbase:meta**

When CatalogJanitor runs, it now checks for holes, overlaps, empty info:regioninfo columns and bad servers. Dumps findings into log. Follow-up adds report to new 'HBCK Report' linked off the Master UI.

NOTE: All features but the badserver check made it into branch-2.1 and branch-2.0 backports.

---

* [HBASE-22714](https://issues.apache.org/jira/browse/HBASE-22714) | *Trivial* | **BuffferedMutatorParams opertationTimeOut() is misspelt**

The misspelled BufferedMutatorParams.opertationTimeout method has been marked as deprecated, and will be removed in 4.0.0. Please use the BufferedMutatorParams.operationTimeout method instead.
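To make the terms in HBASE-22723 above concrete, a "hole" is a gap between one region's end key and the next region's start key, and an "overlap" is a next region that starts before the previous one ends. The sketch below illustrates the definitions only and is not the CatalogJanitor implementation. The class name and output strings are hypothetical, keys are compared as Java strings rather than raw bytes, the input is assumed to be sorted by start key, and only adjacent regions are compared.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; names are hypothetical and this is not the CatalogJanitor code.
class MetaRangeCheck {

  // Each region is {startKey, endKey}, sorted by startKey.
  // Assumption: an empty endKey means the region runs to the end of the table.
  public static List<String> findProblems(String[][] regions) {
    List<String> problems = new ArrayList<>();
    for (int i = 1; i < regions.length; i++) {
      String prevEnd = regions[i - 1][1];
      String start = regions[i][0];
      // An open-ended previous region overlaps anything that follows it.
      int cmp = prevEnd.isEmpty() ? 1 : prevEnd.compareTo(start);
      if (cmp < 0) {
        problems.add("hole: " + prevEnd + ".." + start);
      } else if (cmp > 0) {
        problems.add("overlap: " + start + ".." + prevEnd);
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    String[][] regions = { {"", "b"}, {"c", "d"}, {"cc", ""} };
    System.out.println(findProblems(regions)); // [hole: b..c, overlap: cc..d]
  }
}
```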

---

* [HBASE-22580](https://issues.apache.org/jira/browse/HBASE-22580) | *Major* | **Add a table attribute to make user scan snapshot feature configurable for table**

If users scan snapshots of a table, configure the following table schema attribute so that granted users' ACLs are added to hfiles:

alter 't1', CONFIGURATION =\> {'hbase.user.scan.snapshot.enable' =\> 'true'}

---

* [HBASE-22709](https://issues.apache.org/jira/browse/HBASE-22709) | *Major* | **Add a chore thread in master to do hbck checking and display results in 'HBCK Report' page**

1. Add a new chore thread in master to do hbck checking
2. Add a new web ui "HBCK Report" page to display checking results.

This feature is enabled by default, and the hbck chore runs every 60 minutes by default. You can set "hbase.master.hbck.checker.interval" to a value less than or equal to 0 to disable the chore.

Notice: the config "hbase.master.hbck.checker.interval" was renamed to "hbase.master.hbck.chore.interval" in HBASE-22737.

---

* [HBASE-21773](https://issues.apache.org/jira/browse/HBASE-21773) | *Critical* | **rowcounter utility should respond to pleas for help**

This adds [-h\|-help] options to rowcounter. Passing either -h or -help will print the rowcounter guide as below:

    $hbase rowcounter -h

    usage: hbase rowcounter <tablename> [options] [<column1> <column2>...]
    Options:
        --starttime=<arg>       starting time filter to start counting rows from.
        --endtime=<arg>         end time filter limit, to only count rows up to this timestamp.
        --range=<arg>           [startKey],[endKey][;[startKey],[endKey]...]]
        --expectedCount=<arg>   expected number of rows to be count.
For performance, consider the following configuration properties: -Dhbase.client.scanner.caching=100 -Dmapreduce.map.speculative=false --- * [HBASE-22578](https://issues.apache.org/jira/browse/HBASE-22578) | *Major* | **HFileCleaner should not delete empty ns/table directories used for user san snapshot feature** The HFileCleaner cleans empty directories under archive, but when the user scan snapshot feature is enabled, the user ACLs are set on these directories. Configure the following cleaner so that directories with user ACLs are not cleaned: hbase.master.hfilecleaner.plugins=org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclCleaner --- * [HBASE-22722](https://issues.apache.org/jira/browse/HBASE-22722) | *Blocker* | **Upgrade jackson databind dependencies to 2.9.9.1** Upgrade jackson databind dependency to 2.9.9.1 due to CVEs https://nvd.nist.gov/vuln/detail/CVE-2019-12814 https://nvd.nist.gov/vuln/detail/CVE-2019-12384 --- * [HBASE-22527](https://issues.apache.org/jira/browse/HBASE-22527) | *Major* | **[hbck2] Add a master web ui to show the problematic regions** Add a new master web UI to show the potentially problematic opened regions. There are three cases: 1. Master thought this region was opened, but no regionserver reported it. 2. Master thought this region was opened on Server1, but a regionserver reported Server2. 3. More than one regionserver reported this region as opened. --- * [HBASE-22648](https://issues.apache.org/jira/browse/HBASE-22648) | *Minor* | **Snapshot TTL** Feature: Take a Snapshot With TTL for auto-cleanup Attribute: 1. TTL - Specify TTL in sec while creating snapshot. e.g. snapshot 'mytable', 'snapshot1234', {TTL =\> 86400} (snapshot to be auto-cleaned after 24 hr) Configs: 1. Default Snapshot TTL: - FOREVER by default - User specified Default TTL(sec) with config: hbase.master.snapshot.ttl 2.
If Snapshot cleanup is supposed to be stopped due to some snapshot restore activity, disable it with config: - hbase.master.cleaner.snapshot.disable: "true" With this config, HMaster needs a restart just like any other hbase-site config. For more details, see the section "Take a Snapshot With TTL" in the HBase Reference Guide. --- * [HBASE-22610](https://issues.apache.org/jira/browse/HBASE-22610) | *Trivial* | **[BucketCache] Rename "hbase.offheapcache.minblocksize"** The config point "hbase.offheapcache.minblocksize" was wrong and is now deprecated. The new config point is "hbase.blockcache.minblocksize". --- * [HBASE-22690](https://issues.apache.org/jira/browse/HBASE-22690) | *Major* | **Deprecate / Remove OfflineMetaRepair in hbase-2+** OfflineMetaRepair is no longer supported in HBase-2+. Please refer to https://hbase.apache.org/book.html#HBCK2 This tool is deprecated in 2.x and will be removed in 3.0. --- * [HBASE-22673](https://issues.apache.org/jira/browse/HBASE-22673) | *Major* | **Avoid to expose protobuf stuff in Hbck interface** Mark the Hbck#scheduleServerCrashProcedure(List\<HBaseProtos.ServerName\> serverNames) as deprecated. Use Hbck#scheduleServerCrashProcedures(List\<ServerName\> serverNames) instead. --- * [HBASE-22617](https://issues.apache.org/jira/browse/HBASE-22617) | *Blocker* | **Recovered WAL directories not getting cleaned up** In HBASE-20734 we moved the recovered.edits onto the WAL file system, but when constructing the directory we missed the BASE\_NAMESPACE\_DIR('data'). So when using the default config, you will find lots of new directories at the same level as the 'data' directory. In this issue, we add the BASE\_NAMESPACE\_DIR back, and also try our best to clean up the wrong directories. But we can only clean up the region level directories, so if you want a clean fs layout on HDFS you still need to manually delete the empty directories at the same level as 'data'. The affected versions are 2.2.0, 2.1.[1-5], 1.4.[8-10], 1.3.[3-5].
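The layout problem described in HBASE-22617 can be pictured with a small sketch. This is not HBase code: the class, the method names, and the sample namespace, table, and region values are hypothetical, and the path components below the namespace level are an assumption. It only shows how leaving out the 'data' component places region directories next to 'data' instead of under it.

```java
/** Illustrative sketch only; not the HBase implementation. */
class RecoveredEditsPathDemo {
    // Mirrors the BASE_NAMESPACE_DIR ('data') value named in the release note.
    static final String BASE_NAMESPACE_DIR = "data";

    /** Path built without the 'data' component, as in the affected versions. */
    static String misplacedRegionDir(String walRoot, String ns, String table, String region) {
        return String.join("/", walRoot, ns, table, region, "recovered.edits");
    }

    /** Path built with the 'data' component restored. */
    static String expectedRegionDir(String walRoot, String ns, String table, String region) {
        return String.join("/", walRoot, BASE_NAMESPACE_DIR, ns, table, region, "recovered.edits");
    }

    public static void main(String[] args) {
        // Hypothetical root, namespace, table, and region names.
        System.out.println(misplacedRegionDir("/hbase", "default", "t1", "region1"));
        System.out.println(expectedRegionDir("/hbase", "default", "t1", "region1"));
    }
}
```

In the first path the namespace directory sits directly under the WAL root, which is why the leftover directories appear beside 'data' and have to be removed by hand.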
--- * [HBASE-21995](https://issues.apache.org/jira/browse/HBASE-21995) | *Major* | **Add a coprocessor to set HDFS ACL for hbase granted user** Add a coprocessor that sets HDFS ACLs so that HBase users granted READ permission can scan snapshots. To use this feature, please make sure the HDFS config is set: dfs.namenode.acls.enabled=true fs.permissions.umask-mode=027 and set the HBase config: hbase.coprocessor.master.classes="org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController" hbase.user.scan.snapshot.enable=true --- * [HBASE-22596](https://issues.apache.org/jira/browse/HBASE-22596) | *Minor* | **[Chore] Separate the execution period between CompactionChecker and PeriodicMemStoreFlusher** hbase.regionserver.compaction.check.period controls how often the compaction checker runs. If unset, hbase.server.thread.wakefrequency is used as the default value. hbase.regionserver.flush.check.period controls how often the flush checker runs. If unset, hbase.server.thread.wakefrequency is used as the default value. --- * [HBASE-22588](https://issues.apache.org/jira/browse/HBASE-22588) | *Major* | **Upgrade jaxws-ri dependency to 2.3.2** When run with JDK11, HBase now uses a more recent version of the jaxws reference implementation (v2.3.2). --- * [HBASE-21536](https://issues.apache.org/jira/browse/HBASE-21536) | *Trivial* | **Fix completebulkload usage instructions** Added completebulkload short name for BulkLoadHFilesTool to bin/hbase. --- * [HBASE-22500](https://issues.apache.org/jira/browse/HBASE-22500) | *Blocker* | **Modify pom and jenkins jobs for hadoop versions** Change the default hadoop-3 version to 3.1.2.
Drop support for the releases that are affected by CVE-2018-8029; see this email https://lists.apache.org/thread.html/3d6831c3893cd27b6850aea2feff7d536888286d588e703c6ffd2e82@%3Cuser.hadoop.apache.org%3E --- * [HBASE-22459](https://issues.apache.org/jira/browse/HBASE-22459) | *Minor* | **Expose store reader reference count** This change exposes the aggregate count of store reader references for a given store as 'storeRefCount' in region metrics and ClusterStatus. --- * [HBASE-22469](https://issues.apache.org/jira/browse/HBASE-22469) | *Minor* | **replace md5 checksum in saveVersion script with sha512 for hbase version information** The HBase "source checksum" now uses SHA512 instead of MD5. --- * [HBASE-22148](https://issues.apache.org/jira/browse/HBASE-22148) | *Blocker* | **Provide an alternative to CellUtil.setTimestamp** The `CellUtil.setTimestamp` method changes to be an API with audience `LimitedPrivate(COPROC)` in HBase 3.0. With that designation the API should remain stable within a given minor release line, but may change between minor releases. Previously, this method was deprecated in HBase 2.0 for removal in HBase 3.0. Deprecation messages in HBase 2.y releases have been updated to indicate the expected API audience change. --- * [HBASE-20782](https://issues.apache.org/jira/browse/HBASE-20782) | *Minor* | **Fix duplication of TestServletFilter.access** The access method was moved to the HttpServerFunctionalTest class as a common place. --- * [HBASE-21991](https://issues.apache.org/jira/browse/HBASE-21991) | *Major* | **Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements** The class LossyCounting was unintentionally marked Public but was never intended to be part of our public API. This oversight has been corrected and LossyCounting is now marked as Private and going forward may be subject to additional breaking changes or removal without notice.
If you have taken a dependency on this class we recommend cloning it locally into your project before upgrading to this release. --- * [HBASE-22226](https://issues.apache.org/jira/browse/HBASE-22226) | *Trivial* | **Incorrect level for headings in asciidoc** Warnings for level headings are corrected in the book for the HBase Incompatibilities section. --- * [HBASE-20970](https://issues.apache.org/jira/browse/HBASE-20970) | *Major* | **Update hadoop check versions for hadoop3 in hbase-personality** Add hadoop 3.0.3, 3.1.1, and 3.1.2 to our hadoop check jobs. --- * [HBASE-21784](https://issues.apache.org/jira/browse/HBASE-21784) | *Major* | **Dump replication queue should show list of wal files ordered chronologically** The DumpReplicationQueues tool will now list replication queues sorted in chronological order. --- * [HBASE-21048](https://issues.apache.org/jira/browse/HBASE-21048) | *Major* | **Get LogLevel is not working from console in secure environment** Support get\|set LogLevel in a secure (kerberized) environment. --- * [HBASE-22384](https://issues.apache.org/jira/browse/HBASE-22384) | *Minor* | **Formatting issues in administration section of book** Fixes a formatting issue in the administration section of the book, where listing indentation was a little bit off. --- * [HBASE-22377](https://issues.apache.org/jira/browse/HBASE-22377) | *Major* | **Provide API to check the existence of a namespace which does not require ADMIN permissions** This change adds the new method listNamespaces to the Admin interface, which can be used to retrieve a list of the namespaces present in the schema as an unprivileged operation. Formerly the only available method for accomplishing this was listNamespaceDescriptors, which requires GLOBAL CREATE or ADMIN permissions.
--- * [HBASE-22399](https://issues.apache.org/jira/browse/HBASE-22399) | *Major* | **Change default hadoop-two.version to 2.8.x and remove the 2.7.x hadoop checks** The default hadoop-two.version has been changed to 2.8.5, and Hadoop versions earlier than 2.8.2 (exclusive) are no longer supported. --- * [HBASE-22392](https://issues.apache.org/jira/browse/HBASE-22392) | *Trivial* | **Remove extra/useless +** Removed extra + in HRegion, HStore and LoadIncrementalHFiles for branch-2 and HRegion and HStore for branch-1. --- * [HBASE-20494](https://issues.apache.org/jira/browse/HBASE-20494) | *Major* | **Upgrade com.yammer.metrics dependency** Updated metrics core from 3.2.1 to 3.2.6. --- * [HBASE-22358](https://issues.apache.org/jira/browse/HBASE-22358) | *Minor* | **Change rubocop configuration for method length** The rubocop definition for the maximum method length was set to 75. --- * [HBASE-22379](https://issues.apache.org/jira/browse/HBASE-22379) | *Minor* | **Fix Markdown for "Voting on Release Candidates" in book** Fixes the formatting of the "Voting on Release Candidates" section to actually show the quote and code formatting of the RAT check. --- * [HBASE-20851](https://issues.apache.org/jira/browse/HBASE-20851) | *Minor* | **Change rubocop config for max line length of 100** The rubocop configuration in the hbase-shell module now allows a line length of 100 characters, instead of 80 as before. For everything before 2.1.5 this change introduces rubocop itself. --- * [HBASE-22301](https://issues.apache.org/jira/browse/HBASE-22301) | *Minor* | **Consider rolling the WAL if the HDFS write pipeline is slow** This change adds new conditions for rolling the WAL when syncs on the HDFS writer pipeline are perceived to be slow. As before, the configuration parameter hbase.regionserver.wal.slowsync.ms sets the slow sync warning threshold.
If we encounter hbase.regionserver.wal.slowsync.roll.threshold number of slow syncs (default 100) within the interval defined by hbase.regionserver.wal.slowsync.roll.interval.ms (default 1 minute), we will request a WAL roll. Or, if the time for any sync exceeds the threshold set by hbase.regionserver.wal.roll.on.sync.ms (default 10 seconds), we will request a WAL roll immediately. Operators can monitor how often these new thresholds result in a WAL roll by looking at the metrics newly added to the WAL related metric group: \* slowSyncRollRequest - How many times a roll was requested due to sync too slow on the write pipeline. Additionally, as a part of this change there are also additional metrics for existing reasons for a WAL roll: \* errorRollRequest - How many times a roll was requested due to I/O or other errors. \* sizeRollRequest - How many times a roll was requested due to file size roll threshold. --- * [HBASE-21883](https://issues.apache.org/jira/browse/HBASE-21883) | *Minor* | **Enhancements to Major Compaction tool** The MajorCompactorTTL tool compacts all regions in a table that have been TTLed out. This saves space on DFS and is useful for tables which are similar to time series data. This is typically scheduled to run frequently (say via cron) to clean up old data on an ongoing basis. The RSGroupMajorCompactionTTL tool is similar to MajorCompactorTTL but runs at a region server group level. If multiple tables in an rsgroup are similar to time-series data, then it runs a single command to clean them up. As more tables are added/removed from rsgroup, it's easy to have a single command to take care of all of them. --- * [HBASE-22054](https://issues.apache.org/jira/browse/HBASE-22054) | *Minor* | **Space Quota: Compaction is not working for super user in case of NO\_WRITES\_COMPACTIONS** This change allows the system and superusers to initiate compactions, even when a space quota violation policy disallows compactions from happening.
The original intent behind disallowing compactions was to prevent end-user compactions from creating undue I/O load, not to disallow \*any\* compaction in the system. --- * [HBASE-22083](https://issues.apache.org/jira/browse/HBASE-22083) | *Minor* | **move eclipse specific configs into a profile** Maven project integration for Eclipse has been isolated into a maven profile to ensure it is only active when in an Eclipse project. Things should continue to behave the same for Eclipse users. If something should go wrong folks should manually activate the `eclipse-specific` profile. --- * [HBASE-22307](https://issues.apache.org/jira/browse/HBASE-22307) | *Major* | **Deprecated Preemptive Fail Fast** Deprecated the Preemptive Fail Fast related constants in HConstants. Support for this feature will be removed in 3.0.0, so using these constants will have no effect in 3.0.0+ releases. The constants will be kept until 4.0.0. Users can use 'hbase.client.perserver.requests.threshold' to control the number of concurrent requests to the same region server. Please see the release note of HBASE-16388 for more details. --- * [HBASE-22292](https://issues.apache.org/jira/browse/HBASE-22292) | *Blocker* | **PreemptiveFastFailInterceptor clean repeatedFailuresMap issue** Adds a new configuration, hbase.client.failure.map.cleanup.interval, which defaults to ten minutes. --- * [HBASE-19222](https://issues.apache.org/jira/browse/HBASE-19222) | *Major* | **update jruby to 9.1.17.0** The default version of JRuby shipped with HBase has been updated to the JRuby 9.1.17.0 release.
For details on changes see [the release notes for JRuby 9.1.17.0](https://www.jruby.org/2018/04/23/jruby-9-1-17-0) --- * [HBASE-22279](https://issues.apache.org/jira/browse/HBASE-22279) | *Major* | **Add a getRegionLocator method in Table/AsyncTable interface** Add the method below to the Table interface: RegionLocator getRegionLocator() throws IOException; Add the methods below to the AsyncTable interface: AsyncTableRegionLocator getRegionLocator(); CompletableFuture\<TableDescriptor\> getDescriptor(); --- * [HBASE-15560](https://issues.apache.org/jira/browse/HBASE-15560) | *Major* | **TinyLFU-based BlockCache** LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O(n) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the thread instead of the caller. However the policy does not age the frequencies and may not be resilient to various workload patterns. This change introduces a new L1 policy, TinyLfuBlockCache, which records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, through the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had near optimal hit rates. New configuration variable hfile.block.cache.policy sets the eviction policy for the L1 block cache. The default is "LRU" (LruBlockCache). Set to "TinyLFU" to use TinyLfuBlockCache instead.
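The frequency-based admission idea in the TinyLFU note above can be sketched in a few lines. This is a minimal illustration, not the TinyLfuBlockCache implementation: the class `FrequencySketchDemo`, its hash mixing, the depth of 4 rows, and the counter cap of 15 are all assumptions chosen for readability. It shows a counting sketch, aging by halving, and the comparison between a new arrival and the eviction victim.

```java
/** Illustrative count-min style sketch with periodic halving; names are hypothetical. */
class FrequencySketchDemo {
    private static final int DEPTH = 4;       // number of hash rows (assumed)
    private static final int MAX_COUNT = 15;  // counters saturate, like 4-bit counters
    private final int[][] table;
    private final int width;
    private final int sampleSize;             // recorded accesses between aging passes
    private int increments;

    FrequencySketchDemo(int width, int sampleSize) {
        this.width = width;
        this.sampleSize = sampleSize;
        this.table = new int[DEPTH][width];
    }

    /** Derives a per-row slot from the key's hash code. */
    private int index(Object key, int row) {
        int h = key.hashCode() * (31 * row + 17) + row;
        h ^= (h >>> 16);
        return Math.floorMod(h, width);
    }

    /** Records one access and ages the sketch once enough accesses were seen. */
    void recordAccess(Object key) {
        for (int row = 0; row < DEPTH; row++) {
            int i = index(key, row);
            if (table[row][i] < MAX_COUNT) {
                table[row][i]++;
            }
        }
        if (++increments >= sampleSize) {
            age();
        }
    }

    /** Estimated frequency: the minimum counter across all rows. */
    int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < DEPTH; row++) {
            min = Math.min(min, table[row][index(key, row)]);
        }
        return min;
    }

    /** Halves every counter so that old popularity fades over time. */
    private void age() {
        for (int[] row : table) {
            for (int i = 0; i < row.length; i++) {
                row[i] >>>= 1;
            }
        }
        increments /= 2;
    }

    /** Admission check: keep the candidate only if it is more frequent than the victim. */
    boolean admit(Object candidate, Object victim) {
        return estimate(candidate) > estimate(victim);
    }

    public static void main(String[] args) {
        FrequencySketchDemo sketch = new FrequencySketchDemo(64, 1000);
        for (int i = 0; i < 5; i++) {
            sketch.recordAccess("hot-block");
        }
        sketch.recordAccess("cold-block");
        System.out.println("admit hot over cold: " + sketch.admit("hot-block", "cold-block"));
    }
}
```

Because only small counters are stored rather than the entries themselves, the sketch can remember access history for far more keys than the cache holds, which is the property the note refers to.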
--- * [HBASE-22178](https://issues.apache.org/jira/browse/HBASE-22178) | *Major* | **Introduce a createTableAsync with TableDescriptor method in Admin** Introduced Future\<Void\> createTableAsync(TableDescriptor); --- * [HBASE-22108](https://issues.apache.org/jira/browse/HBASE-22108) | *Major* | **Avoid passing null in Admin methods** Introduced these methods: void move(byte[]); void move(byte[], ServerName); Future\<Void\> splitRegionAsync(byte[]); These methods are deprecated: void move(byte[], byte[]) --- * [HBASE-22152](https://issues.apache.org/jira/browse/HBASE-22152) | *Major* | **Create a jenkins file for yetus to processing GitHub PR** Add a new jenkins file for running the pre commit check for GitHub PRs. --- * [HBASE-22007](https://issues.apache.org/jira/browse/HBASE-22007) | *Major* | **Add restoreSnapshot and cloneSnapshot with acl methods in AsyncAdmin** Add cloneSnapshot/restoreSnapshot with acl methods in AsyncAdmin. --- * [HBASE-22123](https://issues.apache.org/jira/browse/HBASE-22123) | *Minor* | **REST gateway reports Insufficient permissions exceptions as 404 Not Found** When permissions are insufficient, you now get: HTTP/1.1 403 Forbidden on the HTTP side, and in the message Forbidden org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'myuser',action: get, tableName:mytable, family:cf. at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.authorizeAccess(RangerAuthorizationCoprocessor.java:547) and the rest of the ADE stack --- * [HBASE-22100](https://issues.apache.org/jira/browse/HBASE-22100) | *Minor* | **False positive for error prone warnings in pre commit job** Now we will sort the javac WARNING/ERROR before generating the diff in pre-commit so we can get a stable output for error prone. The downside is that we just sort the output lexicographically, so the line numbers will also be sorted lexicographically, which looks a bit strange to a human reader.
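The ordering quirk mentioned in HBASE-22100 is easy to reproduce. The sketch below is not the pre-commit script; the class name and the warning strings are made up for illustration. It sorts a few javac-style lines as plain strings, which gives a stable order for diffing but does not follow numeric line order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: plain string sorting of javac-style warning lines. */
class LexicographicSortDemo {
    /** Returns a lexicographically sorted copy, leaving the input untouched. */
    static List<String> sortWarnings(List<String> warnings) {
        List<String> sorted = new ArrayList<>(warnings);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical warning lines; only the line numbers differ.
        List<String> warnings = Arrays.asList(
            "Foo.java:9: warning: a",
            "Foo.java:10: warning: b",
            "Foo.java:100: warning: c");
        for (String line : sortWarnings(warnings)) {
            System.out.println(line);
        }
    }
}
```

The same input always yields the same order, so two runs can be diffed reliably, but line 100 ends up before line 10 and line 9 comes last because the comparison is character by character.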
--- * [HBASE-22057](https://issues.apache.org/jira/browse/HBASE-22057) | *Major* | **Impose upper-bound on size of ZK ops sent in a single multi()** Exposes a new configuration property "zookeeper.multi.max.size" which dictates the maximum size of deletes that HBase will make to ZooKeeper in a single RPC. This property defaults to 1MB, which should fall beneath the default ZooKeeper limit of 2MB, controlled by "jute.maxbuffer". --- * [HBASE-22052](https://issues.apache.org/jira/browse/HBASE-22052) | *Major* | **pom cleaning; filter out jersey-core in hadoop2 to match hadoop3 and remove redunant version specifications** Fixed awkward dependency issue that prevented site building. #### note specific to HBase 2.1.4 HBase 2.1.4 shipped with an early version of this fix that incorrectly altered the libraries included in our binary assembly for using Apache Hadoop 2.7 (the current build default Hadoop version for 2.1.z). For folks running out of the box against a Hadoop 2.7 cluster (or folks who skip the installation step of [replacing the bundled Hadoop libraries](http://hbase.apache.org/book.html#hadoop)) this will result in a failure at Region Server startup due to a missing class definition. 
e.g.: ``` 2019-03-27 09:02:05,779 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:644) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:628) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2701) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2683) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:171) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:356) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:362) at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:411) at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:387) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:704) at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:613) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:3029) at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:63) at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87) at
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149) at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3047) Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 26 more ``` Workaround via any _one_ of the following: * If you are running against a Hadoop cluster that is 2.8+, ensure you replace the Hadoop libraries in the default binary assembly with those for your version. * If you are running against a Hadoop cluster that is 2.8+, build the binary assembly from the source release while specifying your Hadoop version. * If you are running against a Hadoop cluster that is a supported 2.7 release, ensure the `hadoop` executable is in the `PATH` seen at Region Server startup and that you are not using the `HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP` bypass. * For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers via the HBASE_CLASSPATH environment variable. * For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers by copying it into the directory `${HBASE_HOME}/lib/client-facing-thirdparty/`. --- * [HBASE-22065](https://issues.apache.org/jira/browse/HBASE-22065) | *Major* | **Add listTableDescriptors(List\<TableName\>) method in AsyncAdmin** Add a listTableDescriptors(List\<TableName\>) method in the AsyncAdmin interface, to align with the Admin interface.
--- * [HBASE-22063](https://issues.apache.org/jira/browse/HBASE-22063) | *Major* | **Deprecated Admin.deleteSnapshot(byte[])** Deprecate Admin.deleteSnapshot(byte[]); please use the String version instead. --- * [HBASE-22040](https://issues.apache.org/jira/browse/HBASE-22040) | *Major* | **Add mergeRegionsAsync with a List of region names method in AsyncAdmin** Add a mergeRegionsAsync(byte[][], boolean) method in the AsyncAdmin interface. Instead of using assert, the client side now throws IllegalArgumentException when you try to merge fewer than two regions. Likewise, instead of using assert, the master side now throws DoNotRetryIOException if you try to merge more than two regions, since we only support merging two regions at once for now. --- * [HBASE-22039](https://issues.apache.org/jira/browse/HBASE-22039) | *Major* | **Should add the synchronous parameter for the XXXSwitch method in AsyncAdmin** Add a drainXXX parameter to the balancerSwitch/splitSwitch/mergeSwitch methods in the AsyncAdmin interface, which has the same meaning as the synchronous parameter for these methods in the Admin interface. --- * [HBASE-22044](https://issues.apache.org/jira/browse/HBASE-22044) | *Major* | **ByteBufferUtils should not be IA.Public API** As of HBase 3.0, the ByteBufferUtils class is now marked as a Private API for internal project use only. Downstream users are advised that it no longer has any compatibility promises across releases. In earlier HBase release lines the class is now marked as deprecated to call attention to this planned transition.
--- * [HBASE-21810](https://issues.apache.org/jira/browse/HBASE-21810) | *Major* | **bulkload support set hfile compression on client** Bulkload (HFileOutputFormat2) now supports configuring the compression on the client. You can set the job configuration "hbase.mapreduce.hfileoutputformat.compression" to override the auto-detection of the target table's compression. --- * [HBASE-22001](https://issues.apache.org/jira/browse/HBASE-22001) | *Major* | **Polish the Admin interface** Add a cloneSnapshotAsync method with a restoreAcl parameter. Deprecated the restoreSnapshotAsync method as it just ignores the failsafe configuration. Make the snapshotAsync method return a Future\<Void\>. Deprecated the snapshot related methods which take a 'byte[]' as the snapshot name. Use default methods to reduce the code base for implementation classes. --- * [HBASE-22000](https://issues.apache.org/jira/browse/HBASE-22000) | *Major* | **Deprecated isTableAvailable with splitKeys** Deprecated AsyncTable.isTableAvailable(TableName, byte[][]). --- * [HBASE-21871](https://issues.apache.org/jira/browse/HBASE-21871) | *Major* | **Support to specify a peer table name in VerifyReplication tool** After HBASE-21871, we can specify a peer table name with --peerTableName in the VerifyReplication tool like the following: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable 5 TestTable In addition, we can compare any 2 tables in any remote clusters by specifying both peerId and --peerTableName.
For example: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable zk1,zk2,zk3:2181/hbase TestTable --- * [HBASE-15728](https://issues.apache.org/jira/browse/HBASE-15728) | *Major* | **Add remaining per-table region / store / flush / compaction related metrics** Adds the flush, split, and compaction metrics below: + // split related metrics + private MutableFastCounter splitRequest; + private MutableFastCounter splitSuccess; + private MetricHistogram splitTimeHisto; + + // flush related metrics + private MetricHistogram flushTimeHisto; + private MetricHistogram flushMemstoreSizeHisto; + private MetricHistogram flushOutputSizeHisto; + private MutableFastCounter flushedMemstoreBytes; + private MutableFastCounter flushedOutputBytes; + + // compaction related metrics + private MetricHistogram compactionTimeHisto; + private MetricHistogram compactionInputFileCountHisto; + private MetricHistogram compactionInputSizeHisto; + private MetricHistogram compactionOutputFileCountHisto; + private MetricHistogram compactionOutputSizeHisto; + private MutableFastCounter compactedInputBytes; + private MutableFastCounter compactedOutputBytes; + + private MetricHistogram majorCompactionTimeHisto; + private MetricHistogram majorCompactionInputFileCountHisto; + private MetricHistogram majorCompactionInputSizeHisto; + private MetricHistogram majorCompactionOutputFileCountHisto; + private MetricHistogram majorCompactionOutputSizeHisto; + private MutableFastCounter majorCompactedInputBytes; + private MutableFastCounter majorCompactedOutputBytes; --- * [HBASE-21481](https://issues.apache.org/jira/browse/HBASE-21481) | *Major* | **[acl] Superuser's permissions should not be granted or revoked by any non-su global admin** HBASE-21481 improves the quality of access control by strengthening the protection of super users' privileges.
--- * [HBASE-21082](https://issues.apache.org/jira/browse/HBASE-21082) | *Critical* | **Reimplement assign/unassign related procedure metrics** Now we have four types of RIT procedure metrics: assign, unassign, move, and reopen. The meaning of assign/unassign has changed, as we no longer increase the unassign metric and then the assign metric when moving a region. Also introduced two new procedure metrics, open and close, which are used to track the open/close region calls to the region server. We may send open/close multiple times to finish a RIT since we may retry multiple times. --- * [HBASE-20724](https://issues.apache.org/jira/browse/HBASE-20724) | *Critical* | **Sometimes some compacted storefiles are still opened after region failover** Problem: This is an old problem dating back to HBASE-2231. The compaction event marker was only written to the WAL. But after a flush, the WAL may be archived, which means a useful compaction event marker may be deleted, too. So the compacted store files cannot be archived when the region opens and replays the WAL. Solution: After this jira, the compaction event tracker is written to the HFile. When a region opens and loads its store files, it reads the compaction event tracker from the HFile and archives the compacted store files which still exist. --- * [HBASE-21820](https://issues.apache.org/jira/browse/HBASE-21820) | *Major* | **Implement CLUSTER quota scope** HBase contains two quota scopes: MACHINE and CLUSTER. Before this patch, set quota operations did not expose the scope option to the client API and used MACHINE as the default, so CLUSTER scope could not be set or used. Shell commands are as follows: set\_quota TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec' This issue implements CLUSTER scope in a simple way: For user, namespace, and user over namespace quotas, use [ClusterLimit / RSNum] as the machine limit. For table and user over table quotas, use [ClusterLimit / TotalTableRegionNum \* MachineTableRegionNum] as the machine limit.
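The two formulas above translate directly into arithmetic. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, and it uses floating-point division, whereas the real quota code may round or represent limits differently.

```java
/** Illustrative arithmetic for deriving a per-machine limit from a CLUSTER scope quota. */
class ClusterQuotaDemo {
    /** User, namespace, and user-over-namespace quotas: ClusterLimit / RSNum. */
    static double machineLimit(double clusterLimit, int regionServerCount) {
        if (regionServerCount <= 0) {
            throw new IllegalArgumentException("regionServerCount must be positive");
        }
        return clusterLimit / regionServerCount;
    }

    /** Table and user-over-table quotas: ClusterLimit / TotalTableRegionNum * MachineTableRegionNum. */
    static double machineTableLimit(double clusterLimit, int totalTableRegions,
                                    int tableRegionsOnThisServer) {
        if (totalTableRegions <= 0) {
            throw new IllegalArgumentException("totalTableRegions must be positive");
        }
        return clusterLimit / totalTableRegions * tableRegionsOnThisServer;
    }

    public static void main(String[] args) {
        // A 100 req/sec cluster limit spread over 4 region servers.
        System.out.println(machineLimit(100, 4));
        // A 100 req/sec table limit; this server hosts 3 of the table's 10 regions.
        System.out.println(machineTableLimit(100, 10, 3));
    }
}
```

The table variant weights each server by the share of the table's regions it hosts, so a server holding more regions of the table receives a proportionally larger slice of the cluster limit.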
After this patch, users can set a CLUSTER scope quota, but MACHINE is still the default if the scope is omitted. Shell commands are as follows: set\_quota TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec' set\_quota TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec', SCOPE =\> MACHINE set\_quota TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec', SCOPE =\> CLUSTER --- * [HBASE-21057](https://issues.apache.org/jira/browse/HBASE-21057) | *Minor* | **upgrade to latest spotbugs** Change spotbugs version to 3.1.11. --- * [HBASE-21505](https://issues.apache.org/jira/browse/HBASE-21505) | *Major* | **Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.** This modifies the "status 'replication'" output, fixing inconsistencies in the reported times and ages of last shipped edits, as well as the wrong calculation of replication lags. It also introduces additional info for each recovery queue, which was not accounted for by this command before. The new output for the "status 'replication'" command is explained in detail below: a) Source started, target stopped, no edits arrived on source yet: ... SOURCE: PeerID=1 Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0 ... b) Source started, target stopped, add edit on source: ... Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:21:00 GMT 2018, Replication Lag=2459 ... c) Source started, target stopped, edit added on source, restart source: ... SOURCE: PeerID=1 Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0 Recovered Queue: 1-hbase01.home,16020,1542784524057 No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:23:00 GMT 2018, Replication Lag=201495 ...
d) Source started, target stopped, add edit on source, restart source, add another edit on source: ... SOURCE: PeerID=1 Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:02:28 GMT 2018, Replication Lag=6349 Recovered Queue: 1-hbase01.home,16020,1542782758742 No Ops shipped since last restart, SizeOfLogQueue=0, TimeStampOfLastArrivedInSource=Wed Nov 21 06:53:05 GMT 2018, Replication Lag=569394 ... e) Source started, target stopped, add edit on source, restart source, add another edit on source, start target: ... SOURCE: PeerID=1 Normal Queue: 1 AgeOfLastShippedOp=30000, TimeStampOfLastShippedOp=Wed Nov 21 07:07:58 GMT 2018, SizeOfLogQueue=1, TimeStampOfLastArrivedInSource=Wed Nov 21 07:02:28 GMT 2018, Replication Lag=0 ... f) Source started, target stopped, add edit on source, restart source, restart target: ... SOURCE: PeerID=1 Normal Queue: 1 No Ops shipped since last restart, SizeOfLogQueue=1, No edits for this source since it started, Replication Lag=0 ... --- * [HBASE-21922](https://issues.apache.org/jira/browse/HBASE-21922) | *Major* | **BloomContext#sanityCheck may failed when use ROWPREFIX\_DELIMITED bloom filter** Remove bloom filter type ROWPREFIX\_DELIMITED. May add it back when find a better solution. --- * [HBASE-21783](https://issues.apache.org/jira/browse/HBASE-21783) | *Major* | **Support exceed user/table/ns throttle quota if region server has available quota** Support enable or disable exceed throttle quota. Exceed throttle quota means, user can over consume user/namespace/table quota if region server has additional available quota because other users don't consume at the same time. Use the following shell commands to enable/disable exceed throttle quota: enable\_exceed\_throttle\_quota disable\_exceed\_throttle\_quota There are two limits when enable exceed throttle quota: 1. Must set at least one read and one write region server throttle quota; 2. 
All region server throttle quotas must be in seconds time unit. Because once previous requests exceed their quota and consume region server quota, quota in other time units may be refilled in a long time, this may affect later requests. --- * [HBASE-20587](https://issues.apache.org/jira/browse/HBASE-20587) | *Major* | **Replace Jackson with shaded thirdparty gson** Remove jackson dependencies from most hbase modules except hbase-rest, use shaded gson instead. The output json will be a bit different since jackson can use getter/setter, but gson will always use the fields. --- * [HBASE-21928](https://issues.apache.org/jira/browse/HBASE-21928) | *Major* | **Deprecated HConstants.META\_QOS** Mark HConstants.META\_QOS as deprecated. It is for internal use only, which is the highest priority. You should not try to set a priority greater than or equal to this value, although it is no harm but also useless. --- * [HBASE-17942](https://issues.apache.org/jira/browse/HBASE-17942) | *Major* | **Disable region splits and merges per table** This patch adds the ability to disable split and/or merge for a table (By default, split and merge are enabled for a table). --- * [HBASE-21636](https://issues.apache.org/jira/browse/HBASE-21636) | *Major* | **Enhance the shell scan command to support missing scanner specifications like ReadType, IsolationLevel etc.** Allows shell to set Scan options previously not exposed. See additions as part of the scan help by typing following hbase shell: hbase\> help 'scan' --- * [HBASE-21201](https://issues.apache.org/jira/browse/HBASE-21201) | *Major* | **Support to run VerifyReplication MR tool without peerid** We can specify peerQuorumAddress instead of peerId in VerifyReplication tool. So it no longer requires peerId to be setup when using this tool. 
For example: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication zk1,zk2,zk3:2181/hbase testTable --- * [HBASE-21838](https://issues.apache.org/jira/browse/HBASE-21838) | *Major* | **Create a special ReplicationEndpoint just for verifying the WAL entries are fine** Introduce a VerifyWALEntriesReplicationEndpoint which replicates nothing and only verifies that all the cells are valid. It can be used to catch bugs in WAL writing, since most of the time the WALs are never read again after being written unless a region server crashes. --- * [HBASE-21764](https://issues.apache.org/jira/browse/HBASE-21764) | *Major* | **Size of in-memory compaction thread pool should be configurable** Introduced a new config key in this issue: hbase.regionserver.inmemory.compaction.pool.size. The default value is 10. Use it to set the size of the in-memory compaction thread pool. Note that all memstores in one region server share the same pool, so if you have many regions on one region server, set this value larger to compact faster for better read performance. --- * [HBASE-21684](https://issues.apache.org/jira/browse/HBASE-21684) | *Major* | **Throw DNRIOE when connection or rpc client is closed** Make StoppedRpcClientException extend DoNotRetryIOException. --- * [HBASE-21739](https://issues.apache.org/jira/browse/HBASE-21739) | *Major* | **Move grant/revoke from regionserver to master** To implement user permission control in Procedure V2, the grant and revoke methods are first moved from AccessController to the master. AccessController#grant and AccessController#revoke are marked as deprecated; please use Admin#grant and Admin#revoke instead. --- * [HBASE-21791](https://issues.apache.org/jira/browse/HBASE-21791) | *Blocker* | **Upgrade thrift dependency to 0.12.0** IMPORTANT: Due to security issues, all users who use hbase thrift should avoid using releases which do not have this fix.
The affected releases are: 2.1.x: 2.1.2 and below 2.0.x: 2.0.4 and below 1.x: 1.4.x and below If you are using one of the affected releases above, please consider upgrading to a newer release ASAP. --- * [HBASE-20894](https://issues.apache.org/jira/browse/HBASE-20894) | *Major* | **Move BucketCache from java serialization to protobuf** For users who have configured hbase.bucketcache.ioengine with either the file:, files:, or mmap: prefix, and configured it to be persistent via the hbase.bucketcache.persistent.path property, the serialization format of the bucket cache has changed between versions. The old state will not be read during startup, and there is currently no migration path. The impact is expected to be minimal, however, since the cache will rebuild over time as access patterns dictate. # HBASE 2.2.0 Release Notes These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements. --- * [HBASE-21970](https://issues.apache.org/jira/browse/HBASE-21970) | *Major* | **Document that how to upgrade from 2.0 or 2.1 to 2.2+** See the document http://hbase.apache.org/book.html#upgrade2.2 about how to upgrade from 2.0 or 2.1 to 2.2+. HBase 2.2+ uses a new Procedure form for assigning/unassigning/moving Regions. It does not process the Unassign/Assign Procedure types of HBase 2.1 and 2.0. The upgrade requires that we first drain the Master Procedure Store of old-style Procedures before starting the new 2.2 Master. So before you kill the old version (2.0 or 2.1) Master, make sure that there is no region in transition. Once the new version (2.2+) Master is up, you can rolling-upgrade the RegionServers one by one. There is also a safer way if you are running a 2.1.1+ or 2.0.3+ cluster. It takes four steps to upgrade the Master. 1. Shut down both the active and standby Masters (your cluster will continue to serve reads and writes without interruption). 2.
Set the property hbase.procedure.upgrade-to-2-2 to true in hbase-site.xml for the Master, and start only one Master, still using the 2.1.1+ (or 2.0.3+) version. 3. Wait until the Master quits. Confirm that there is a 'READY TO ROLLING UPGRADE' message in the Master log as the cause of the shutdown. The Procedure Store is now empty. 4. Start new Masters with the new 2.2+ version. Then you can rolling upgrade RegionServers one by one. See HBASE-21075 for more details. --- * [HBASE-21536](https://issues.apache.org/jira/browse/HBASE-21536) | *Trivial* | **Fix completebulkload usage instructions** Added completebulkload short name for BulkLoadHFilesTool to bin/hbase. --- * [HBASE-22500](https://issues.apache.org/jira/browse/HBASE-22500) | *Blocker* | **Modify pom and jenkins jobs for hadoop versions** Change the default hadoop-3 version to 3.1.2. Drop the support for the releases which are effected by CVE-2018-8029, see this email https://lists.apache.org/thread.html/3d6831c3893cd27b6850aea2feff7d536888286d588e703c6ffd2e82@%3Cuser.hadoop.apache.org%3E --- * [HBASE-22148](https://issues.apache.org/jira/browse/HBASE-22148) | *Blocker* | **Provide an alternative to CellUtil.setTimestamp** The `CellUtil.setTimestamp` method changes to be an API with audience `LimitedPrivate(COPROC)` in HBase 3.0. With that designation the API should remain stable within a given minor release line, but may change between minor releases. Previously, this method was deprecated in HBase 2.0 for removal in HBase 3.0. Deprecation messages in HBase 2.y releases have been updated to indicate the expected API audience change. --- * [HBASE-21991](https://issues.apache.org/jira/browse/HBASE-21991) | *Major* | **Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements** The class LossyCounting was unintentionally marked Public but was never intended to be part of our public API. 
This oversight has been corrected and LossyCounting is now marked as Private and going forward may be subject to additional breaking changes or removal without notice. If you have taken a dependency on this class we recommend cloning it locally into your project before upgrading to this release. --- * [HBASE-22226](https://issues.apache.org/jira/browse/HBASE-22226) | *Trivial* | **Incorrect level for headings in asciidoc** Warnings for level headings are corrected in the book for the HBase Incompatibilities section. --- * [HBASE-20970](https://issues.apache.org/jira/browse/HBASE-20970) | *Major* | **Update hadoop check versions for hadoop3 in hbase-personality** Add hadoop 3.0.3, 3.1.1 3.1.2 in our hadoop check jobs. --- * [HBASE-21784](https://issues.apache.org/jira/browse/HBASE-21784) | *Major* | **Dump replication queue should show list of wal files ordered chronologically** The DumpReplicationQueues tool will now list replication queues sorted in chronological order. --- * [HBASE-22384](https://issues.apache.org/jira/browse/HBASE-22384) | *Minor* | **Formatting issues in administration section of book** Fixes a formatting issue in the administration section of the book, where listing indentation were a little bit off. --- * [HBASE-22399](https://issues.apache.org/jira/browse/HBASE-22399) | *Major* | **Change default hadoop-two.version to 2.8.x and remove the 2.7.x hadoop checks** Now the default hadoop-two.version has been changed to 2.8.5, and all hadoop versions before 2.8.2(exclude) will not be supported any more. --- * [HBASE-22392](https://issues.apache.org/jira/browse/HBASE-22392) | *Trivial* | **Remove extra/useless +** Removed extra + in HRegion, HStore and LoadIncrementalHFiles for branch-2 and HRegion and HStore for branch-1. --- * [HBASE-20494](https://issues.apache.org/jira/browse/HBASE-20494) | *Major* | **Upgrade com.yammer.metrics dependency** Updated metrics core from 3.2.1 to 3.2.6. 
--- * [HBASE-22358](https://issues.apache.org/jira/browse/HBASE-22358) | *Minor* | **Change rubocop configuration for method length** The rubocop definition for the maximum method length was set to 75. --- * [HBASE-22379](https://issues.apache.org/jira/browse/HBASE-22379) | *Minor* | **Fix Markdown for "Voting on Release Candidates" in book** Fixes the formatting of the "Voting on Release Candidates" to actually show the quote and code formatting of the RAT check. --- * [HBASE-20851](https://issues.apache.org/jira/browse/HBASE-20851) | *Minor* | **Change rubocop config for max line length of 100** The rubocop configuration in the hbase-shell module now allows a line length with 100 characters, instead of 80 as before. For everything before 2.1.5 this change introduces rubocop itself. --- * [HBASE-22054](https://issues.apache.org/jira/browse/HBASE-22054) | *Minor* | **Space Quota: Compaction is not working for super user in case of NO\_WRITES\_COMPACTIONS** This change allows the system and superusers to initiate compactions, even when a space quota violation policy disallows compactions from happening. The original intent behind disallowing of compactions was to prevent end-user compactions from creating undue I/O load, not disallowing \*any\* compaction in the system. --- * [HBASE-22292](https://issues.apache.org/jira/browse/HBASE-22292) | *Blocker* | **PreemptiveFastFailInterceptor clean repeatedFailuresMap issue** Adds new configuration hbase.client.failure.map.cleanup.interval which defaults to ten minutes. 
--- * [HBASE-22155](https://issues.apache.org/jira/browse/HBASE-22155) | *Major* | **Move 2.2.0 on to hbase-thirdparty-2.2.0** Updates libs used internally by hbase via hbase-thirdparty as follows: gson 2.8.1 -\> 2.8.5 guava 22.0 -\> 27.1-jre pb 3.5.1 -\> 3.7.0 netty 4.1.17 -\> 4.1.34 commons-collections4 4.1 -\> 4.3 --- * [HBASE-22178](https://issues.apache.org/jira/browse/HBASE-22178) | *Major* | **Introduce a createTableAsync with TableDescriptor method in Admin** Introduced Future\<Void\> createTableAsync(TableDescriptor); --- * [HBASE-22108](https://issues.apache.org/jira/browse/HBASE-22108) | *Major* | **Avoid passing null in Admin methods** Introduced these methods: void move(byte[]); void move(byte[], ServerName); Future\<Void\> splitRegionAsync(byte[]); These methods are deprecated: void move(byte[], byte[]) --- * [HBASE-22152](https://issues.apache.org/jira/browse/HBASE-22152) | *Major* | **Create a jenkins file for yetus to processing GitHub PR** Add a new jenkins file for running the pre-commit check on GitHub PRs. --- * [HBASE-22007](https://issues.apache.org/jira/browse/HBASE-22007) | *Major* | **Add restoreSnapshot and cloneSnapshot with acl methods in AsyncAdmin** Add cloneSnapshot/restoreSnapshot with acl methods in AsyncAdmin. --- * [HBASE-22123](https://issues.apache.org/jira/browse/HBASE-22123) | *Minor* | **REST gateway reports Insufficient permissions exceptions as 404 Not Found** When permissions are insufficient, you now get: HTTP/1.1 403 Forbidden on the HTTP side, and in the message Forbidden org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'myuser',action: get, tableName:mytable, family:cf.
at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.authorizeAccess(RangerAuthorizationCoprocessor.java:547) and the rest of the ADE stack --- * [HBASE-22100](https://issues.apache.org/jira/browse/HBASE-22100) | *Minor* | **False positive for error prone warnings in pre commit job** Now we will sort the javac WARNING/ERROR before generating diff in pre-commit so we can get a stable output for the error prone. The downside is that we just sort the output lexicographically so the line number will also be sorted lexicographically, which is a bit strange to human. --- * [HBASE-22057](https://issues.apache.org/jira/browse/HBASE-22057) | *Major* | **Impose upper-bound on size of ZK ops sent in a single multi()** Exposes a new configuration property "zookeeper.multi.max.size" which dictates the maximum size of deletes that HBase will make to ZooKeeper in a single RPC. This property defaults to 1MB, which should fall beneath the default ZooKeeper limit of 2MB, controlled by "jute.maxbuffer". --- * [HBASE-22052](https://issues.apache.org/jira/browse/HBASE-22052) | *Major* | **pom cleaning; filter out jersey-core in hadoop2 to match hadoop3 and remove redunant version specifications** Fixed awkward dependency issue that prevented site building. #### note specific to HBase 2.1.4 HBase 2.1.4 shipped with an early version of this fix that incorrectly altered the libraries included in our binary assembly for using Apache Hadoop 2.7 (the current build default Hadoop version for 2.1.z). For folks running out of the box against a Hadoop 2.7 cluster (or folks who skip the installation step of [replacing the bundled Hadoop libraries](http://hbase.apache.org/book.html#hadoop)) this will result in a failure at Region Server startup due to a missing class definition. 
e.g.:

```
2019-03-27 09:02:05,779 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:644)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:628)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2701)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2683)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:171)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:356)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:362)
	at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:411)
	at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:387)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:704)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:613)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:3029)
	at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:63)
	at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3047)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 26 more
```

Workaround via any _one_ of the following:

* If you are running against a Hadoop cluster that is 2.8+, ensure you replace the Hadoop libraries in the default binary assembly with those for your version.
* If you are running against a Hadoop cluster that is 2.8+, build the binary assembly from the source release while specifying your Hadoop version.
* If you are running against a Hadoop cluster that is a supported 2.7 release, ensure the `hadoop` executable is in the `PATH` seen at Region Server startup and that you are not using the `HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP` bypass.
* For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers via the `HBASE_CLASSPATH` environment variable.
* For any supported Hadoop version, manually make the Apache HTrace artifact `htrace-core-3.1.0-incubating.jar` available to all Region Servers by copying it into the directory `${HBASE_HOME}/lib/client-facing-thirdparty/`.

---

* [HBASE-22065](https://issues.apache.org/jira/browse/HBASE-22065) | *Major* | **Add listTableDescriptors(List\<TableName\>) method in AsyncAdmin**

Add a listTableDescriptors(List\<TableName\>) method in the AsyncAdmin interface, to align with the Admin interface.

--- * [HBASE-22040](https://issues.apache.org/jira/browse/HBASE-22040) | *Major* | **Add mergeRegionsAsync with a List of region names method in AsyncAdmin** Add a mergeRegionsAsync(byte[][], boolean) method in the AsyncAdmin interface. Instead of using assert, now we will throw IllegalArgumentException when you want to merge less than 2 regions at client side. And also, at master side, instead of using assert, now we will throw DoNotRetryIOException if you want merge more than 2 regions, since we only support merging two regions at once for now. --- * [HBASE-22039](https://issues.apache.org/jira/browse/HBASE-22039) | *Major* | **Should add the synchronous parameter for the XXXSwitch method in AsyncAdmin** Add drainXXX parameter for balancerSwitch/splitSwitch/mergeSwitch methods in the AsyncAdmin interface, which has the same meaning with the synchronous parameter for these methods in the Admin interface. --- * [HBASE-21810](https://issues.apache.org/jira/browse/HBASE-21810) | *Major* | **bulkload support set hfile compression on client** bulkload (HFileOutputFormat2) support config the compression on client ,you can set the job configuration "hbase.mapreduce.hfileoutputformat.compression" override the auto-detection of the target table's compression --- * [HBASE-22000](https://issues.apache.org/jira/browse/HBASE-22000) | *Major* | **Deprecated isTableAvailable with splitKeys** Deprecated AsyncTable.isTableAvailable(TableName, byte[][]). --- * [HBASE-21871](https://issues.apache.org/jira/browse/HBASE-21871) | *Major* | **Support to specify a peer table name in VerifyReplication tool** After HBASE-21871, we can specify a peer table name with --peerTableName in VerifyReplication tool like the following: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable 5 TestTable In addition, we can compare any 2 tables in any remote clusters with specifying both peerId and --peerTableName. 
For example: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --peerTableName=peerTable zk1,zk2,zk3:2181/hbase TestTable --- * [HBASE-15728](https://issues.apache.org/jira/browse/HBASE-15728) | *Major* | **Add remaining per-table region / store / flush / compaction related metrics** Adds below flush, split, and compaction metrics + // split related metrics + private MutableFastCounter splitRequest; + private MutableFastCounter splitSuccess; + private MetricHistogram splitTimeHisto; + + // flush related metrics + private MetricHistogram flushTimeHisto; + private MetricHistogram flushMemstoreSizeHisto; + private MetricHistogram flushOutputSizeHisto; + private MutableFastCounter flushedMemstoreBytes; + private MutableFastCounter flushedOutputBytes; + + // compaction related metrics + private MetricHistogram compactionTimeHisto; + private MetricHistogram compactionInputFileCountHisto; + private MetricHistogram compactionInputSizeHisto; + private MetricHistogram compactionOutputFileCountHisto; + private MetricHistogram compactionOutputSizeHisto; + private MutableFastCounter compactedInputBytes; + private MutableFastCounter compactedOutputBytes; + + private MetricHistogram majorCompactionTimeHisto; + private MetricHistogram majorCompactionInputFileCountHisto; + private MetricHistogram majorCompactionInputSizeHisto; + private MetricHistogram majorCompactionOutputFileCountHisto; + private MetricHistogram majorCompactionOutputSizeHisto; + private MutableFastCounter majorCompactedInputBytes; + private MutableFastCounter majorCompactedOutputBytes; --- * [HBASE-20886](https://issues.apache.org/jira/browse/HBASE-20886) | *Critical* | **[Auth] Support keytab login in hbase client** From 2.2.0, hbase supports client login via keytab. 
To use this feature, the client should specify \`hbase.client.keytab.file\` and \`hbase.client.keytab.principal\` in hbase-site.xml; the connection will then carry the needed credentials, which are renewed periodically, to communicate with a kerberized hbase cluster. --- * [HBASE-21410](https://issues.apache.org/jira/browse/HBASE-21410) | *Major* | **A helper page that help find all problematic regions and procedures** After HBASE-21410, the Master UI has a new helper page. It is mainly intended to help HBase operators quickly find all regions and procedure IDs (pids) that are stuck. There are 2 entry points to this page. One is in the Regions in Transition section, where "num region(s) in transition" is now a link that you can click to check all regions in transition and their related procedure IDs. The other is in the table details section, where the number of CLOSING or OPENING regions is now a link that you can click to check the CLOSING or OPENING regions of a given table and their related procedure IDs. Besides listing all regions and related procedures, the helper page has 2 buttons at the top which show these regions or procedure IDs in text format. This is mainly intended to help operators easily copy and paste all problematic procedure IDs and encoded region names into HBCK2's command line, with which an operator can bypass these procedures or assign these regions. --- * [HBASE-21588](https://issues.apache.org/jira/browse/HBASE-21588) | *Major* | **Procedure v2 wal splitting implementation** After HBASE-21588, WAL splitting can be coordinated through the procedure framework. This simplifies the WAL splitting process and removes the need to connect to zookeeper. During ServerCrashProcedure, a SplitWALProcedure is created for each WAL that needs to be split. Each SplitWALProcedure then spawns a SplitWALRemoteProcedure to send the request to a regionserver.
At the RegionServer side, whole process is handled by SplitWALCallable. It split the WAL and return the result to master. According to my test, this patch has a better performance as the number of WALs that need to split increase. And it can relieve the pressure on zookeeper. --- * [HBASE-20734](https://issues.apache.org/jira/browse/HBASE-20734) | *Major* | **Colocate recovered edits directory with hbase.wal.dir** Previously the recovered.edits directory was under the root directory. This JIRA moves the recovered.edits directory to be under the hbase.wal.dir if set. It also adds a check for any recovered.edits found under the root directory for backwards compatibility. This gives improvements when a faster media(like SSD) or more local FileSystem is used for the hbase.wal.dir than the root dir. --- * [HBASE-20401](https://issues.apache.org/jira/browse/HBASE-20401) | *Minor* | **Make \`MAX\_WAIT\` and \`waitIfNotFinished\` in CleanerContext configurable** When oldwals (and hfile) cleaner cleans stale wals (and hfiles), it will periodically check and wait the clean results from filesystem, the total wait time will be no more than a max time. The periodically wait and check configurations are hbase.oldwals.cleaner.thread.check.interval.msec (default is 500 ms) and hbase.regionserver.hfilecleaner.thread.check.interval.msec (default is 1000 ms). Meanwhile, The max time configurations are hbase.oldwals.cleaner.thread.timeout.msec and hbase.regionserver.hfilecleaner.thread.timeout.msec, they are set to 60 seconds by default. All support dynamic configuration. e.g. in the oldwals cleaning scenario, one may consider tuning hbase.oldwals.cleaner.thread.timeout.msec and hbase.oldwals.cleaner.thread.check.interval.msec 1. While deleting a oldwal never complete (strange but possible), then delete file task needs to wait for a max of 60 seconds. Here, 60 seconds might be too long, or the opposite way is to increase more than 60 seconds in the use cases of slow file delete. 2. 
By default, the result of a file delete is checked every 500 milliseconds. One might tune this period to a shorter interval to check more frequently, or to a longer interval to avoid checking too often (a longer interval may be useful with high-latency storage). --- * [HBASE-21481](https://issues.apache.org/jira/browse/HBASE-21481) | *Major* | **[acl] Superuser's permissions should not be granted or revoked by any non-su global admin** HBASE-21481 improves the quality of access control by strengthening the protection of super users' privileges. --- * [HBASE-21082](https://issues.apache.org/jira/browse/HBASE-21082) | *Critical* | **Reimplement assign/unassign related procedure metrics** There are now four types of RIT procedure metrics: assign, unassign, move, and reopen. The meaning of assign/unassign has changed: moving a region no longer increments the unassign metric and then the assign metric. Two new procedure metrics, open and close, track the open/close region calls sent to region servers. Because of retries, open/close may be sent multiple times to finish a single RIT. --- * [HBASE-20724](https://issues.apache.org/jira/browse/HBASE-20724) | *Critical* | **Sometimes some compacted storefiles are still opened after region failover** Problem: This is an old problem dating back to HBASE-2231. The compaction event marker was only written to the WAL. After a flush, however, the WAL may be archived, which means a still-useful compaction event marker is deleted too. As a result, the compacted store files cannot be archived when the region opens and replays the WAL. Solution: With this change, the compaction event tracker is also written to the HFile. When a region opens and loads its store files, it reads the compaction event tracker from each HFile and archives any compacted store files that still exist.
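The recovery step described above can be modelled in a few lines. This is an illustrative sketch only, not HBase's code: the class and method names are hypothetical, and it assumes each store file's tracker simply lists the names of the files it was compacted from.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of archiving already-compacted store files on region open. */
final class CompactedFilesSketch {
  private CompactedFilesSketch() {}

  /**
   * @param compactedFromByFile store file name mapped to the input file names
   *        recorded in that file's compaction event tracker (assumed layout)
   * @return files still present that another file records as a compaction input
   */
  static Set<String> filesToArchive(Map<String, Set<String>> compactedFromByFile) {
    // Collect every file name that some tracker lists as an already-compacted input.
    Set<String> recordedInputs = new HashSet<>();
    for (Set<String> inputs : compactedFromByFile.values()) {
      recordedInputs.addAll(inputs);
    }
    // Only files that still exist in the store need archiving.
    Set<String> toArchive = new TreeSet<>(compactedFromByFile.keySet());
    toArchive.retainAll(recordedInputs);
    return toArchive;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> storeFiles = new LinkedHashMap<>();
    storeFiles.put("f1", Collections.<String>emptySet());
    storeFiles.put("f2", Collections.<String>emptySet());
    // f3 is the compaction output; f0 was already archived before the failover.
    storeFiles.put("f3", new HashSet<>(Arrays.asList("f0", "f1", "f2")));
    System.out.println(filesToArchive(storeFiles)); // prints [f1, f2]
  }
}
```

Because the input list travels with the output file itself, the decision no longer depends on a WAL that may already have been archived.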
--- * [HBASE-21820](https://issues.apache.org/jira/browse/HBASE-21820) | *Major* | **Implement CLUSTER quota scope** HBase contains two quota scopes: MACHINE and CLUSTER. Before this patch, set quota operations did not expose scope option to client api and use MACHINE as default, CLUSTER scope can not be set and used. Shell commands are as follows: set\_quota, TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec' This issue implements CLUSTER scope in a simple way: For user, namespace, user over namespace quota, use [ClusterLimit / RSNum] as machine limit. For table and user over table quota, use [ClusterLimit / TotalTableRegionNum \* MachineTableRegionNum] as machine limit. After this patch, user can set CLUSTER scope quota, but MACHINE is still default if user ignore scope. Shell commands are as follows: set\_quota, TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec' set\_quota, TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec', SCOPE =\> MACHINE set\_quota, TYPE =\> THROTTLE, TABLE =\> 't1', LIMIT =\> '10req/sec', SCOPE =\> CLUSTER --- * [HBASE-21057](https://issues.apache.org/jira/browse/HBASE-21057) | *Minor* | **upgrade to latest spotbugs** Change spotbugs version to 3.1.11. --- * [HBASE-21922](https://issues.apache.org/jira/browse/HBASE-21922) | *Major* | **BloomContext#sanityCheck may failed when use ROWPREFIX\_DELIMITED bloom filter** Remove bloom filter type ROWPREFIX\_DELIMITED. May add it back when find a better solution. --- * [HBASE-21783](https://issues.apache.org/jira/browse/HBASE-21783) | *Major* | **Support exceed user/table/ns throttle quota if region server has available quota** Support enable or disable exceed throttle quota. Exceed throttle quota means, user can over consume user/namespace/table quota if region server has additional available quota because other users don't consume at the same time. 
Use the following shell commands to enable/disable exceed throttle quota: enable\_exceed\_throttle\_quota disable\_exceed\_throttle\_quota There are two limits when enable exceed throttle quota: 1. Must set at least one read and one write region server throttle quota; 2. All region server throttle quotas must be in seconds time unit. Because once previous requests exceed their quota and consume region server quota, quota in other time units may be refilled in a long time, this may affect later requests. --- * [HBASE-20587](https://issues.apache.org/jira/browse/HBASE-20587) | *Major* | **Replace Jackson with shaded thirdparty gson** Remove jackson dependencies from most hbase modules except hbase-rest, use shaded gson instead. The output json will be a bit different since jackson can use getter/setter, but gson will always use the fields. --- * [HBASE-21928](https://issues.apache.org/jira/browse/HBASE-21928) | *Major* | **Deprecated HConstants.META\_QOS** Mark HConstants.META\_QOS as deprecated. It is for internal use only, which is the highest priority. You should not try to set a priority greater than or equal to this value, although it is no harm but also useless. --- * [HBASE-17942](https://issues.apache.org/jira/browse/HBASE-17942) | *Major* | **Disable region splits and merges per table** This patch adds the ability to disable split and/or merge for a table (By default, split and merge are enabled for a table). --- * [HBASE-21636](https://issues.apache.org/jira/browse/HBASE-21636) | *Major* | **Enhance the shell scan command to support missing scanner specifications like ReadType, IsolationLevel etc.** Allows shell to set Scan options previously not exposed. 
See the additions in the scan help by typing the following in the hbase shell: hbase\> help 'scan' --- * [HBASE-21201](https://issues.apache.org/jira/browse/HBASE-21201) | *Major* | **Support to run VerifyReplication MR tool without peerid** We can specify peerQuorumAddress instead of peerId in the VerifyReplication tool, so it no longer requires a peerId to be set up when using this tool. For example: hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication zk1,zk2,zk3:2181/hbase testTable --- * [HBASE-21838](https://issues.apache.org/jira/browse/HBASE-21838) | *Major* | **Create a special ReplicationEndpoint just for verifying the WAL entries are fine** Introduce a VerifyWALEntriesReplicationEndpoint which replicates nothing and only verifies that all the cells are valid. It can be used to catch bugs in WAL writing, since most of the time we do not read the WALs again after writing them unless a region server crashes. --- * [HBASE-21727](https://issues.apache.org/jira/browse/HBASE-21727) | *Minor* | **Simplify documentation around client timeout** Deprecated the HBaseConfiguration#getInt(Configuration, String, String, int) method and removed it from the 3.0.0 version. --- * [HBASE-21764](https://issues.apache.org/jira/browse/HBASE-21764) | *Major* | **Size of in-memory compaction thread pool should be configurable** Introduced a new config key in this issue: hbase.regionserver.inmemory.compaction.pool.size. The default value is 10. You can configure this to set the size of the in-memory compaction pool. Note that all memstores in one region server share the same pool, so if you have many regions in one region server, you need to set this larger to compact faster for better read performance. --- * [HBASE-21684](https://issues.apache.org/jira/browse/HBASE-21684) | *Major* | **Throw DNRIOE when connection or rpc client is closed** Make StoppedRpcClientException extend DoNotRetryIOException.
--- * [HBASE-21739](https://issues.apache.org/jira/browse/HBASE-21739) | *Major* | **Move grant/revoke from regionserver to master** As a first step towards implementing user permission control in Procedure V2, move the grant and revoke methods from AccessController to master. AccessController#grant and AccessController#revoke are marked as deprecated; please use Admin#grant and Admin#revoke instead. --- * [HBASE-21791](https://issues.apache.org/jira/browse/HBASE-21791) | *Blocker* | **Upgrade thrift dependency to 0.12.0** IMPORTANT: Due to security issues, all users who use hbase thrift should avoid using releases which do not have this fix. The affected releases are: 2.1.x: 2.1.2 and below 2.0.x: 2.0.4 and below 1.x: 1.4.x and below If you are using one of the affected releases above, please consider upgrading to a newer release ASAP. --- * [HBASE-21792](https://issues.apache.org/jira/browse/HBASE-21792) | *Major* | **Mark HTableMultiplexer as deprecated and remove it in 3.0.0** HTableMultiplexer exposes the implementation class, and it is incomplete, so we mark it as deprecated and remove it in the 3.0.0 release. There is no direct replacement for HTableMultiplexer; please use BufferedMutator if you want to batch mutations to a table. --- * [HBASE-21782](https://issues.apache.org/jira/browse/HBASE-21782) | *Major* | **LoadIncrementalHFiles should not be IA.Public** Introduce a BulkLoadHFiles interface which is marked as IA.Public, for doing bulk load programmatically. Introduce a BulkLoadHFilesTool which extends BulkLoadHFiles, and is marked as IA.LimitedPrivate(TOOLS), for use from the command line. The old LoadIncrementalHFiles is deprecated and will be removed in 3.0.0. --- * [HBASE-21762](https://issues.apache.org/jira/browse/HBASE-21762) | *Major* | **Move some methods in ClusterConnection to Connection** Move the two getHbck methods from ClusterConnection to Connection, and mark the methods as IA.LimitedPrivate(HBCK), as ClusterConnection is IA.Private and should not be depended on by HBCK2.
Add a clearRegionLocationCache method in Connection to clear the region location cache for all the tables. As in RegionLocator, most of the methods have a 'reload' parameter, which implicitly tells user that we have a region location cache, so adding a method to clear the cache is fine. --- * [HBASE-21713](https://issues.apache.org/jira/browse/HBASE-21713) | *Major* | **Support set region server throttle quota** Support set region server rpc throttle quota which represents the read/write ability of region servers and throttles when region server's total requests exceeding the limit. Use the following shell command to set RS quota: set\_quota TYPE =\> THROTTLE, REGIONSERVER =\> 'all', THROTTLE\_TYPE =\> WRITE, LIMIT =\> '20000req/sec' set\_quota TYPE =\> THROTTLE, REGIONSERVER =\> 'all', LIMIT =\> NONE "all" represents the throttle quota of all region servers and setting specified region server quota isn't supported currently. --- * [HBASE-21689](https://issues.apache.org/jira/browse/HBASE-21689) | *Minor* | **Make table/namespace specific current quota info available in shell(describe\_namespace & describe)** In shell commands "describe\_namespace" and "describe", which are used to see the descriptors of the namespaces and tables respectively, quotas set on that particular namespace/table will also be printed along. 
--- * [HBASE-17370](https://issues.apache.org/jira/browse/HBASE-17370) | *Major* | **Fix or provide shell scripts to drain and decommission region server** Adds shell support for the following: - List decommissioned/draining region servers - Decommission a list of region servers, optionally offload corresponding regions - Recommission a region server, optionally load a list of passed regions --- * [HBASE-21734](https://issues.apache.org/jira/browse/HBASE-21734) | *Major* | **Some optimization in FilterListWithOR** After HBASE-21620, the filterListWithOR has been a bit slow because we need to merge each sub-filter's RC, while before HBASE-21620 we skipped many RC merges, but the logic was wrong. So here we chose another way to optimize the performance: removing the KeyValueUtil#toNewKeyCell. Anoop Sam John suggested that KeyValueUtil#toNewKeyCell could previously save some GC, because if we copy the key part of a cell into a single byte[], then the block the cell refers to is no longer referenced by the filter list, and the upper layer can GC the data block quickly. After HBASE-21620, however, we update the prevCellList for every encountered cell, so the lifecycle of a cell in prevCellList for FilterList is much shorter. So we just use the cell reference to save CPU. We also removed all the array stream usage in the filter list, because it was also quite time-consuming in our test. --- * [HBASE-21738](https://issues.apache.org/jira/browse/HBASE-21738) | *Critical* | **Remove all the CSLM#size operation in our memstore because it's an quite time consuming.** We found that memstore snapshotting could take a long time because it called the time-consuming ConcurrentSkipListMap#size, which caused p999 latency spikes. So in this issue, we remove all ConcurrentSkipListMap#size calls in the memstore by tracking the cellsCount in MemStoreSizing. As described in the issue, the p999 latency spike was mitigated.
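
The HBASE-21738 note above depends on the fact that `ConcurrentSkipListMap#size` walks the map instead of reading a stored count. The following is a minimal sketch of the counter idea only. `CountedCellSet` and its methods are hypothetical names chosen for illustration and are not HBase's memstore code.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not HBase's MemStore implementation.
class CountedCellSet {
    private final ConcurrentSkipListMap<String, String> cells = new ConcurrentSkipListMap<>();
    // Maintained on every insert so that reading the count never traverses the map.
    private final AtomicInteger cellsCount = new AtomicInteger();

    /** Adds a cell; the counter is incremented only when the key is new. */
    public void add(String key, String value) {
        if (cells.put(key, value) == null) {
            cellsCount.incrementAndGet();
        }
    }

    /** Constant-time count, avoiding the linear traversal done by ConcurrentSkipListMap#size. */
    public int getCellsCount() {
        return cellsCount.get();
    }

    public static void main(String[] args) {
        CountedCellSet set = new CountedCellSet();
        set.add("row1", "a");
        set.add("row2", "b");
        set.add("row1", "c"); // overwrite of an existing key, count unchanged
        System.out.println(set.getCellsCount());
    }
}
```

The trade-off in this sketch is a small amount of bookkeeping on each write in exchange for a cheap read of the count on the snapshot path.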
--- * [HBASE-21034](https://issues.apache.org/jira/browse/HBASE-21034) | *Major* | **Add new throttle type: read/write capacity unit** Provides a new throttle type: capacity unit. One read/write/request capacity unit represents a read/write/read+write of up to 1K of data. If the data size is more than 1K, additional capacity units are consumed. Use the following shell command to set a capacity unit (CU) quota: set\_quota TYPE =\> THROTTLE, THROTTLE\_TYPE =\> WRITE, USER =\> 'u1', LIMIT =\> '10CU/sec' Use the "hbase.quota.read.capacity.unit" property to set the data size of one read capacity unit in bytes; the default value is 1K. Use the "hbase.quota.write.capacity.unit" property to set the data size of one write capacity unit in bytes; the default value is 1K. --- * [HBASE-21595](https://issues.apache.org/jira/browse/HBASE-21595) | *Minor* | **Print thread's information and stack traces when RS is aborting forcibly** Does a thread dump on stdout on abort. --- * [HBASE-21732](https://issues.apache.org/jira/browse/HBASE-21732) | *Critical* | **Should call toUpperCase before using Enum.valueOf in some methods for ColumnFamilyDescriptor** Now all the Enum configs in ColumnFamilyDescriptor accept lower case config values. --- * [HBASE-21712](https://issues.apache.org/jira/browse/HBASE-21712) | *Minor* | **Make submit-patch.py python3 compatible** Python3 support was added to dev-support/submit-patch.py. To install the newly required dependencies, run the \`pip install -r dev-support/python-requirements.txt\` command. --- * [HBASE-21657](https://issues.apache.org/jira/browse/HBASE-21657) | *Major* | **PrivateCellUtil#estimatedSerializedSizeOf has been the bottleneck in 100% scan case.** In HBASE-21657, I simplified the path of estimatedSerializedSize() & estimatedSerializedSizeOfCell() by moving the general getSerializedSize() and heapSize() from ExtendedCell to the Cell interface. The patch also included some other improvements: 1.
In 99% of cases our cells have no tags, so HFileScannerImpl now just returns a NoTagsByteBufferKeyValue when there are no tags. This saves a lot of CPU time when sending tagless cells over RPC, because we can return the length directly instead of computing the serialized size from the offset/length of each field (row/cf/cq..) 2. Move each subclass's getSerializedSize implementation from ExtendedCell into its own class, which means we no longer need to call ExtendedCell's getSerializedSize() first and then forward to the subclass's getSerializedSize(withTags). 3. Give the result ArrayList an estimated initial size to avoid frequent list growth in a big scan; we now estimate the array size as min(scan.rows, 512), which also helps a lot. We gained an almost 40% throughput improvement in the 100% scan case for branch-2 (cacheHitRatio~100%)[1]. However, it is an incompatible change in some cases: if an upstream user implemented their own Cells (rare, but it can happen), their code will fail to compile. --- * [HBASE-21647](https://issues.apache.org/jira/browse/HBASE-21647) | *Major* | **Add status track for splitting WAL tasks** Adds a task monitor that shows ServerCrashProcedure progress in the UI. --- * [HBASE-21652](https://issues.apache.org/jira/browse/HBASE-21652) | *Major* | **Refactor ThriftServer making thrift2 server inherited from thrift1 server** Before this issue, the thrift1 server and thrift2 server were totally different servers. If a new feature was added to the thrift1 server, the thrift2 server had to make the same change to support it (e.g. authorization). After this issue, the thrift2 server inherits from thrift1, so it now has all the features the thrift1 server has (e.g. http support, which the thrift2 server did not have before). The way to start the thrift1 or thrift2 server remains the same after this issue.
--- * [HBASE-21661](https://issues.apache.org/jira/browse/HBASE-21661) | *Major* | **Provide Thrift2 implementation of Table/Admin** ThriftAdmin/ThriftTable are implemented based on Thrift2. With ThriftAdmin/ThriftTable, People can use thrift2 protocol just like HTable/HBaseAdmin. Example of using ThriftConnection Configuration conf = HBaseConfiguration.create(); conf.set(ClusterConnection.HBASE\_CLIENT\_CONNECTION\_IMPL,ThriftConnection.class.getName()); Connection conn = ConnectionFactory.createConnection(conf); Table table = conn.getTable(tablename) It is just like a normal Connection, similar use experience with the default ConnectionImplementation --- * [HBASE-21618](https://issues.apache.org/jira/browse/HBASE-21618) | *Critical* | **Scan with the same startRow(inclusive=true) and stopRow(inclusive=false) returns one result** There was a bug when scan with the same startRow(inclusive=true) and stopRow(inclusive=false). The old incorrect behavior is return one result. After this fix, the new correct behavior is return nothing. --- * [HBASE-21159](https://issues.apache.org/jira/browse/HBASE-21159) | *Major* | **Add shell command to switch throttle on or off** Support enable or disable rpc throttle when hbase quota is enabled. If hbase quota is enabled, rpc throttle is enabled by default. When disable rpc throttle, HBase will not throttle any request. Use the following commands to switch rpc throttle : enable\_rpc\_throttle / disable\_rpc\_throttle. --- * [HBASE-21659](https://issues.apache.org/jira/browse/HBASE-21659) | *Minor* | **Avoid to load duplicate coprocessors in system config and table descriptor** Add a new configuration "hbase.skip.load.duplicate.table.coprocessor". The default value is false to keep compatible with the old behavior. Config it true to skip load duplicate table coprocessor. 
--- * [HBASE-21650](https://issues.apache.org/jira/browse/HBASE-21650) | *Major* | **Add DDL operation and some other miscellaneous to thrift2** Added DDL operations and some other structure definitions to thrift2. Methods added: create/modify/addColumnFamily/deleteColumnFamily/modifyColumnFamily/enable/disable/truncate/delete table create/modify/delete namespace get(list)TableDescriptor(s)/get(list)NamespaceDescriptor(s) tableExists/isTableEnabled/isTableDisabled/isTableAvailable And some class definitions along with those methods --- * [HBASE-21643](https://issues.apache.org/jira/browse/HBASE-21643) | *Major* | **Introduce two new region coprocessor method and deprecated postMutationBeforeWAL** Deprecated the region coprocessor method postMutationBeforeWAL and introduced two new region coprocessor methods, postIncrementBeforeWAL and postAppendBeforeWAL, instead. --- * [HBASE-21635](https://issues.apache.org/jira/browse/HBASE-21635) | *Major* | **Use maven enforcer to ban imports from illegal packages** Use the de.skuzzle.enforcer.restrict-imports-enforcer-rule extension for the maven enforcer plugin to ban illegal imports at compile time. Now if you use illegal imports, for example, import com.google.common.\*, there will be a compile error instead of a checkstyle warning. --- * [HBASE-21401](https://issues.apache.org/jira/browse/HBASE-21401) | *Critical* | **Sanity check when constructing the KeyValue** Add a sanity check when constructing a KeyValue from a byte[]. We use this constructor when reading a kv from a socket, HFile, or WAL (replication). The sanity check isn't designed to discover bit corruption in network transfer or disk IO. It is designed to detect bugs inside HBase in advance. HBASE-21459 indicated that the performance loss is extremely small for different kinds of keyvalue.
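
To illustrate the kind of check the HBASE-21401 note describes, here is a minimal sketch that validates length prefixes before a byte[] is interpreted as a key/value pair. It assumes a simplified layout (4-byte key length, 4-byte value length, then key and value bytes, with tags ignored). `KeyValueSanityCheck` and `check` are hypothetical names, and the actual checks in HBase are more extensive.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified illustration; not the actual HBase sanity check.
class KeyValueSanityCheck {
    static final int INT_SIZE = 4;

    /**
     * Validates that the two length prefixes describe a region that fits in
     * buf[offset, offset + length). Throws IllegalArgumentException otherwise.
     */
    public static void check(byte[] buf, int offset, int length) {
        if (buf == null || offset < 0 || length < 2 * INT_SIZE || offset > buf.length - length) {
            throw new IllegalArgumentException("Invalid buffer bounds");
        }
        ByteBuffer bb = ByteBuffer.wrap(buf, offset, length);
        int keyLen = bb.getInt();
        int valueLen = bb.getInt();
        if (keyLen <= 0 || valueLen < 0) {
            throw new IllegalArgumentException("Invalid key or value length");
        }
        // Use long arithmetic so that huge corrupted lengths cannot overflow.
        long needed = 2L * INT_SIZE + keyLen + valueLen;
        if (needed > length) {
            throw new IllegalArgumentException("Lengths exceed the available bytes");
        }
    }

    public static void main(String[] args) {
        byte[] ok = ByteBuffer.allocate(13).putInt(3).putInt(2).put(new byte[5]).array();
        check(ok, 0, ok.length); // passes silently
        System.out.println("valid buffer accepted");
    }
}
```

Failing fast at construction time, as in this sketch, surfaces a malformed buffer at the point where it enters the system rather than later when a field is read.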
--- * [HBASE-21554](https://issues.apache.org/jira/browse/HBASE-21554) | *Minor* | **Show replication endpoint classname for replication peer on master web UI** The replication UI on master will show the replication endpoint classname. --- * [HBASE-21549](https://issues.apache.org/jira/browse/HBASE-21549) | *Major* | **Add shell command for serial replication peer** Add a SERIAL flag for the add\_peer command to identify whether or not the replication peer is a serial replication peer. The default serial flag is false. --- * [HBASE-21453](https://issues.apache.org/jira/browse/HBASE-21453) | *Major* | **Convert ReadOnlyZKClient to DEBUG instead of INFO** The log level of ReadOnlyZKClient was moved to DEBUG. --- * [HBASE-21283](https://issues.apache.org/jira/browse/HBASE-21283) | *Minor* | **Add new shell command 'rit' for listing regions in transition** The HBase `shell` now includes a command to list regions currently in transition. ``` HBase Shell Use "help" to get list of supported commands. Use "exit" to quit this interactive shell. Version 1.5.0-SNAPSHOT, r9bb6d2fa8b760f16cd046657240ebd4ad91cb6de, Mon Oct 8 21:05:50 UTC 2018 hbase(main):001:0> help 'rit' List all regions in transition. Examples: hbase> rit hbase(main):002:0> create ... 0 row(s) in 2.5150 seconds => Hbase::Table - IntegrationTestBigLinkedList hbase(main):003:0> rit 0 row(s) in 0.0340 seconds hbase(main):004:0> unassign '56f0c38c81ae453d19906ce156a2d6a1' 0 row(s) in 0.0540 seconds hbase(main):005:0> rit IntegrationTestBigLinkedList,L\xCC\xCC\xCC\xCC\xCC\xCC\xCB,1539117183224.56f0c38c81ae453d19906ce156a2d6a1.
state=PENDING_CLOSE, ts=Tue Oct 09 20:33:34 UTC 2018 (0s ago), server=null 1 row(s) in 0.0170 seconds ``` --- * [HBASE-21567](https://issues.apache.org/jira/browse/HBASE-21567) | *Major* | **Allow overriding configs starting up the shell** Allow passing of -Dkey=value option to shell to override hbase-\* configuration: e.g.: $ ./bin/hbase shell -Dhbase.zookeeper.quorum=ZK0.remote.cluster.example.org,ZK1.remote.cluster.example.org,ZK2.remote.cluster.example.org -Draining=false ... hbase(main):001:0\> @shell.hbase.configuration.get("hbase.zookeeper.quorum") =\> "ZK0.remote.cluster.example.org,ZK1.remote.cluster.example.org,ZK2.remote.cluster.example.org" hbase(main):002:0\> @shell.hbase.configuration.get("raining") =\> "false" --- * [HBASE-21560](https://issues.apache.org/jira/browse/HBASE-21560) | *Major* | **Return a new TableDescriptor for MasterObserver#preModifyTable to allow coprocessor modify the TableDescriptor** Incompatible change. Allow MasterObserver#preModifyTable to return a new TableDescriptor. And master will use this returned TableDescriptor to modify table. --- * [HBASE-21551](https://issues.apache.org/jira/browse/HBASE-21551) | *Blocker* | **Memory leak when use scan with STREAM at server side** ### Summary HBase clusters will experience Region Server failures due to out of memory errors due to a leak given any of the following: * User initiates Scan operations set to use the STREAM reading type * User initiates Scan operations set to use the default reading type that read more than 4 * the block size of column families involved in the scan (e.g. by default 4*64KiB) * Compactions run ### Root cause When there are long running scans the Region Server process attempts to optimize access by using a different API geared towards sequential access. Due to an error in HBASE-20704 for HBase 2.0+ the Region Server fails to release related resources when those scans finish. 
That same optimization path is always used for the HBase internal file compaction process. ### Workaround Impact for this error can be minimized by setting the config value “hbase.storescanner.pread.max.bytes” to MAX_INT to avoid the optimization for default user scans. Clients should also be checked to ensure they do not pass the STREAM read type to the Scan API. This will have a severe impact on performance for long scans. Compactions always use this sequential optimized reading mechanism so downstream users will need to periodically restart Region Server roles after compactions have happened. --- * [HBASE-21550](https://issues.apache.org/jira/browse/HBASE-21550) | *Major* | **Add a new method preCreateTableRegionInfos for MasterObserver which allows CPs to modify the TableDescriptor** Add a new method preCreateTableRegionInfos for MasterObserver, which will be called before creating region infos for the given table, before the preCreateTable method. It allows you to return a new TableDescriptor to override the original one. Returning null or throwing an exception will stop the creation. --- * [HBASE-21492](https://issues.apache.org/jira/browse/HBASE-21492) | *Critical* | **CellCodec Written To WAL Before It's Verified** After HBASE-21492 the return type of WALCellCodec#getWALCellCodecClass has been changed from String to Class --- * [HBASE-21387](https://issues.apache.org/jira/browse/HBASE-21387) | *Major* | **Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files** To prevent a race condition between an in-progress snapshot (performed by TakeSnapshotHandler) and HFileCleaner, which could result in data loss, this JIRA introduced mutual exclusion between taking a snapshot and running HFileCleaner. That is, at any given moment, either a snapshot is being taken or HFileCleaner is checking for hfiles which are not referenced, but never both.
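
The mutual exclusion described for HBASE-21387 can be pictured with a shared lock. The following is a minimal sketch only. `SnapshotCleanerGuard`, `takeSnapshot`, and `tryCleanerChore` are hypothetical names, and whether the real cleaner waits or skips its run when a snapshot is in progress is an assumption made here for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of mutual exclusion; not the actual HBase classes.
class SnapshotCleanerGuard {
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the snapshot work while holding the lock, blocking until it is free. */
    public void takeSnapshot(Runnable snapshotWork) {
        lock.lock();
        try {
            snapshotWork.run();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Runs the cleaner work only if no snapshot currently holds the lock.
     * Returns false (and does nothing) when another thread holds it.
     */
    public boolean tryCleanerChore(Runnable cleanerWork) {
        if (!lock.tryLock()) {
            return false;
        }
        try {
            cleanerWork.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        SnapshotCleanerGuard guard = new SnapshotCleanerGuard();
        guard.takeSnapshot(() -> System.out.println("snapshot taken"));
        System.out.println("cleaner ran: " + guard.tryCleanerChore(() -> { }));
    }
}
```

In this sketch the cleaner skips a run rather than blocking. For a periodic chore that is a reasonable choice, since a skipped run is simply retried on the next period.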
--- * [HBASE-21452](https://issues.apache.org/jira/browse/HBASE-21452) | *Major* | **Illegal character in hbase counters group name** Changes group name of hbase metrics from "HBase Counters" to "HBaseCounters". --- * [HBASE-21443](https://issues.apache.org/jira/browse/HBASE-21443) | *Major* | **[hbase-connectors] Purge hbase-\* modules from core now they've been moved to hbase-connectors** Parent issue moved hbase-spark\* modules to hbase-connectors. This issue removes hbase-spark\* modules from hbase core repo. --- * [HBASE-21430](https://issues.apache.org/jira/browse/HBASE-21430) | *Major* | **[hbase-connectors] Move hbase-spark\* modules to hbase-connectors repo** hbase-spark\* modules have been cloned to https://github.com/apache/hbase-connectors All spark connector dev is to happen in that repo from here on out. Let me file a subtask to remove hbase-spark\* modules from hbase core. --- * [HBASE-21417](https://issues.apache.org/jira/browse/HBASE-21417) | *Critical* | **Pre commit build is broken due to surefire plugin crashes** Add -Djdk.net.URLClassPath.disableClassPathURLCheck=true when executing surefire plugin. --- * [HBASE-21191](https://issues.apache.org/jira/browse/HBASE-21191) | *Major* | **Add a holding-pattern if no assign for meta or namespace (Can happen if masterprocwals have been cleared).** Puts master startup into holding pattern if meta is not assigned (previous it would exit). To make progress again, operator needs to inject an assign (Caveats and instruction can be found in HBASE-21035). --- * [HBASE-21322](https://issues.apache.org/jira/browse/HBASE-21322) | *Critical* | **Add a scheduleServerCrashProcedure() API to HbckService** Adds scheduleServerCrashProcedure to the HbckService. --- * [HBASE-21325](https://issues.apache.org/jira/browse/HBASE-21325) | *Major* | **Force to terminate regionserver when abort hang in somewhere** Add two new config hbase.regionserver.abort.timeout and hbase.regionserver.abort.timeout.task. 
If a regionserver abort times out, it will schedule an abort timeout task to run. The default abort task is SystemExitWhenAbortTimeout, which will forcibly terminate the region server when the abort times out. You can also configure a custom abort timeout task with hbase.regionserver.abort.timeout.task. --- * [HBASE-21215](https://issues.apache.org/jira/browse/HBASE-21215) | *Major* | **Figure how to invoke hbck2; make it easy to find** Adds to bin/hbase a means of invoking hbck2. Pass the new '-j' option on the 'hbck' command with a value of the full path to the HBCK2.jar. E.g: $ ./bin/hbase hbck -j ~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar setTableState x ENABLED --- * [HBASE-21372](https://issues.apache.org/jira/browse/HBASE-21372) | *Major* | **Set hbase.assignment.maximum.attempts to Long.MAX** Retry assigns 'forever' (or until an intervention such as a ServerCrashProcedure). Previously, retries were capped at ten, but handling on failure was indeterminate. --- * [HBASE-21338](https://issues.apache.org/jira/browse/HBASE-21338) | *Major* | **[balancer] If balancer is an ill-fit for cluster size, it gives little indication** The description claims the balancer is not dynamically configurable, but this is an error; it is (see http://hbase.apache.org/book.html#dyn\_config). Also, if the balancer is seen to be cutting out too soon, try setting "hbase.master.balancer.stochastic.runMaxSteps" to true. Adds cleaner logging around balancer start. --- * [HBASE-21073](https://issues.apache.org/jira/browse/HBASE-21073) | *Major* | **"Maintenance mode" master** Instead of being an ephemeral state set by hbck, maintenance mode is now an explicit toggle set by either a configuration property or an environment variable. In maintenance mode, master will host system tables and not assign any user-space tables to RSs. This gives operators the ability to effect repairs to the meta table with fewer moving parts.
--- * [HBASE-21335](https://issues.apache.org/jira/browse/HBASE-21335) | *Critical* | **Change the default wait time of HBCK2 tool** Changed the waitTime parameter to lockWait on bypass. Changed the default waitTime from 0 -- i.e. wait forever -- to 1ms so that if the lock is held, we go past it and, if override is set, enforce the bypass. --- * [HBASE-21291](https://issues.apache.org/jira/browse/HBASE-21291) | *Major* | **Add a test for bypassing stuck state-machine procedures** bypass will now throw an Exception if passed a lockWait \<= 0; i.e. bypass will prevent an operator from getting stuck on an entity lock waiting forever (lockWait == 0) --- * [HBASE-21320](https://issues.apache.org/jira/browse/HBASE-21320) | *Major* | **[canary] Cleanup of usage and add commentary** Cleans up usage and docs around Canary. Does not change command-line args (though we should -- smile). --- * [HBASE-21278](https://issues.apache.org/jira/browse/HBASE-21278) | *Critical* | **Do not rollback successful sub procedures when rolling back a procedure** Sub procedures which have finished successfully are no longer rolled back. This is a change in rollback behavior. State changes made by sub procedures should be handled by parent procedures when rolling back. For example, when rolling back a MergeTableProcedure, we will schedule new procedures to bring the offline regions online instead of rolling back the original procedures which off-lined the regions (in fact these procedures can not be rolled back...). --- * [HBASE-21158](https://issues.apache.org/jira/browse/HBASE-21158) | *Critical* | **Empty qualifier cell should not be returned if it does not match QualifierFilter** Scans that make use of `QualifierFilter` previously would erroneously return columns with an empty qualifier along with those that matched. After this change, only the columns that match are returned.
--- * [HBASE-21098](https://issues.apache.org/jira/browse/HBASE-21098) | *Major* | **Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3** It is recommended to place the working directory on-cluster on HDFS as doing so has shown a strong performance increase due to data locality. It is important to note that the working directory should not overlap with any existing directories as the working directory will be cleaned out during the snapshot process. Beyond that, any well-named directory on HDFS should be sufficient. --- * [HBASE-21185](https://issues.apache.org/jira/browse/HBASE-21185) | *Minor* | **WALPrettyPrinter: Additional useful info to be printed by wal printer tool, for debugability purposes** This adds two extra features to WALPrettyPrinter tool: 1) Output for each cell combined size of cell descriptors, plus the cell value itself, in a given WAL edit. This is printed on the results as "cell total size sum:" info by default; 2) An optional -g/--goto argument, that allows to seek straight to that specific WAL file position, then sequentially reading the WAL from that point towards its end; --- * [HBASE-21287](https://issues.apache.org/jira/browse/HBASE-21287) | *Major* | **JVMClusterUtil Master initialization wait time not configurable** Local HBase cluster (as used by unit tests) wait times on startup and initialization can be configured via \`hbase.master.start.timeout.localHBaseCluster\` and \`hbase.master.init.timeout.localHBaseCluster\` --- * [HBASE-21280](https://issues.apache.org/jira/browse/HBASE-21280) | *Trivial* | **Add anchors for each heading in UI** Adds anchors #tables, #tasks, etc. 
--- * [HBASE-21232](https://issues.apache.org/jira/browse/HBASE-21232) | *Major* | **Show table state in Tables view on Master home page** Add table state column to the tables panel --- * [HBASE-21223](https://issues.apache.org/jira/browse/HBASE-21223) | *Critical* | **[amv2] Remove abort\_procedure from shell** Removed the abort\_procedure command from shell -- dangerous -- and deprecated abortProcedure in Admin API. --- * [HBASE-20636](https://issues.apache.org/jira/browse/HBASE-20636) | *Major* | **Introduce two bloom filter type : ROWPREFIX\_FIXED\_LENGTH and ROWPREFIX\_DELIMITED** Add two bloom filter type : ROWPREFIX\_FIXED\_LENGTH and ROWPREFIX\_DELIMITED 1. ROWPREFIX\_FIXED\_LENGTH: specify the length of the prefix 2. ROWPREFIX\_DELIMITED: specify the delimiter of the prefix Need to specify parameters for these two types of bloomfilter, otherwise the table will fail to create Example: create 't1', {NAME =\> 'f1', BLOOMFILTER =\> 'ROWPREFIX\_FIXED\_LENGTH', CONFIGURATION =\> {'RowPrefixBloomFilter.prefix\_length' =\> '10'}} create 't1', {NAME =\> 'f1', BLOOMFILTER =\> 'ROWPREFIX\_DELIMITED', CONFIGURATION =\> {'RowPrefixDelimitedBloomFilter.delimiter' =\> '#'}} --- * [HBASE-21156](https://issues.apache.org/jira/browse/HBASE-21156) | *Critical* | **[hbck2] Queue an assign of hbase:meta and bulk assign/unassign** Adds 'raw' assigns/unassigns to the Hbck Service. Takes a list of encoded region names and bulk assigns/unassigns. Skirts Master 'state' check and does not invoke Coprocessors. For repair only. 
Here is what HBCK2 usage looks like now: {code} $ java -cp hbase-hbck2-1.0.0-SNAPSHOT.jar org.apache.hbase.HBCK2 usage: HBCK2 \ COMMAND [\] Options: -d,--debug run with debug output -h,--help output this help message --hbase.zookeeper.peerport peerport of target hbase ensemble --hbase.zookeeper.quorum ensemble of target hbase --zookeeper.znode.parent parent znode of target hbase Commands: setTableState \ \ Possible table states: ENABLED, DISABLED, DISABLING, ENABLING To read current table state, in the hbase shell run: hbase\> get 'hbase:meta', '\', 'table:state' A value of \\x08\\x00 == ENABLED, \\x08\\x01 == DISABLED, etc. An example making table name 'user' ENABLED: $ HBCK2 setTableState users ENABLED Returns whatever the previous table state was. assign \ ... A 'raw' assign that can be used even during Master initialization. Skirts Coprocessors. Pass one or more encoded RegionNames: e.g. 1588230740 is hard-coded encoding for hbase:meta region and de00010733901a05f5a2a3a382e27dd4 is an example of what a random user-space encoded Region name looks like. For example: $ HBCK2 assign 1588230740 de00010733901a05f5a2a3a382e27dd4 Returns the pid of the created AssignProcedure or -1 if none. unassign \ ... A 'raw' unassign that can be used even during Master initialization. Skirts Coprocessors. Pass one or more encoded RegionNames: Skirts Coprocessors. Pass one or more encoded RegionNames: de00010733901a05f5a2a3a382e27dd4 is an example of what a random user-space encoded Region name looks like. For example: $ HBCK2 unassign 1588230740 de00010733901a05f5a2a3a382e27dd4 Returns the pid of the created UnassignProcedure or -1 if none. {code} --- * [HBASE-21021](https://issues.apache.org/jira/browse/HBASE-21021) | *Major* | **Result returned by Append operation should be ordered** This change ensures Append operations are assembled into the expected order. 
--- * [HBASE-21171](https://issues.apache.org/jira/browse/HBASE-21171) | *Major* | **[amv2] Tool to parse a directory of MasterProcWALs standalone** Make it so can run the WAL parse and load system in isolation. Here is an example: {code}$ HBASE\_OPTS=" -XX:+UnlockDiagnosticVMOptions -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:+DebugNonSafepoints" ./bin/hbase org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore ~/big\_set\_of\_masterprocwals/ {code} --- * [HBASE-21107](https://issues.apache.org/jira/browse/HBASE-21107) | *Minor* | **add a metrics for netty direct memory** Add a new nettyDirectMemoryUsage under server's ipc metrics to show direct memory usage for netty rpc server. --- * [HBASE-21153](https://issues.apache.org/jira/browse/HBASE-21153) | *Major* | **Shaded client jars should always build in relevant phase to avoid confusion** Client facing artifacts are now built whenever Maven is run through the "package" goal. Previously, the client facing artifacts would create placeholder jars that skipped repackaging HBase and third-party dependencies unless the "release" profile was active. Build times may be noticeably longer depending on your build hardware. For example, the Jenkins worker nodes maintained by ASF Infra take ~14% longer to do a full packaging build. An example portability-focused personal laptop took ~25% longer. --- * [HBASE-20942](https://issues.apache.org/jira/browse/HBASE-20942) | *Major* | **Improve RpcServer TRACE logging** Allows configuration of the length of RPC messages printed to the log at TRACE level via "hbase.ipc.trace.param.size" in RpcServer. --- * [HBASE-20649](https://issues.apache.org/jira/browse/HBASE-20649) | *Minor* | **Validate HFiles do not have PREFIX\_TREE DataBlockEncoding** Users who have previously made use of prefix tree encoding can now check that their existing HFiles no longer contain data that uses it with an additional preupgrade check command. 
``` hbase pre-upgrade validate-hfile ``` Please see the "HFile Content validation" section of the ref guide's coverage of the pre-upgrade validator tool for usage details. --- * [HBASE-20941](https://issues.apache.org/jira/browse/HBASE-20941) | *Major* | **Create and implement HbckService in master** Adds an HBCK Service and a first method to force-change-in-table-state for use by an HBCK client effecting 'repair' to a malfunctioning HBase. --- * [HBASE-21071](https://issues.apache.org/jira/browse/HBASE-21071) | *Major* | **HBaseTestingUtility::startMiniCluster() to use builder pattern** Cleanup all the cluster start override combos in HBaseTestingUtility by adding a StartMiniClusterOption and Builder. --- * [HBASE-21072](https://issues.apache.org/jira/browse/HBASE-21072) | *Major* | **Block out HBCK1 in hbase2** Fence out hbase-1.x hbck1 instances. Stop them making state changes on an hbase-2.x cluster; they could do damage. We do this by writing the hbck1 lock file into place on hbase-2.x Master start-up. To disable this new behavior, set hbase.write.hbck1.lock.file to false --- * [HBASE-20881](https://issues.apache.org/jira/browse/HBASE-20881) | *Major* | **Introduce a region transition procedure to handle all the state transition for a region** Introduced a new TransitRegionStateProcedure to replace the old AssignProcedure/UnassignProcedure/MoveRegionProcedure. In the old code, MRP will not be attached to RegionStateNode, so it can not be interrupted by ServerCrashProcedure, which introduces lots of tricky code to deal with races, and also causes lots of other difficulties on how to prevent scheduling redundant or even conflict procedures for a region. And now TRSP is the only one procedure which can bring region online or offline. When you want to schedule one, you need to check whether there is already one attached to the RegionStateNode, under the lock of the RegionStateNode. 
If there is none, just go ahead; if there is one, you must handle it, for example by giving up and failing directly, or by telling the TRSP to give up (this is what SCP does). Since the check and the attach both happen under the lock of the RSN, this greatly reduces the possible races and makes the code much simpler. --- * [HBASE-21012](https://issues.apache.org/jira/browse/HBASE-21012) | *Critical* | **Revert the change of serializing TimeRangeTracker** HFiles generated by 2.0.0, 2.0.1, 2.1.0 are not forward compatible to 1.4.6-, 1.3.2.1-, 1.2.6.1-, and other inactive releases. The HFiles lose compatibility because the new versions (2.0.0, 2.0.1, 2.1.0) use protobuf to serialize/deserialize TimeRangeTracker (TRT) while old versions use DataInput/DataOutput. To solve this, we have to put HBASE-21012 in 2.x and HBASE-21013 in 1.x. For more information, please check HBASE-21008. --- * [HBASE-20965](https://issues.apache.org/jira/browse/HBASE-20965) | *Major* | **Separate region server report requests to new handlers** After HBASE-20965, we can use MasterFifoRpcScheduler in master to separate RegionServerReport requests to independent handlers. To use this feature, please set "hbase.master.rpc.scheduler.factory.class" to "org.apache.hadoop.hbase.ipc.MasterFifoRpcScheduler". Use "hbase.master.server.report.handler.count" to set the RegionServerReport handler count; the default value is half of the "hbase.regionserver.handler.count" value, but at least 1. The count of the other handlers in master is the "hbase.regionserver.handler.count" value minus the RegionServerReport handler count, also at least 1. --- * [HBASE-20813](https://issues.apache.org/jira/browse/HBASE-20813) | *Minor* | **Remove RPC quotas when the associated table/Namespace is dropped off** In previous releases, when a Space Quota was configured on a table or namespace and that table or namespace was deleted, the Space Quota was also deleted. This change improves the implementation so that the same is also done for RPC Quotas.
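The default handler split described under HBASE-20965 above can be illustrated with a little arithmetic. The following is only a sketch, not HBase code: the class and method names are hypothetical, and it assumes that "half" means integer division.

```java
// Hypothetical model of the default handler split described in HBASE-20965.
// Not HBase source; "half" is assumed to be integer division.
class MasterHandlerSplit {

    /** RegionServerReport handlers: half of the total, but never fewer than 1. */
    static int reportHandlers(int handlerCount) {
        return Math.max(1, handlerCount / 2);
    }

    /** Remaining master handlers: total minus report handlers, but never fewer than 1. */
    static int otherHandlers(int handlerCount, int reportHandlers) {
        return Math.max(1, handlerCount - reportHandlers);
    }

    public static void main(String[] args) {
        int total = 30; // example value for hbase.regionserver.handler.count
        int report = reportHandlers(total);
        System.out.println("report=" + report + " other=" + otherHandlers(total, report));
    }
}
```

With a total of 1, both floors apply and each group still gets one handler.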
--- * [HBASE-20986](https://issues.apache.org/jira/browse/HBASE-20986) | *Major* | **Separate the config of block size when we do log splitting and write Hlog** After HBASE-20986, we can set different values for the block size of the WAL and of recovered edits. Both default to 2 \* the default HDFS blocksize. hbase.regionserver.recoverededits.blocksize sets the block size of recovered edits, while hbase.regionserver.hlog.blocksize sets the block size of the WAL. --- * [HBASE-20856](https://issues.apache.org/jira/browse/HBASE-20856) | *Minor* | **PITA having to set WAL provider in two places** With this change if a WAL's meta provider (hbase.wal.meta\_provider) is not explicitly set, it now defaults to whatever hbase.wal.provider is set to. Previously, the two settings operated independently, each with its own default. This change is operationally incompatible with previous HBase versions because the default WAL meta provider no longer defaults to AsyncFSWALProvider but to hbase.wal.provider. The thought is that this is more in line with an operator's expectation, that a change in hbase.wal.provider is sufficient to change how WALs are written, especially given hbase.wal.meta\_provider is an obscure configuration and that the very idea that meta regions would have their own wal provider would likely come as a surprise. --- * [HBASE-20538](https://issues.apache.org/jira/browse/HBASE-20538) | *Critical* | **Upgrade our hadoop versions to 2.7.7 and 3.0.3** Update hadoop-two.version to 2.7.7 and hadoop-three.version to 3.0.3 due to a JDK issue which is solved by HADOOP-15473. --- * [HBASE-20846](https://issues.apache.org/jira/browse/HBASE-20846) | *Major* | **Restore procedure locks when master restarts** 1. Made the hasLock method final, and added a locked field in Procedure to record whether we have the lock. We will set it to true in doAcquireLock and to false in doReleaseLock. The sub procedures do not need to manage it any more. 2. Also added a locked field in the proto message.
When storing, the field will be set according to the return value of hasLock. When loading, there is a new field in Procedure called lockedWhenLoading. We will set it to true if the locked field in the proto message is true. 3. The reason why we can not set the locked field directly to true by calling doAcquireLock is that, during initialization, most procedures need to wait until master is initialized. So the solution here is that we introduced a new method called waitInitialized in Procedure, and moved the code that waits for master initialization from acquireLock to this method. We also added a restoreLock method to Procedure: if lockedWhenLoading is true, we will call acquireLock to get the lock, but do not set locked to true. Later, when we call doAcquireLock and pass the waitInitialized check, we will test lockedWhenLoading; if it is true, then we just set the locked field to true and return, without actually calling the acquireLock method since we have already called it once. --- * [HBASE-20672](https://issues.apache.org/jira/browse/HBASE-20672) | *Minor* | **New metrics ReadRequestRate and WriteRequestRate** Exposes 2 new metrics in HBase to provide ReadRequestRate and WriteRequestRate at region server level. These metrics give the rate of requests handled by the region server and are reset after every monitoring interval. --- * [HBASE-6028](https://issues.apache.org/jira/browse/HBASE-6028) | *Minor* | **Implement a cancel for in-progress compactions** Added a new command to the shell to switch on/off compactions called "compaction\_switch". Disabling compactions will interrupt any currently ongoing compactions. This setting will be lost on restart of the server. Added the configuration hbase.regionserver.compaction.enabled so users can enable/disable compactions via hbase-site.xml.
--- * [HBASE-20884](https://issues.apache.org/jira/browse/HBASE-20884) | *Major* | **Replace usage of our Base64 implementation with java.util.Base64** Class org.apache.hadoop.hbase.util.Base64 has been removed in its entirety from HBase 2+. In HBase 1, unused methods have been removed from the class and the audience was changed from Public to Private. This class was originally intended as an internal utility class that could be used externally, but thinking has since changed; these classes should not have been advertised as public to end-users. This represents an incompatible change for users who relied on this implementation. An alternative implementation for affected clients is available at java.util.Base64 when using Java 8 or newer; be aware, it may encode/decode differently. For clients seeking to restore this specific implementation, it is available in the public domain for download at http://iharder.sourceforge.net/current/java/base64/ --- * [HBASE-20357](https://issues.apache.org/jira/browse/HBASE-20357) | *Major* | **AccessControlClient API Enhancement** This enhances the AccessControlClient APIs to retrieve the permissions based on namespace, table name, family and qualifier for a specific user. AccessControlClient can also validate whether a user is allowed to perform specified operations on a particular table. The following APIs have been added: 1) getUserPermissions(Connection connection, String tableRegex, byte[] columnFamily, byte[] columnQualifier, String userName) The scope of retrieving permissions will be the same as existing. 2) hasPermission(Connection connection, String tableName, byte[] columnFamily, byte[] columnQualifier, String userName, Permission.Action... actions) Scope of validating user privilege: a user can perform a self check without any special privilege, but ADMIN privilege will be required to perform the check for other users. For example, suppose there are two users "userA" & "userB"; then there can be the scenarios below, a.
When userA wants to check whether userA has the privilege to perform the mentioned actions, userA doesn't need ADMIN privilege, as it's a self query. b. When userA wants to check whether userB has the privilege to perform the mentioned actions, userA must have ADMIN or superuser privilege, as it's querying for another user. # HBASE 2.1.0 Release Notes These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements. --- * [HBASE-20691](https://issues.apache.org/jira/browse/HBASE-20691) | *Blocker* | **Storage policy should allow deferring to HDFS** After HBASE-20691 we have changed the default setting of hbase.wal.storage.policy from "HOT" back to "NONE" which means we defer the policy to HDFS. This fixes the problem of release 2.0.0 that the storage policy of the WAL directory will defer to HDFS and may not be "HOT" even if you explicitly set hbase.wal.storage.policy to "HOT". --- * [HBASE-20839](https://issues.apache.org/jira/browse/HBASE-20839) | *Blocker* | **Fallback to FSHLog if we can not instantiated AsyncFSWAL when user does not specify AsyncFSWAL explicitly** As we hack into the internals of DFSClient when implementing AsyncFSWAL to get better performance, a patch release of hadoop can break it. So now, if the user does not specify a wal provider, we will first try to use 'asyncfs', i.e., the AsyncFSWALProvider. If we fail due to some compatibility issues, we will fall back to 'filesystem', i.e., FSHLog. --- * [HBASE-20193](https://issues.apache.org/jira/browse/HBASE-20193) | *Critical* | **Basic Replication Web UI - Regionserver** After HBASE-20193, we added a section to the web UI to show the replication status of each wal group. There are 2 parts to this section; both show the peerId, wal group and current replicating log of each replication source. One shows the information of the replication log queue, i.e. size of current log, log queue size and replicating offset.
The other shows the delay of replication, i.e. last shipped age and replication delay. If the offset shows -1 and replication delay is UNKNOWN, that means replication has not started. This may be because the peer is disabled or the replicationEndpoint is sleeping for some reason. --- * [HBASE-19997](https://issues.apache.org/jira/browse/HBASE-19997) | *Blocker* | **[rolling upgrade] 1.x =\> 2.x** Now we have a 'basically works' solution for rolling upgrade from 1.4.x to 2.x. Please see the "Rolling Upgrade from 1.x to 2.x" section in the ref guide for more details. --- * [HBASE-20270](https://issues.apache.org/jira/browse/HBASE-20270) | *Major* | **Turn off command help that follows all errors in shell** The command help that previously followed all errors is no longer shown. Erroneous command inputs now just show the error text followed by the shell command to try for seeing the help message. It looks like: For usage try 'help “create”’. Operators can copy-paste the command to get the help message. --- * [HBASE-20194](https://issues.apache.org/jira/browse/HBASE-20194) | *Critical* | **Basic Replication WebUI - Master** After HBASE-20194, we added 2 parts to master's web page. One is Peers, which shows all replication peers and some of their configurations, like peer id, cluster key, state, bandwidth, and which namespace or table it will replicate. The other is the replication status of all regionservers: we added a tab to the region servers division, where we can check the replication delay of all region servers for any peer. This table shows AgeOfLastShippedOp, SizeOfLogQueue and ReplicationLag for each regionserver, and the table is sorted by ReplicationLag in descending order. This way we can easily find the problematic region server. If the replication delay is UNKNOWN, that means this walGroup has not started replicating yet, and the peer may be disabled. ReplicationLag will update once this peer starts replicating.
--- * [HBASE-18569](https://issues.apache.org/jira/browse/HBASE-18569) | *Major* | **Add prefetch support for async region locator** Add prefetch support for async region locator. The default value is 10. Set 'hbase.client.locate.prefetch.limit' in hbase-site.xml if you want to use another value for it. --- * [HBASE-20642](https://issues.apache.org/jira/browse/HBASE-20642) | *Major* | **IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException** This changes client-side nonce generation to use the same nonce for re-submissions of client RPC DDL operations. --- * [HBASE-20708](https://issues.apache.org/jira/browse/HBASE-20708) | *Blocker* | **Remove the usage of RecoverMetaProcedure in master startup** Introduced an InitMetaProcedure to initialize the meta table for a new HBase deploy. Marked RecoverMetaProcedure deprecated and removed the usage of it in the current code base. We still need to keep it in place for compatibility. The code in RecoverMetaProcedure has been moved to ServerCrashProcedure, and SCP will always be enabled and we will rely on it to bring the meta region online. For more on the issue addressed by this commit, see the design doc for overview and plan: https://docs.google.com/document/d/1\_872oHzrhJq4ck7f6zmp1J--zMhsIFvXSZyX1Mxg5MA/edit#heading=h.xy1z4alsq7uy --- * [HBASE-20334](https://issues.apache.org/jira/browse/HBASE-20334) | *Major* | **add a test that expressly uses both our shaded client and the one from hadoop 3** HBase now includes a helper script in `dev-support` that can be used to run a basic functionality test for a given HBase installation. The test can optionally be given an HBase client artifact to rely on and can optionally be given specific Hadoop client artifacts to use. For usage information see `./dev-support/hbase_nightly_pseudo-distributed-test.sh --help`. The project nightly tests now make use of this test to check running on top of Hadoop 2, Hadoop 3, and Hadoop 3 with shaded client artifacts.
--- * [HBASE-19735](https://issues.apache.org/jira/browse/HBASE-19735) | *Major* | **Create a minimal "client" tarball installation** The HBase convenience binary artifacts now include a client focused tarball that a) includes more docs and b) does not include scripts or jars only needed for running HBase cluster services. The new artifact is made as a normal part of the `assembly:single` maven command. --- * [HBASE-20615](https://issues.apache.org/jira/browse/HBASE-20615) | *Major* | **emphasize use of shaded client jars when they're present in an install** HBase's built in scripts now rely on the downstream facing shaded artifacts where possible. Of particular interest to downstream users, the `hbase classpath` and `hbase mapredcp` commands now return the relevant shaded client artifact and only those third party jars needed to make use of them (e.g. slf4j-api, commons-logging, htrace, etc). Downstream users should note that by default the `hbase classpath` command will treat having `hadoop` on the shell's PATH as an implicit request to include the output of the `hadoop classpath` command in the returned classpath. This long-existing behavior can be opted out of by setting the environment variable `HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP` to the value "true". For example: `HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP="true" bin/hbase classpath`. --- * [HBASE-20333](https://issues.apache.org/jira/browse/HBASE-20333) | *Critical* | **break up shaded client into one with no Hadoop and one that's standalone** Downstream users who need to use both HBase and Hadoop APIs should switch to relying on the new `hbase-shaded-client-byo-hadoop` artifact rather than the existing `hbase-shaded-client` artifact. The new artifact no longer includes any Hadoop classes. It should work in combination with either the output of `hadoop classpath` or the Hadoop provided client-facing shaded artifacts in Hadoop 3+.
--- * [HBASE-20332](https://issues.apache.org/jira/browse/HBASE-20332) | *Critical* | **shaded mapreduce module shouldn't include hadoop** The `hbase-shaded-mapreduce` artifact no longer includes its own copy of Hadoop classes. Users who make use of the artifact via YARN should be able to get these classes from YARN's classpath without having to make any changes. --- * [HBASE-20681](https://issues.apache.org/jira/browse/HBASE-20681) | *Major* | **IntegrationTestDriver fails after HADOOP-15406 due to missing hamcrest-core** Users of our integration tests on Hadoop 3 can now add all needed dependencies by pointing at jars included in our binary convenience artifact. Prior to this fix, downstream users on Hadoop 3 would need to get a copy of the Hamcrest v1.3 jar from elsewhere. --- * [HBASE-19852](https://issues.apache.org/jira/browse/HBASE-19852) | *Major* | **HBase Thrift 1 server SPNEGO Improvements** Adds two new properties for hbase-site.xml for THRIFT SPNEGO when in HTTP mode: \* hbase.thrift.spnego.keytab.file \* hbase.thrift.spnego.principal --- * [HBASE-20590](https://issues.apache.org/jira/browse/HBASE-20590) | *Critical* | **REST Java client is not able to negotiate with the server in the secure mode** Adds negotiation logic between a secure Java REST client and server. After this jira the Java REST client will start responding to the Negotiate challenge sent by the server. Adds RESTDemoClient, which can be used to verify whether the secure Java REST client works against a secure REST server. --- * [HBASE-20634](https://issues.apache.org/jira/browse/HBASE-20634) | *Critical* | **Reopen region while server crash can cause the procedure to be stuck** A second attempt at fixing HBASE-20173. Fixes unfinished keeping of server state inside AM (ONLINE=\>SPLITTING=\>OFFLINE=\>null). Concurrent unassigns look at server state to figure if they should wait on SCP to wake them up or not.
--- * [HBASE-20579](https://issues.apache.org/jira/browse/HBASE-20579) | *Minor* | **Improve snapshot manifest copy in ExportSnapshot** This patch adds an FSUtil.copyFilesParallel() to help copy files in parallel, and it will return all the paths of directories and files traversed. Thus when we copy manifest in ExportSnapshot, we can copy reference files concurrently and use the paths it returns to help setOwner and setPermission. The size of thread pool is determined by the configuration snapshot.export.copy.references.threads, and its default value is the number of runtime available processors. --- * [HBASE-18116](https://issues.apache.org/jira/browse/HBASE-18116) | *Major* | **Replication source in-memory accounting should not include bulk transfer hfiles** Before this change we would incorrectly include the size of enqueued store files for bulk replication in the calculation for determining whether or not to rate limit the transfer of WAL edits. Because bulk replication uses a separate and asynchronous mechanism for file transfer this could incorrectly limit the batch sizes for WAL replication if bulk replication in progress, with negative impact on latency and throughput. --- * [HBASE-20592](https://issues.apache.org/jira/browse/HBASE-20592) | *Minor* | **Create a tool to verify tables do not have prefix tree encoding** PreUpgradeValidator tool with DataBlockEncoding validator was added to verify cluster is upgradable to HBase 2. --- * [HBASE-20501](https://issues.apache.org/jira/browse/HBASE-20501) | *Blocker* | **Change the Hadoop minimum version to 2.7.1** HBase is no longer able to maintain compatibility with Apache Hadoop versions that are no longer receiving updates. This release raises the minimum supported version to Hadoop 2.7.1. Downstream users are strongly advised to upgrade to the latest Hadoop 2.7 maintenance release. Downstream users of earlier HBase versions are similarly advised to upgrade to Hadoop 2.7.1+. 
When doing so, it is especially important to follow the guidance from [the HBase Reference Guide's Hadoop section](http://hbase.apache.org/book.html#hadoop) on replacing the Hadoop artifacts bundled with HBase. --- * [HBASE-20601](https://issues.apache.org/jira/browse/HBASE-20601) | *Minor* | **Add multiPut support and other miscellaneous to PE** 1. Add multiPut support: set --multiPut=number to enable batch puts (--autoflush must also be set to false). 2. Add Connection Count support: added a new parameter connCount to PE. Setting --connCount=2 means all threads will share 2 connections. The oneCon option and the connCount option shouldn't be set at the same time. 3. Add avg RT and avg TPS/QPS statistics for all threads. 4. Delete some redundant code: RandomWriteTest now inherits from SequentialWrite. --- * [HBASE-20544](https://issues.apache.org/jira/browse/HBASE-20544) | *Blocker* | **downstream HBaseTestingUtility fails with invalid port** HBase now relies on an internal mechanism to determine when it is running a local hbase cluster meant for external interaction vs an encapsulated test. When created via the `HBaseTestingUtility`, ports for Master and RegionServer services and UIs will be set to random ports to allow for multiple parallel uses on a single machine. Normally when running a Standalone HBase Deployment (as described in the HBase Reference Guide) the ports will be picked according to the same defaults used in a full cluster set up. If you wish to instead use the random port assignment set `hbase.localcluster.assign.random.ports` to true. --- * [HBASE-20004](https://issues.apache.org/jira/browse/HBASE-20004) | *Minor* | **Client is not able to execute REST queries in a secure cluster** Added the 'hbase.rest.http.allow.options.method' configuration property to allow the user to decide whether the REST Server HTTP should allow the OPTIONS method or not. By default it is enabled in HBase 2.1.0+ versions and in other versions it is disabled.
Similarly, 'hbase.thrift.http.allow.options.method' is added in HBase 1.5, 2.1.0 and 3.0.0 versions. It is disabled by default. --- * [HBASE-20327](https://issues.apache.org/jira/browse/HBASE-20327) | *Minor* | **When qualifier is not specified, append and incr operation do not work (shell)** This change enables users to perform append and increment operations with a null qualifier via hbase-shell. --- * [HBASE-18842](https://issues.apache.org/jira/browse/HBASE-18842) | *Minor* | **The hbase shell clone\_snaphost command returns bad error message** When attempting to clone a snapshot but using a namespace that does not exist, the HBase shell will now correctly report the exception as caused by the passed namespace. Previously, the shell would report that the problem was an unknown namespace but it would claim the user provided table name was not found as a namespace. Both before and after this change the shell properly used the passed namespace to attempt to handle the request. --- * [HBASE-20406](https://issues.apache.org/jira/browse/HBASE-20406) | *Major* | **HBase Thrift HTTP - Shouldn't handle TRACE/OPTIONS methods** When configured to do thrift-over-http, the HBase Thrift API Server no longer accepts the HTTP methods TRACE or OPTIONS. --- * [HBASE-20046](https://issues.apache.org/jira/browse/HBASE-20046) | *Major* | **Reconsider the implementation for serial replication** Now in replication we can make sure the order of pushing logs is the same as the order of requests from the client. Set the serial flag to true for a replication peer to enable this feature.
--- * [HBASE-20159](https://issues.apache.org/jira/browse/HBASE-20159) | *Major* | **Support using separate ZK quorums for client** After HBASE-20159 we allow client to use different ZK quorums by introducing three new properties: hbase.client.zookeeper.quorum and hbase.client.zookeeper.property.clientPort to specify client zookeeper properties (note that the combination of these two properties should be different from the server ZK quorums), and hbase.client.zookeeper.observer.mode to indicate whether the client ZK nodes are in observer mode (false by default) HConstants.DEFAULT\_ZOOKEPER\_CLIENT\_PORT has been removed in HBase 3.0 and replaced by the correctly spelled DEFAULT\_ZOOKEEPER\_CLIENT\_PORT. --- * [HBASE-20242](https://issues.apache.org/jira/browse/HBASE-20242) | *Major* | **The open sequence number will grow if we fail to open a region after writing the max sequence id file** Now when opening a region, we will store the current max sequence id of the region to its max sequence id file instead of the 'next sequence id'. This could avoid the sequence id bumping when we fail to open a region, and also align to the behavior when we close a region. --- * [HBASE-19024](https://issues.apache.org/jira/browse/HBASE-19024) | *Critical* | **Configurable default durability for synchronous WAL** The default durability setting for the synchronous WAL is Durability.SYNC\_WAL, which triggers HDFS hflush() to flush edits to the datanodes. We also support Durability.FSYNC\_WAL, which instead triggers HDFS hsync() to flush \_and\_ fsync edits. This change introduces the new configuration setting "hbase.wal.hsync", defaulting to FALSE, that if set to TRUE changes the default durability setting for the synchronous WAL to FSYNC\_WAL. 
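The effect of "hbase.wal.hsync" on the default durability (HBASE-19024) can be modeled in a few lines. This is a minimal sketch and not HBase source: `java.util.Properties` stands in for the HBase configuration object, and the enum and class names are local stand-ins that only mirror the two durability levels named in the note.

```java
import java.util.Properties;

// Hypothetical model of how "hbase.wal.hsync" selects the default durability.
class WalDurabilityDefault {

    /** Local stand-in for the two durability levels named in the note. */
    enum Durability { SYNC_WAL, FSYNC_WAL }

    /** Returns FSYNC_WAL when "hbase.wal.hsync" is true, SYNC_WAL otherwise (unset means false). */
    static Durability defaultDurability(Properties conf) {
        boolean hsync = Boolean.parseBoolean(conf.getProperty("hbase.wal.hsync", "false"));
        return hsync ? Durability.FSYNC_WAL : Durability.SYNC_WAL;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(defaultDurability(conf)); // property unset: hflush-based default
        conf.setProperty("hbase.wal.hsync", "true");
        System.out.println(defaultDurability(conf)); // property set: hsync-based default
    }
}
```

The sketch only covers the choice of default; a durability set explicitly on a table or mutation is outside what it models.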
--- * [HBASE-19389](https://issues.apache.org/jira/browse/HBASE-19389) | *Critical* | **Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted** After HBASE-19389 we introduced a RegionServer self-protection mechanism to prevent write handler getting exhausted by high concurrency put with dense columns, mainly through two new properties: hbase.region.store.parallel.put.limit.min.column.count to decide what kind of put (with how many columns within a single column family) to limit (100 by default) and hbase.region.store.parallel.put.limit to limit the concurrency (10 by default). There's another property for advanced user and please check source and javadoc of StoreHotnessProtector for more details. --- * [HBASE-20148](https://issues.apache.org/jira/browse/HBASE-20148) | *Major* | **Make serial replication as a option for a peer instead of a table** A new method setSerial has been added to the interface ReplicationPeerConfigBuilder which is marked as IA.Public. This interface is not supposed to be implemented by client code, but if you do, this will be an incompatible change as you need to add this method to your implementation too. --- * [HBASE-19397](https://issues.apache.org/jira/browse/HBASE-19397) | *Major* | **Design procedures for ReplicationManager to notify peer change event from master** Introduce 5 procedures to do peer modifications: AddPeerProcedure RemovePeerProcedure UpdatePeerConfigProcedure EnablePeerProcedure DisablePeerProcedure The procedures are all executed with the following stage: 1. Call pre CP hook, if an exception is thrown then give up 2. Check whether the operation is valid, if not then give up 3. Update peer storage. Notice that if we have entered this stage, then we can not rollback any more. 4. Schedule sub procedures to refresh the peer config on every RS. 5. Do post cleanup if any. 6. Call post CP hook. The exception thrown will be ignored since we have already done the work. 
The procedure will hold an exclusive lock on the peer id, so now there are no concurrent modifications on a single peer. And now it is guaranteed that once the procedure is done, the peer modification has already taken effect on all RSes. Abstracted a storage layer for replication peer/queue management, and refactored the upper layer to remove zk related naming/code/comment. Add pre/postExecuteProcedures CP hooks to RegionServerObserver, and add permission check for executeProcedures method which requires the caller to be system user or super user. On rolling upgrade: just do not do any replication peer modifications during the rolling upgrade. There are no pb/layout changes on the peer/queue storage on zk. # HBASE 2.0.0 Release Notes These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements. --- * [HBASE-20464](https://issues.apache.org/jira/browse/HBASE-20464) | *Major* | **Disable IMC** Change the default so that on creation of new tables, In-Memory Compaction BASIC is NOT enabled. This change is in branch-2.0 only, not in branch-2. --- * [HBASE-20276](https://issues.apache.org/jira/browse/HBASE-20276) | *Blocker* | **[shell] Revert shell REPL change and document** The HBase shell now behaves as it did prior to the changes that started in HBASE-15965. Namely, some shell commands return values that may be further manipulated within the shell's IRB session. The command line option `--return-values` is no longer acted on by the shell since it now always behaves as it did when passed this parameter. Passing the option results in a harmless warning about this change. Users who wish to maintain the behavior seen in the 1.4.0-1.4.2 releases of the HBase shell should refer to the section _irbrc_ in the reference guide for how to configure their IRB session to avoid echoing expression results to the console.
--- * [HBASE-18792](https://issues.apache.org/jira/browse/HBASE-18792) | *Blocker* | **hbase-2 needs to defend against hbck operations** As of HBase version 2.0, the hbck tool is significantly changed. In general, all Read-Only options are supported and can be used safely. Most -fix/ -repair options are NOT supported. Please see usage below for details on which options are not supported: Usage: fsck [opts] {only tables} where [opts] are: -help Display help options (this) -details Display full report of all regions. -timelag \ Process only regions that have not experienced any metadata updates in the last \ seconds. -sleepBeforeRerun \ Sleep this many seconds before checking if the fix worked if run with -fix -summary Print only summary of the tables and status. -metaonly Only check the state of the hbase:meta table. -sidelineDir \ HDFS path to backup existing meta. -boundaries Verify that regions boundaries are the same between META and store files. -exclusive Abort if another hbck is exclusive or fixing. Datafile Repair options: (expert features, use with caution!) -checkCorruptHFiles Check all Hfiles by opening them to make sure they are valid -sidelineCorruptHFiles Quarantine corrupted HFiles. implies -checkCorruptHFiles Replication options -fixReplication Deletes replication queues for removed peers Metadata Repair options supported as of version 2.0: (expert features, use with caution!) -fixVersionFile Try to fix missing hbase.version file in hdfs. -fixReferenceFiles Try to offline lingering reference store files -fixHFileLinks Try to offline lingering HFileLinks -noHdfsChecking Don't load/check region info from HDFS. Assumes hbase:meta region info is good. Won't check/fix any HDFS issue, e.g. hole, orphan, or overlap -ignorePreCheckPermission ignore filesystem permission pre-check NOTE: Following options are NOT supported as of HBase version 2.0+. UNSUPPORTED Metadata Repair options: (expert features, use with caution!) -fix Try to fix region assignments.
This is for backwards compatibility -fixAssignments Try to fix region assignments. Replaces the old -fix -fixMeta Try to fix meta problems. This assumes HDFS region info is good. -fixHdfsHoles Try to fix region holes in hdfs. -fixHdfsOrphans Try to fix region dirs with no .regioninfo file in hdfs -fixTableOrphans Try to fix table dirs with no .tableinfo file in hdfs (online mode only) -fixHdfsOverlaps Try to fix region overlaps in hdfs. -maxMerge \<n\> When fixing region overlaps, allow at most \<n\> regions to merge. (n=5 by default) -sidelineBigOverlaps When fixing region overlaps, allow to sideline big overlaps -maxOverlapsToSideline \<n\> When fixing region overlaps, allow at most \<n\> regions to sideline per group. (n=2 by default) -fixSplitParents Try to force offline split parents to be online. -removeParents Try to offline and sideline lingering parents and keep daughter regions. -fixEmptyMetaCells Try to fix hbase:meta entries not referencing any region (empty REGIONINFO\_QUALIFIER rows) UNSUPPORTED Metadata Repair shortcuts -repair Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps -fixReferenceFiles -fixHFileLinks -repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles --- * [HBASE-19994](https://issues.apache.org/jira/browse/HBASE-19994) | *Major* | **Create a new class for RPC throttling exception, make it retryable.** A new RpcThrottlingException deprecates ThrottlingException. The new RpcThrottlingException is a retryable Exception that clients will retry when Rpc throttling quota is exceeded. The deprecated ThrottlingException is a nonretryable Exception. --- * [HBASE-20224](https://issues.apache.org/jira/browse/HBASE-20224) | *Blocker* | **Web UI is broken in standalone mode** Standalone webui was broken inadvertently by HBASE-20027.
--- * [HBASE-18784](https://issues.apache.org/jira/browse/HBASE-18784) | *Major* | **Use of filesystem that requires hflush / hsync / append / etc should query outputstream capabilities** If HBase is run on top of Apache Hadoop libraries that support the needed APIs it will verify that underlying Filesystem implementations provide the needed durability mechanisms to safely operate. The needed APIs *should* be present in Hadoop 3 release and Hadoop 2 releases starting in the Hadoop 2.9 series. If the APIs are not available, HBase behaves as it has in previous releases (that is, it moves forward assuming such a check would pass). Where this check fails, it is unsafe to rely on HBase in a production setting. In the event of process or node failure, the HBase RegionServer process may fail to have access to all the data it previously wrote to its write ahead log, resulting in data loss. In the event of process or node failure, the HBase master process may lose all or part of the write ahead log that it relies on for cluster management operations, leaving the cluster in an inconsistent state that we aren't sure it could recover from. Notably, the LocalFileSystem implementation provided by Hadoop reports (accurately) via these new APIs that it can not provide the durability HBase needs to operate. As such, the current instructions for single-node HBase operation have been updated both with a) how to bypass this safety check and b) a strong warning about the dire consequences of doing so outside of a dev/test environment. --- * [HBASE-20219](https://issues.apache.org/jira/browse/HBASE-20219) | *Critical* | **An error occurs when scanning with reversed=true and loadColumnFamiliesOnDemand=true** Throws DoNotRetryIOException when you ask for a reverse scan loading adjacent column families on demand. 
Previously it threw IllegalStateException. --- * [HBASE-20358](https://issues.apache.org/jira/browse/HBASE-20358) | *Minor* | **Fix bin/hbase thrift usage text** Cleanup usage message and command-line processing (no functional change). --- * [HBASE-20182](https://issues.apache.org/jira/browse/HBASE-20182) | *Blocker* | **Can not locate region after split and merge** Now if we hit a split parent when locating a region, we will skip to the next row and try again until the region does not contain our row. So there will be no RegionOfflineException for a split parent any more; instead, if the split children have not been onlined yet, i.e., we finally arrive at a region which does not contain our row, an IOException will be thrown. --- * [HBASE-20149](https://issues.apache.org/jira/browse/HBASE-20149) | *Critical* | **Purge dev javadoc from bin tarball (or make a separate tarball of javadoc)** We no longer include dev or dev test javadocs in our binary bundle. We still build them; they are just not included because they were half the size of the resultant tarball. Here is our story on javadoc as of this commit: \* apidocs - user facing main api javadocs. currently for a release line, published on website and linked from menu. included in the bin tarball \* devapidocs - hbase internal javadocs. currently for a release line, published on the website but not linked from the menu. no longer included in the bin tarball. \* testapidocs - user facing test scope api javadocs. currently for a release line, not published. included in the bin tarball. \* testdevapidocs - hbase internal test scope javadocs. currently for a release line, not published. no longer included in the bin tarball --- * [HBASE-18828](https://issues.apache.org/jira/browse/HBASE-18828) | *Blocker* | **[2.0] Generate CHANGES.txt** Moves us over to yetus releasedocmaker tooling generating CHANGES. CHANGES is now markdown (CHANGES.md) as opposed to CHANGES.txt.
We've also added a new RELEASENOTES.md that lists JIRA release notes (courtesy of releasedocmaker). CHANGES/RELEASENOTES are current as of now. Will need a 'freshening' when we cut the RC. --- * [HBASE-14175](https://issues.apache.org/jira/browse/HBASE-14175) | *Critical* | **Adopt releasedocmaker for better generated release notes** We will use yetus releasedocmaker to make our changes doc from here on out. A CHANGELOG.md will replace our current CHANGES.txt. Adjacent, we'll keep up a RELEASENOTES.md doc courtesy of releasedocmaker. Over in HBASE-18828 is where we are working through steps for the RM integrating this new tooling. --- * [HBASE-16499](https://issues.apache.org/jira/browse/HBASE-16499) | *Critical* | **slow replication for small HBase clusters** Changed the default value for replication.source.ratio from 0.1 to 0.5, which means that by default 50% of the total RegionServers in the peer cluster(s) will now participate in replication. --- * [HBASE-16459](https://issues.apache.org/jira/browse/HBASE-16459) | *Trivial* | **Remove unused hbase shell --format option** The HBase `shell` command no longer recognizes the option `--format`. Previously this option only recognized the default value of 'console'. The default value is now always used. --- * [HBASE-20259](https://issues.apache.org/jira/browse/HBASE-20259) | *Critical* | **Doc configs for in-memory-compaction and add detail to in-memory-compaction logging** Disables in-memory compaction by default. Adds logging of in-memory compaction configuration on creation. Adds a chapter to the refguide on this new feature.
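To make the HBASE-16499 change above concrete, the following sketch estimates how many peer RegionServers would act as replication sinks for a given `replication.source.ratio`. The class and method names are hypothetical, and rounding the product up with `Math.ceil` is an assumption about how the ratio is applied, not something the release note states.

```java
/** Hypothetical helper, not part of HBase: estimates replication sink counts. */
final class SinkRatioSketch {

    /**
     * Number of peer RegionServers used as sinks for the given ratio.
     * Assumption: the product is rounded up, so at least one sink is
     * chosen whenever the peer cluster has any RegionServer.
     */
    static int sinkCount(int peerRegionServers, double ratio) {
        return (int) Math.ceil(peerRegionServers * ratio);
    }

    public static void main(String[] args) {
        int peers = 4; // a small peer cluster
        System.out.println("ratio 0.1 -> " + sinkCount(peers, 0.1) + " sink(s)");
        System.out.println("ratio 0.5 -> " + sinkCount(peers, 0.5) + " sink(s)");
    }
}
```

Under these assumptions a four-node peer cluster goes from one sink to two, which is the kind of small-cluster case the issue title refers to.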
--- * [HBASE-20282](https://issues.apache.org/jira/browse/HBASE-20282) | *Major* | **Provide short name invocations for useful tools** \`hbase regionsplitter\` is a new short invocation for \`hbase org.apache.hadoop.hbase.util.RegionSplitter\` --- * [HBASE-20314](https://issues.apache.org/jira/browse/HBASE-20314) | *Major* | **Precommit build for master branch fails because of surefire fork fails** Upgrade surefire plugin to 2.21.0. --- * [HBASE-20130](https://issues.apache.org/jira/browse/HBASE-20130) | *Critical* | **Use defaults (16020 & 16030) as base ports when the RS is bound to localhost** When region servers bind to localhost (mostly in pseudo distributed mode), default ports (16020 & 16030) are used as base ports. This will support up to 9 instances of region servers by default with `local-regionservers.sh` script. If additional instances are needed, see the reference guide on how to deploy with a different range using the environment variables `HBASE_RS_BASE_PORT` and `HBASE_RS_INFO_BASE_PORT`. --- * [HBASE-20111](https://issues.apache.org/jira/browse/HBASE-20111) | *Critical* | **Able to split region explicitly even on shouldSplit return false from split policy** When a split is requested on a Region, the RegionServer hosting that Region will now consult the configured SplitPolicy for that table when determining if a split of that Region is allowed. When a split is disallowed (due to the Region not being OPEN or the SplitPolicy denying the request), the operation will \*not\* be implicitly retried as it has previously done. Users will need to guard against and explicitly retry region split requests which are denied by the system. --- * [HBASE-20223](https://issues.apache.org/jira/browse/HBASE-20223) | *Blocker* | **Use hbase-thirdparty 2.1.0** Moves commons-cli and commons-collections4 into the HBase thirdparty shaded jar which means that these are no longer generally available for users on the classpath. 
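Since denied split requests are no longer retried implicitly (HBASE-20111 above), callers may want a small explicit retry loop. The sketch below is plain Java: `SplitRetrySketch`, `SplitRequest`, and the attempt and back-off parameters are all hypothetical, and it assumes a denied request surfaces as an `IOException`, which the note does not specify.

```java
import java.io.IOException;

/** Hypothetical retry wrapper, not part of HBase. */
final class SplitRetrySketch {

    /** Stand-in for whatever client call submits the split request. */
    interface SplitRequest {
        void submit() throws IOException;
    }

    /**
     * Submits the request up to maxAttempts times, sleeping backoffMillis
     * between denied attempts. Returns true once a request is accepted and
     * false if every attempt was denied or the thread was interrupted.
     */
    static boolean splitWithRetry(SplitRequest request, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                request.submit();
                return true;
            } catch (IOException denied) {
                if (attempt == maxAttempts) {
                    break; // out of attempts, report failure below
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }
}
```

Returning a boolean rather than rethrowing keeps the caller's decision explicit: a `false` result means the system kept denying the split, and the caller chooses whether to give up or try again later.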
--- * [HBASE-19128](https://issues.apache.org/jira/browse/HBASE-19128) | *Major* | **Purge Distributed Log Replay from codebase, configurations, text; mark the feature as unsupported, broken.** Removes Distributed Log Replay feature. Disable the feature before upgrading. --- * [HBASE-19504](https://issues.apache.org/jira/browse/HBASE-19504) | *Major* | **Add TimeRange support into checkAndMutate** 1) checkAndMutate accepts a TimeRange to query the specified cell 2) remove writeToWAL flag from Region#checkAndMutate since it is useless (this is an incompatible change) --- * [HBASE-20237](https://issues.apache.org/jira/browse/HBASE-20237) | *Critical* | **Put back getClosestRowBefore and throw UnknownProtocolException instead... for asynchbase client** Throw UnknownProtocolException if a client connects and tries to invoke the old getClosestRowOrBefore method. Pre-hbase-1.0.0 clients and asynchbase do this instead of using its replacement, the reverse Scan. getClosestRowOrBefore was implemented as a flag on Get. Before this patch, though the flag was set, hbase2 ignored it. This made it look like a pre-1.0.0 client was 'working' but then it'd fail finding the appropriate Region for a client-specified row doing lookups into hbase:meta. --- * [HBASE-20247](https://issues.apache.org/jira/browse/HBASE-20247) | *Major* | **Set version as 2.0.0 in branch-2.0 in prep for first RC** Set version as 2.0.0 on branch-2.0. --- * [HBASE-20090](https://issues.apache.org/jira/browse/HBASE-20090) | *Major* | **Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run** When there is a concurrent region split, MemStoreFlusher may not find a flushable region if the only candidate region left hasn't received writes (resulting in 0 data size). After this JIRA, such a scenario no longer triggers a Precondition assertion (replaced by an if statement to see whether there is any flushable region).
If there is no flushable region, a DEBUG log would appear in region server log, saying "Above memory mark but there is no flushable region". --- * [HBASE-19552](https://issues.apache.org/jira/browse/HBASE-19552) | *Major* | **update hbase to use new thirdparty libs** hbase-thirdparty libs have moved to o.a.h.thirdparty offset. Netty shading system property is no longer necessary. --- * [HBASE-20119](https://issues.apache.org/jira/browse/HBASE-20119) | *Minor* | **Introduce a pojo class to carry coprocessor information in order to make TableDescriptorBuilder accept multiple cp at once** 1) Make all methods in TableDescriptorBuilder follow the setter pattern. addCoprocessor -\> setCoprocessor addColumnFamily -\> setColumnFamily (addCoprocessor and addColumnFamily are still in branch-2 but they are marked as deprecated) 2) add CoprocessorDescriptor to carry cp information 3) add CoprocessorDescriptorBuilder to build CoprocessorDescriptor 4) TD disallows users from setting a negative priority on a coprocessor, since parsing the negative value will cause an exception --- * [HBASE-17165](https://issues.apache.org/jira/browse/HBASE-17165) | *Critical* | **Add retry to LoadIncrementalHFiles tool** Adds retry to load of incremental hfiles. Pertinent key is HConstants.HBASE\_CLIENT\_RETRIES\_NUMBER. Default is HConstants.DEFAULT\_HBASE\_CLIENT\_RETRIES\_NUMBER. --- * [HBASE-20108](https://issues.apache.org/jira/browse/HBASE-20108) | *Critical* | **\`hbase zkcli\` falls into a non-interactive prompt after HBASE-15199** This issue fixes a runtime dependency issue where JLine is not made available on the classpath, which causes the ZooKeeper CLI to appear non-interactive. JLine was being made available unintentionally via the JRuby jar file on the classpath for the HBase shell. While the JRuby jar is not always present, the fix made here was to selectively include the JLine dependency on the zkcli command's classpath.
--- * [HBASE-8770](https://issues.apache.org/jira/browse/HBASE-8770) | *Blocker* | **deletes and puts with the same ts should be resolved according to mvcc/seqNum** This behavior is available as a new feature. See HBASE-15968 release note. This issue is just about adding to the refguide documentation on the HBASE-15968 feature. --- * [HBASE-19114](https://issues.apache.org/jira/browse/HBASE-19114) | *Major* | **Split out o.a.h.h.zookeeper from hbase-server and hbase-client** Splits out most of ZooKeeper related code into a separate new module: hbase-zookeeper. Also, renames some ZooKeeper related classes to follow a common naming pattern - "ZK" prefix - as compared to many different styles earlier. --- * [HBASE-19437](https://issues.apache.org/jira/browse/HBASE-19437) | *Critical* | **Batch operation can't handle the null result for Append/Increment** The result from server is changed from null to Result.EMPTY\_RESULT when Append/Increment operation can't retrieve any data from server. --- * [HBASE-17448](https://issues.apache.org/jira/browse/HBASE-17448) | *Major* | **Export metrics from RecoverableZooKeeper** Committed to master and branch-1 --- * [HBASE-19400](https://issues.apache.org/jira/browse/HBASE-19400) | *Major* | **Add missing security checks in MasterRpcServices** Added ACL check to following Admin functions: enableCatalogJanitor, runCatalogJanitor, cleanerChoreSwitch, runCleanerChore, execProcedure, execProcedureWithReturn, normalize, normalizerSwitch, coprocessorService. When ACL is enabled, only those with ADMIN rights will be able to invoke these operations successfully. --- * [HBASE-20048](https://issues.apache.org/jira/browse/HBASE-20048) | *Blocker* | **Revert serial replication feature** Revert the serial replication feature from all branches. Plan to reimplement it soon and land onto 2.1 release line.
--- * [HBASE-19166](https://issues.apache.org/jira/browse/HBASE-19166) | *Blocker* | **AsyncProtobufLogWriter persists ProtobufLogWriter as class name for backward compatibility** For backward compatibility, AsyncProtobufLogWriter uses "ProtobufLogWriter" as writer class name and SecureAsyncProtobufLogWriter uses "SecureProtobufLogWriter" as writer class name. --- * [HBASE-18596](https://issues.apache.org/jira/browse/HBASE-18596) | *Blocker* | **[TEST] A hbase1 cluster should be able to replicate to a hbase2 cluster; verify** Replication between versions verified as basically working. 0.98.25-SNAPSHOT to beta-2 hbase2 and a 1.2-ish version tried. --- * [HBASE-20017](https://issues.apache.org/jira/browse/HBASE-20017) | *Blocker* | **BufferedMutatorImpl submit the same mutation repeatedly** This change fixes multithreading issues in the implementation of BufferedMutator. BufferedMutator should not be used with 1.4 releases prior to 1.4.2. --- * [HBASE-20032](https://issues.apache.org/jira/browse/HBASE-20032) | *Minor* | **Receving multiple warnings for missing reporting.plugins.plugin.version** Add (latest) version elements missing from reporting plugins in top-level pom. --- * [HBASE-19954](https://issues.apache.org/jira/browse/HBASE-19954) | *Major* | **Separate TestBlockReorder into individual tests to avoid ShutdownHook suppression error against hadoop3** hadoop3 minidfscluster removes all shutdown handlers when the cluster goes down which made this test that does FS-stuff fail (Fix was to break up the test so each test method ran with an unadulterated FS). --- * [HBASE-20014](https://issues.apache.org/jira/browse/HBASE-20014) | *Major* | **TestAdmin1 Times out** Ups the overall test timeout from 10 minutes to 13minutes. 15minutes is the surefire timeout. --- * [HBASE-20020](https://issues.apache.org/jira/browse/HBASE-20020) | *Critical* | **Make sure we throw DoNotRetryIOException when ConnectionImplementation is closed** Add checkClosed to core Client methods. 
Avoid unnecessary retry. --- * [HBASE-19978](https://issues.apache.org/jira/browse/HBASE-19978) | *Major* | **The keepalive logic is incomplete in ProcedureExecutor** Completes keep-alive logic and then enables it; ProcedureExecutor Workers will spin up more threads when needed, settling back to the core count after the burst in demand has passed. Default keep-alive is one minute. Default core-count is CPUs/4 or 16, whichever is greater. Maximum is an arbitrary core-count \* 10 (a limit that should never be hit and if it is, there is something else very wrong). --- * [HBASE-19950](https://issues.apache.org/jira/browse/HBASE-19950) | *Minor* | **Introduce a ColumnValueFilter** ColumnValueFilter provides a way to fetch only the matched cells, given a specified column, value and comparator. This differs from SingleColumnValueFilter, which fetches an entire row as soon as a matched cell is found. --- * [HBASE-18294](https://issues.apache.org/jira/browse/HBASE-18294) | *Major* | **Reduce global heap pressure: flush based on heap occupancy** A region is flushed if its memory component exceeds the region flush threshold. A flush policy decides which stores to flush by comparing the size of the store to a column-family-flush threshold. If the overall size of all memstores in the machine exceeds the bounds defined by the administrator (denoted global pressure) a region is selected and flushed. HBASE-18294 changes flush decisions to be based on heap-occupancy and not data (key-value) size, consistently across levels. This rolls back some of the changes by HBASE-16747.
Specifically, (1) RSs, Regions and stores track their overall on-heap and off-heap occupancy, (2) A region is flushed when its on-heap+off-heap size exceeds the region flush threshold specified in hbase.hregion.memstore.flush.size, (3) The store to be flushed is chosen based on its on-heap+off-heap size, (4) At the RS level, a flush is triggered when the overall on-heap exceeds the on-heap limit, or when the overall off-heap size exceeds the off-heap limit (low/high water marks). Note that when the region flush size is set to XXmb a region flush may be triggered even before writing keys and values of size XX because the total heap occupancy of the region which includes additional metadata exceeded the threshold. --- * [HBASE-19116](https://issues.apache.org/jira/browse/HBASE-19116) | *Critical* | **Currently the tail of hfiles with CellComparator\* classname makes it so hbase1 can't open hbase2 written hfiles; fix** hbase-2.x sets KeyValue Comparators into the tail of hfiles rather than CellComparator, what it uses internally, just so hbase-1.x can continue to read hbase-2.x written hfiles. --- * [HBASE-19948](https://issues.apache.org/jira/browse/HBASE-19948) | *Major* | **Since HBASE-19873, HBaseClassTestRule, Small/Medium/Large has different semantic** In subtask, fixed doc and annotations to be more explicit that test timings are for the whole Test Fixture/Test Class/Test Suite, NOT the test method only as we'd been measuring up to this (the other subtasks untethered Categorization and test timeout such that all categories now have a ten minute timeout -- no test can run longer than ten minutes or it gets killed/timed out).
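The HBASE-18294 flush rules (2) and (4) above can be sketched as two predicates. This is an illustration only: the class, method, and parameter names are hypothetical, and the real logic lives inside the RegionServer with more state than shown here.

```java
/** Hypothetical illustration of the HBASE-18294 flush rules, not HBase code. */
final class FlushDecisionSketch {

    /**
     * Rule (2): a region is flushed when its on-heap plus off-heap occupancy
     * exceeds the configured region flush size.
     */
    static boolean shouldFlushRegion(long onHeapBytes, long offHeapBytes, long flushSizeBytes) {
        return onHeapBytes + offHeapBytes > flushSizeBytes;
    }

    /**
     * Rule (4): at the RegionServer level a flush is triggered when either the
     * overall on-heap or the overall off-heap occupancy exceeds its own limit.
     */
    static boolean shouldFlushServer(long onHeapBytes, long onHeapLimit,
                                     long offHeapBytes, long offHeapLimit) {
        return onHeapBytes > onHeapLimit || offHeapBytes > offHeapLimit;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long flushSize = 128 * mb;
        // 120 MB of keys and values plus 10 MB of per-cell metadata: the data
        // alone is under the threshold, but the heap occupancy is not.
        System.out.println(shouldFlushRegion(120 * mb + 10 * mb, 0, flushSize));
    }
}
```

The `main` method mirrors the note's closing remark: a flush can fire before the written keys and values reach the configured size, because metadata counts toward occupancy.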
--- * [HBASE-16060](https://issues.apache.org/jira/browse/HBASE-16060) | *Blocker* | **1.x clients cannot access table state talking to 2.0 cluster** By default, we mirror table state to zookeeper so hbase-1.x clients will work against an hbase-2 cluster (With this patch, hbase-1.x clients can do most Admin functions including table create; hbase-1.x clients can do all Table/DML against hbase-2 cluster). Flag to disable mirroring is hbase.mirror.table.state.to.zookeeper; set it to false in Configuration. Related, Master on startup will look to see if there are table state znodes left over by an hbase-1 instance. If any are found, it will migrate the table state to hbase-2 setting the state into the hbase:meta table where table state is now kept. We will do this check on every Master start. Notion is that this will be overall beneficial with low impediment. To disable the migration check, set hbase.migrate.table.state.from.zookeeper to false. --- * [HBASE-19900](https://issues.apache.org/jira/browse/HBASE-19900) | *Critical* | **Region-level exception destroy the result of batch** This fix makes the following changes to how the client handles an action that has both an action-level outcome and a region exception. 1) Honor the action result rather than the region exception. If an action has both a valid result and a region exception, the action is fine, as the exception was caused by other actions in the same region. 2) Honor the action exception rather than the region exception. If an action has both an action exception and a region exception, we deal with the action exception only. Handling the region exception for the same action as well would drive the count of actions in progress negative, and AsyncRequestFuture#waitUntilDone would block forever.
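The precedence described in HBASE-19900 above (action result first, then action exception, then region exception) can be written as a small decision function. Everything in this sketch is hypothetical, including the `Outcome` enum and the boolean inputs. It only illustrates the ordering and is not the client's actual code.

```java
/** Hypothetical illustration of the HBASE-19900 precedence, not HBase code. */
final class BatchOutcomeSketch {

    enum Outcome { SUCCESS, ACTION_FAILED, REGION_FAILED, PENDING }

    /**
     * Resolves one action's outcome so that it is accounted for exactly once:
     * 1) a result wins over a region exception,
     * 2) an action exception wins over a region exception,
     * 3) otherwise a region exception applies,
     * 4) with none of the above the action is still in progress.
     */
    static Outcome resolve(boolean hasResult, boolean hasActionException,
                           boolean hasRegionException) {
        if (hasResult) {
            return Outcome.SUCCESS;
        }
        if (hasActionException) {
            return Outcome.ACTION_FAILED;
        }
        if (hasRegionException) {
            return Outcome.REGION_FAILED;
        }
        return Outcome.PENDING;
    }
}
```

Returning a single outcome per action is the point of the design: an action that is counted twice is what pushed the in-progress counter below zero.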
--- * [HBASE-19841](https://issues.apache.org/jira/browse/HBASE-19841) | *Major* | **Tests against hadoop3 fail with StreamLacksCapabilityException** HBaseTestingUtility now assumes that all clusters will use local storage until a MiniDFSCluster is started or assigned. --- * [HBASE-19528](https://issues.apache.org/jira/browse/HBASE-19528) | *Major* | **Major Compaction Tool** Tool allows you to compact a cluster with given concurrency of regionservers compacting at a given time. If tool completes successfully everything requested for compaction will be compacted, regardless of region moves, splits and merges. --- * [HBASE-19919](https://issues.apache.org/jira/browse/HBASE-19919) | *Major* | **Tidying up logging** (I thought this change innocuous but I made work for a co-worker when I upped interval between log cleaner runs -- meant a smoke test failed because we were slow doing an expected cleanup). Edit of log lines removing redundancy. Shorten thread names shown in log. Made some log TRACE instead of DEBUG. Capitalizations. Upped log cleaner interval from every minute to every ten minutes. hbase.master.cleaner.interval Lowered default count of threads started by Procedure Executor from count of CPUs to 1/4 of count of CPUs. --- * [HBASE-19901](https://issues.apache.org/jira/browse/HBASE-19901) | *Major* | **Up yetus proclimit on nightlies** Pass to yetus a dockermemlimit of 20G and a proclimit of 10000. Defaults are 4G and 1G respectively. --- * [HBASE-19912](https://issues.apache.org/jira/browse/HBASE-19912) | *Minor* | **The flag "writeToWAL" of Region#checkAndRowMutate is useless** Remove useless 'writeToWAL' flag of Region#checkAndRowMutate & related class --- * [HBASE-19911](https://issues.apache.org/jira/browse/HBASE-19911) | *Major* | **Convert some tests from small to medium because they are timing out: TestNettyRpcServer, TestClientClusterStatus, TestCheckTestClasses** Changed a few tests so they are medium sized rather than small size. 
Also, upped the time we wait on small tests to 60 seconds from 30 seconds. Small tests are tests that run in 15 seconds or less. What we changed was the timeout watcher. It is now more lax, more tolerant of dodgy infrastructure that might be running tests slowly. --- * [HBASE-19892](https://issues.apache.org/jira/browse/HBASE-19892) | *Major* | **Checking 'patch attach' and yetus 0.7.0 and move to Yetus 0.7.0** Moved our internal yetus reference from 0.6.0 to 0.7.0. Concurrently, I changed hadoopqa to run with 0.7.0 (by editing the config in jenkins). --- * [HBASE-19873](https://issues.apache.org/jira/browse/HBASE-19873) | *Major* | **Add a CategoryBasedTimeout ClassRule for all UTs** Along with @Category -- small, medium, large -- all hbase tests must now carry a ClassRule as follows: `@ClassRule public static final HBaseClassTestRule CLASS_RULE = HBaseClassTestRule.forClass(TestInterfaceAudienceAnnotations.class);` where the class changes by test. Currently the classrule enforces timeout for the whole test suite -- i.e. if a SmallTest Category then all the tests in the TestSuite must complete inside 60 seconds, the timeout we set on SmallTest Category test suite -- but is meant to be a repository for general, runtime, hbase test facility. --- * [HBASE-19770](https://issues.apache.org/jira/browse/HBASE-19770) | *Critical* | **Add '--return-values' option to Shell to print return values of commands in interactive mode** Introduces a new option to the HBase shell: -r, --return-values. When the shell is in "interactive" mode (default), the return values of shell commands are not returned to the user as they dirty the console output. For those who desire this functionality, the "--return-values" option restores the old functionality of the commands passing their return value to the user.
--- * [HBASE-15321](https://issues.apache.org/jira/browse/HBASE-15321) | *Major* | **Ability to open a HRegion from hdfs snapshot.** HRegion.openReadOnlyFileSystemHRegion() provides the ability to open HRegion from a read-only hdfs snapshot. Because hdfs snapshots are read-only, no cleanup happens when using this API. --- * [HBASE-17513](https://issues.apache.org/jira/browse/HBASE-17513) | *Critical* | **Thrift Server 1 uses different QOP settings than RPC and Thrift Server 2 and can easily be misconfigured so there is no encryption when the operator expects it.** This change fixes an issue where users could have unintentionally configured the HBase Thrift1 server to run without wire-encryption, when they believed they had configured the Thrift1 server to do so. --- * [HBASE-19828](https://issues.apache.org/jira/browse/HBASE-19828) | *Major* | **Flakey TestRegionsOnMasterOptions.testRegionsOnAllServers** Disables TestRegionsOnMasterOptions because Regions on Master does not work reliably; see HBASE-19831. --- * [HBASE-18963](https://issues.apache.org/jira/browse/HBASE-18963) | *Major* | **Remove MultiRowMutationProcessor and implement mutateRows... methods using batchMutate()** Modified HRegion.mutateRow() APIs to use batchMutate() instead of processRowsWithLocks() with MultiRowMutationProcessor. MultiRowMutationProcessor is removed to have single write path that uses batchMutate(). --- * [HBASE-19163](https://issues.apache.org/jira/browse/HBASE-19163) | *Major* | **"Maximum lock count exceeded" from region server's batch processing** When there are many mutations against the same row in a batch, as each mutation will acquire a shared row lock, it will exceed the maximum shared lock count the java ReadWritelock supports (64k). Along with other optimization, the batch is divided into multiple possible minibatches. A new config is added to limit the maximum number of mutations in the minibatch. 
The property is `hbase.regionserver.minibatch.size`. The default value is 20000. --- * [HBASE-19739](https://issues.apache.org/jira/browse/HBASE-19739) | *Minor* | **Include thrift IDL files in HBase binary distribution** Thrift IDLs are now shipped, bundled up in the respective hbase-\*thrift.jars (look for files ending in .thrift). --- * [HBASE-11409](https://issues.apache.org/jira/browse/HBASE-11409) | *Major* | **Add more flexibility for input directory structure to LoadIncrementalHFiles** Allows users to bulk load entire tables from hdfs by specifying the parameter -loadTable. This allows you to pass in a table level directory and have all of its regions' column families bulk loaded. If you do not specify the -loadTable parameter, LoadIncrementalHFiles will work as before. Note: you must have a pre-created table to run with -loadTable; it will not create one for you. --- * [HBASE-19769](https://issues.apache.org/jira/browse/HBASE-19769) | *Critical* | **IllegalAccessError on package-private Hadoop metrics2 classes in MapReduce jobs** Client-side ZooKeeper metrics which were added to 2.0.0 alpha/beta releases cause issues when launching MapReduce jobs via `yarn jar` on the command line. This stems from ClassLoader separation issues that YARN implements. It was chosen that the easiest solution was to remove these ZooKeeper metrics entirely. --- * [HBASE-19783](https://issues.apache.org/jira/browse/HBASE-19783) | *Minor* | **Change replication peer cluster key/endpoint from a not-null value to null is not allowed** To reduce the confusing behavior, now when you call updatePeerConfig with empty ClusterKey or ReplicationEndpointImpl, but the value of field of the to-be-updated ReplicationPeerConfig is not null, we will throw exception instead of ignoring them. --- * [HBASE-19483](https://issues.apache.org/jira/browse/HBASE-19483) | *Major* | **Add proper privilege check for rsgroup commands** This JIRA aims at refactoring AccessController, using ACL as core library in CPs.
1. Stripping out a public class AccessChecker from AccessController, using ACL as core library in CPs. AccessChecker doesn't have any dependency on anything CP related. Create its instance from other CPs. 2. Change the default value of hbase.security.authorization to false. 3. Don't use CP hooks to check access in RSGroup. Use the access checker instance directly in functions of RSGroupAdminServiceImpl. --- * [HBASE-19358](https://issues.apache.org/jira/browse/HBASE-19358) | *Major* | **Improve the stability of splitting log when do fail over** After HBASE-19358 we introduced a new property hbase.split.writer.creation.bounded to limit the opening writers for each WALSplitter. If set to true, we won't open any writer for recovered.edits until the entries accumulated in memory reach hbase.regionserver.hlog.splitlog.buffersize (which defaults to 128M) and will write and close the file in one go instead of keeping the writer open. It's false by default and we recommend setting it to true if your cluster has a high region load (like more than 300 regions per RS), especially when you observed obvious NN/HDFS slow down during hbase (single RS or cluster) failover. --- * [HBASE-19651](https://issues.apache.org/jira/browse/HBASE-19651) | *Minor* | **Remove LimitInputStream** HBase had copied from guava the file LimitInputStream. This commit removes the copied file in favor of (our internal, shaded) guava's ByteStreams.limit. Guava 14.0's LIS noted: "Use ByteStreams.limit(java.io.InputStream, long) instead. This class is scheduled to be removed in Guava release 15.0." --- * [HBASE-19691](https://issues.apache.org/jira/browse/HBASE-19691) | *Critical* | **Do not require ADMIN permission for obtaining ClusterStatus** This change reverts an unintentional requirement for global ADMIN permission to obtain cluster status from the active HMaster.
--- * [HBASE-19486](https://issues.apache.org/jira/browse/HBASE-19486) | *Major* | ** Periodically ensure records are not buffered too long by BufferedMutator** The BufferedMutator now supports two settings that are used to ensure records do not stay too long in the buffer of a BufferedMutator. For periodically flushing the BufferedMutator there is now a "Timeout": "How old may the oldest record in the buffer be before we force a flush" and a "TimerTick": "How often do we check if the timeout has been exceeded". Using these settings you can make the BufferedMutator automatically flush the write buffer if after the specified number of milliseconds no flush has occurred. This is mainly useful in streaming scenarios (e.g. writing data into HBase using Apache Flink/Beam/Storm) where it is common (especially in a test/development situation) to see small unpredictable bursts of data that need to be written into HBase. Until now, records would remain in the BufferedMutator's write buffer until the buffer was full or an explicit flush was triggered. In practice this would mean that the 'last few records' of a burst would remain in the write buffer until the next burst arrives filling the buffer to capacity and thus triggering a flush. --- * [HBASE-19670](https://issues.apache.org/jira/browse/HBASE-19670) | *Major* | **Workaround: Purge User API building from branch-2 so can make a beta-1** Disable filtering of User API based off yetus annotation done in doclet. See parent issue for build failure currently being worked on but not done in time for a beta-1. --- * [HBASE-19282](https://issues.apache.org/jira/browse/HBASE-19282) | *Major* | **CellChunkMap Benchmarking and User Interface** When MSLAB is in use (that is the default config), we will always use the CellChunkMap indexing variant for in memory flushed Immutable segments. When MSLAB is turned off, we will use CellArrayMap. These can not be changed with any configs.
The in-memory flush threshold now defaults to 10% of the region flush size. This can be tuned using 'hbase.memstore.inmemoryflush.threshold.factor'. --- * [HBASE-19628](https://issues.apache.org/jira/browse/HBASE-19628) | *Major* | **ByteBufferCell should extend ExtendedCell** ByteBufferCell → ByteBufferExtendedCell MapReduceCell → MapReduceExtendedCell ByteBufferChunkCell → ByteBufferChunkKeyValue NoTagByteBufferChunkCell → NoTagByteBufferChunkKeyValue KeyOnlyByteBufferCell → KeyOnlyByteBufferExtendedCell TagRewriteByteBufferCell → TagRewriteByteBufferExtendedCell ValueAndTagRewriteByteBufferCell → ValueAndTagRewriteByteBufferExtendedCell EmptyByteBufferCell → EmptyByteBufferExtendedCell FirstOnRowByteBufferCell → FirstOnRowByteBufferExtendedCell LastOnRowByteBufferCell → LastOnRowByteBufferExtendedCell FirstOnRowColByteBufferCell → FirstOnRowColByteBufferExtendedCell FirstOnRowColTSByteBufferCell → FirstOnRowColTSByteBufferExtendedCell LastOnRowColByteBufferCell → LastOnRowColByteBufferExtendedCell OffheapDecodedCell → OffheapDecodedExtendedCell --- * [HBASE-19576](https://issues.apache.org/jira/browse/HBASE-19576) | *Major* | **Introduce builder for ReplicationPeerConfig and make it immutable** Add a ReplicationPeerConfigBuilder to create ReplicationPeerConfig and make ReplicationPeerConfig immutable. Meanwhile, the set\* methods in ReplicationPeerConfig are deprecated. --- * [HBASE-10092](https://issues.apache.org/jira/browse/HBASE-10092) | *Critical* | **Move to slf4j** We now have slf4j as our front-end. Be careful adding logging from here on out; make sure it is slf4j. From here on out, as us devs go, we need to convert log messages from being 'guarded' -- i.e. surrounded by if (LOG.isDebugEnabled...) -- to instead being parameterized log messages, e.g. the latter rather than the former in the below: logger.debug("The new entry is "+entry+"."); logger.debug("The new entry is {}.", entry); See [1] for background on perf benefits.
Note, FATAL log level is not present in slf4j. It is noted as a Marker but won't show in logs as a LEVEL. 1. https://www.slf4j.org/faq.html#logging\_performance --- * [HBASE-19148](https://issues.apache.org/jira/browse/HBASE-19148) | *Blocker* | **Reevaluate default values of configurations** Removed unused hbase.fs.tmp.dir from hbase-default.xml. Upped hbase.master.fileSplitTimeout from 30s to 10minutes (suggested by production experience) Added note that handler-count should be ~CPU count. hbase.regionserver.logroll.multiplier has been changed from 0.95 to 0.5 AND the default block size has been doubled. A few of the core configs are now dumped to the log on startup. --- * [HBASE-19492](https://issues.apache.org/jira/browse/HBASE-19492) | *Major* | **Add EXCLUDE\_NAMESPACE and EXCLUDE\_TABLECFS support to replication peer config** Add two new field: EXCLUDE\_NAMESPACE and EXCLUDE\_TABLECFS to replication peer config. If replicate\_all flag is true, it means all user tables will be replicated to peer cluster. Then allow config exclude namespaces or exclude table-cfs which can't be replicated to peer cluster. If replicate\_all flag is false, it means all user tables can't be replicated to peer cluster. Then allow to config namespaces or table-cfs which will be replicated to peer cluster. --- * [HBASE-19494](https://issues.apache.org/jira/browse/HBASE-19494) | *Major* | **Create simple WALKey filter that can be plugged in on the Replication Sink** Adds means of adding very basic filter on the sink side of replication. We already have a means of installing filter source-side, which is better place to filter edits before they are shipped over the network, but this facility is needed by hbase-indexer. Set hbase.replication.sink.walentrysinkfilter with a no-param Constructor implementation. See test in patch for example. 
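
As a rough illustration of what "a no-param Constructor implementation" selected by the hbase.replication.sink.walentrysinkfilter key could look like, here is a minimal, self-contained sketch. It does not use the HBase classes: `SinkFilter`, `DropOldEditsFilter`, `SinkFilterDemo`, the cutoff value, and the "true means drop" convention are all hypothetical stand-ins chosen for this example. See the test in the patch for the real interface.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: SinkFilter is a hypothetical stand-in, NOT the HBase interface.
interface SinkFilter {
  /** In this sketch, returning true means "drop the edit before applying it". */
  boolean filter(String table, long writeTime);
}

// A filter with a no-argument constructor, so it can be created from a class name.
class DropOldEditsFilter implements SinkFilter {
  static final long CUTOFF_MS = 1_000L; // hypothetical cutoff timestamp

  public DropOldEditsFilter() {}

  @Override
  public boolean filter(String table, long writeTime) {
    return writeTime < CUTOFF_MS;
  }
}

class SinkFilterDemo {
  static final String KEY = "hbase.replication.sink.walentrysinkfilter";

  // Instantiate the configured class reflectively through its no-arg constructor.
  // A plain Map stands in for the real configuration object.
  static SinkFilter load(Map<String, String> conf) {
    String className = conf.get(KEY);
    if (className == null) {
      return null; // no filter configured
    }
    try {
      return (SinkFilter) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(KEY, DropOldEditsFilter.class.getName());
    SinkFilter f = load(conf);
    System.out.println("drop edit written at 500?  " + f.filter("t1", 500L));
    System.out.println("drop edit written at 2000? " + f.filter("t1", 2_000L));
  }
}
```

The point of the no-argument constructor is that the sink side only knows a class name from configuration, so it cannot pass constructor arguments.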
--- * [HBASE-19112](https://issues.apache.org/jira/browse/HBASE-19112) | *Blocker* | **Suspect methods on Cell to be deprecated** Adds method Cell#getType which returns enum describing Cell Type. Deprecates the following Cell methods: getTypeByte getSequenceId getTagsArray getTagsOffset getTagsLength CPs trying to build cells should use RawCellBuilderFactory that supports building cells with tags. --- * [HBASE-14790](https://issues.apache.org/jira/browse/HBASE-14790) | *Major* | **Implement a new DFSOutputStream for logging WAL only** Implement a FanOutOneBlockAsyncDFSOutput for writing WAL only, the WAL provider which uses this class is AsyncFSWALProvider. It is based on netty, and will write to 3 DNs at the same time concurrently(fan-out) so generally it will lead to a lower latency. And it is also fail-fast, the stream will become unwritable immediately after there are any read/write errors, no pipeline recovery. You need to call recoverLease to force close the output for this case. And it only supports to write a file with a single block. For WAL this is a good behavior as we can always open a new file when the old one is broken. The performance analysis in HBASE-16890 shows that it has a better performance. Behavior changes: 1. As now we write to 3 DNs concurrently, according to the visibility guarantee of HDFS, the data will be available immediately when arriving at DN since all the DNs will be considered as the last one in pipeline. This means replication may read uncommitted data and replicate it to the remote cluster and cause data inconsistency. HBASE-14004 is used to solve the problem. 2. There will be no sync failure. When the output is broken, we will open a new file and write all the unacked wal entries to the new file. This means that we may have duplicated entries in wal files. HBASE-14949 is used to solve this problem. 
--- * [HBASE-15536](https://issues.apache.org/jira/browse/HBASE-15536) | *Critical* | **Make AsyncFSWAL as our default WAL** Now the default WALProvider is AsyncFSWALProvider, i.e. 'asyncfs'. If you want to change back to use FSHLog, please add this in hbase-site.xml {code} \<property\> \<name\>hbase.wal.provider\</name\> \<value\>filesystem\</value\> \</property\> {code} If you want to use FSHLog with multiwal, please add this in hbase-site.xml {code} \<property\> \<name\>hbase.wal.regiongrouping.delegate.provider\</name\> \<value\>filesystem\</value\> \</property\> {code} This patch also sets hbase.wal.async.use-shared-event-loop to false so WAL has its own netty event group. --- * [HBASE-19462](https://issues.apache.org/jira/browse/HBASE-19462) | *Major* | **Deprecate all addImmutable methods in Put** Deprecates Put#addImmutable as of release 2.0.0; it will be removed in HBase 3.0.0. Use Put#add(Cell) and org.apache.hadoop.hbase.CellBuilder instead. --- * [HBASE-19213](https://issues.apache.org/jira/browse/HBASE-19213) | *Minor* | **Align check and mutate operations in Table and AsyncTable** In Table interface deprecate checkAndPut, checkAndDelete and checkAndMutate methods. Similarly to AsyncTable a new method was added to replace the deprecated ones: CheckAndMutateBuilder checkAndMutate(byte[] row, byte[] family) with CheckAndMutateBuilder interface which can be used to construct the checkAnd\*() operations. --- * [HBASE-19134](https://issues.apache.org/jira/browse/HBASE-19134) | *Major* | **Make WALKey an Interface; expose Read-Only version to CPs** Made WALKey an Interface and added a WALKeyImpl implementation. WALKey comes through to Coprocessors. WALKey is read-only. --- * [HBASE-18169](https://issues.apache.org/jira/browse/HBASE-18169) | *Blocker* | **Coprocessor fix and cleanup before 2.0.0 release** Refactor of Coprocessor API for hbase2. Purged methods that exposed too much of our internals. Other hooks were recast so they no longer took or returned internal classes; instead we pass Interfaces or read-only versions of implementations.
Here is some overview doc on changes in hbase2 for Coprocessors including detail on why the change was made: https://github.com/apache/hbase/blob/branch-2.0/dev-support/design-docs/Coprocessor\_Design\_Improvements-Use\_composition\_instead\_of\_inheritance-HBASE-17732.adoc --- * [HBASE-19301](https://issues.apache.org/jira/browse/HBASE-19301) | *Major* | **Provide way for CPs to create short circuited connection with custom configurations** Provided a way for the CP users to create a short circuitable connection with custom configs. createConnection(Configuration) is added to MasterCoprocessorEnvironment, RegionServerCoprocessorEnvironment and RegionCoprocessorEnvironment. The getConnection() method already available in these Env interfaces returns the cluster connection used by the server (which the server also uses), whereas this new method will create a new connection on request. The difference from a connection created using ConnectionFactory APIs is that this connection can short circuit the calls to the same server, avoiding the RPC paths. The connection will NOT be cached/maintained by the server. That should be done by the CPs. Be careful creating Connections out of a Coprocessor. See the javadoc on createConnection and getConnection. --- * [HBASE-19357](https://issues.apache.org/jira/browse/HBASE-19357) | *Major* | **Bucket cache no longer L2 for LRU cache** Removed the cacheDataInL1 option for HCD. BucketCache is no longer the L2 for the LRU on heap cache. When BC is used, data blocks will be strictly on BC only, whereas index/bloom blocks are on the LRU L1 cache. Config 'hbase.bucketcache.combinedcache.enabled' is removed. There is no way to set combined mode = false, that is, to make BC the victim handler for the LRU cache. This will be one more noticeable change when one uses BucketCache in File mode. Then the system tables' data blocks (including the META table) will be cached in Bucket Cache files only.
A test doing plain scans from META files alone reveals that the throughput of file mode BC is almost half only. But for META entries we have the RegionLocation cache in client side connections, so this should not be a big concern in real cluster usage. Will check more on this and probably fix even when we do tiered BucketCache. --- * [HBASE-19430](https://issues.apache.org/jira/browse/HBASE-19430) | *Major* | **Remove the SettableTimestamp and SettableSequenceId** All the cells used on the server side are ExtendedCells now. --- * [HBASE-19295](https://issues.apache.org/jira/browse/HBASE-19295) | *Major* | **The Configuration returned by CPEnv should be read-only.** CoprocessorEnvironment#getConfiguration returns a READ-ONLY Configuration. Attempts at altering the returned Configuration -- whether setting or adding resources -- will result in an IllegalStateException warning of the Read-only condition of the returned Configuration. --- * [HBASE-19410](https://issues.apache.org/jira/browse/HBASE-19410) | *Major* | **Move zookeeper related UTs to hbase-zookeeper and mark them as ZKTests** There is a new HBaseZKTestingUtility which can only start a mini zookeeper cluster. And we will publish sources for test-jar for all modules. --- * [HBASE-19323](https://issues.apache.org/jira/browse/HBASE-19323) | *Major* | **Make netty engine default in hbase2** NettyRpcServer is now our default RPC server replacing SimpleRpcServer. --- * [HBASE-19426](https://issues.apache.org/jira/browse/HBASE-19426) | *Major* | **Move has() and setTimestamp() to Mutation** Moves #has and #setTimestamp back up to Mutation from the subclass Put so they are available to other Mutation implementations.
--- * [HBASE-19384](https://issues.apache.org/jira/browse/HBASE-19384) | *Critical* | **Results returned by preAppend hook in a coprocessor are replaced with null from other coprocessor even on bypass** When a coprocessor sets 'bypass', we will skip calling subsequent Coprocessors that may be stacked-up on the method invocation; e.g. if a prePut has three coprocessors hooked up, if the first coprocessor decides to set 'bypass', we will not call the two subsequent coprocessors (this is similar to the 'complete' functionality that was in hbase1, removed in hbase2). --- * [HBASE-19408](https://issues.apache.org/jira/browse/HBASE-19408) | *Trivial* | **Remove WALActionsListener.Base** 1) Remove WALActionsListener.Base. 2) Provide default method implementations in WALActionsListener. Anyone who wants to receive notification of WAL events should implement WALActionsListener rather than extend WALActionsListener.Base. --- * [HBASE-19339](https://issues.apache.org/jira/browse/HBASE-19339) | *Critical* | **Eager policy results in the negative size of memstore** Enable TestAcidGuaranteesWithEagerPolicy and TestAcidGuaranteesWithAdaptivePolicy --- * [HBASE-19336](https://issues.apache.org/jira/browse/HBASE-19336) | *Major* | **Improve rsgroup to allow assign all tables within a specified namespace by only writing namespace** Add two new shell commands. move\_namespaces\_rsgroup is used to reassign tables of specified namespaces from one RegionServer group to another. move\_servers\_namespaces\_rsgroup is used to reassign regionServers and tables of specified namespaces from one group to another. --- * [HBASE-19285](https://issues.apache.org/jira/browse/HBASE-19285) | *Critical* | **Add per-table latency histograms** Per-RegionServer table latency histograms have been returned to HBase (after being removed due to impacting performance). These metrics are exposed via a new JMX bean "TableLatencies" with the typical naming conventions: namespace, table, and histogram component.
--- * [HBASE-19359](https://issues.apache.org/jira/browse/HBASE-19359) | *Major* | **Revisit the default config of hbase client retries number** The default value of hbase.client.retries.number was 35. It is now 10. And for server side, the default hbase.client.serverside.retries.multiplier was 10. So the server side retries number was 35 \* 10 = 350. It is now 3. --- * [HBASE-18090](https://issues.apache.org/jira/browse/HBASE-18090) | *Major* | **Improve TableSnapshotInputFormat to allow more multiple mappers per region** In this task, we make it possible to run multiple mappers per region in the table snapshot. The following code is the primary table snapshot mapper initialization: TableMapReduceUtil.initTableSnapshotMapperJob( snapshotName, // The name of the snapshot (of a table) to read from scan, // Scan instance to control CF and attribute selection mapper, // mapper outputKeyClass, // mapper output key outputValueClass, // mapper output value job, // The current job to adjust true, // upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars) restoreDir // a temporary directory to copy the snapshot files into ); With this call, the job runs only one map task per region in the table snapshot.
With this feature, client can specify the desired num of mappers when init table snapshot mapper job: TableMapReduceUtil.initTableSnapshotMapperJob( snapshotName, // The name of the snapshot (of a table) to read from scan, // Scan instance to control CF and attribute selection mapper, // mapper outputKeyClass, // mapper output key outputValueClass, // mapper output value job, // The current job to adjust true, // upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars) restoreDir, // a temporary directory to copy the snapshot files into splitAlgorithm, // splitAlgo algorithm to split, current split algorithms support RegionSplitter.UniformSplit() and RegionSplitter.HexStringSplit() n // how many input splits to generate per one region ); --- * [HBASE-19035](https://issues.apache.org/jira/browse/HBASE-19035) | *Major* | **Miss metrics when coprocessor use region scanner to read data** 1. Move read requests count to region level. Because RegionScanner is exposed to CP. 2. Update write requests count in processRowsWithLocks. 3. Remove requestRowActionCount in RSRpcServices. This metric can be computed by region's readRequestsCount and writeRequestsCount. --- * [HBASE-19318](https://issues.apache.org/jira/browse/HBASE-19318) | *Critical* | **MasterRpcServices#getSecurityCapabilities explicitly checks for the HBase AccessController implementation** Fixes an issue with loading customer coprocessor endpoint implementations inside of the HBase Master which breaks Apache Ranger. --- * [HBASE-19092](https://issues.apache.org/jira/browse/HBASE-19092) | *Critical* | **Make Tag IA.LimitedPrivate and expose for CPs** This JIRA aims at exposing Tags for Coprocessor usage. Tag interface is now exposed to Coprocessors and CPs can make use of this interface to create their own Tags. RawCell is a new interface that is a subtype of Cell and that is exposed to CPs. 
RawCell has the following APIs: List\<Tag\> getTags() Optional\<Tag\> getTag(byte type) byte[] cloneTags() The above APIs help to read tags from the Cell. CellUtil#createCell(Cell cell, List\<Tag\> tags) CellUtil#createCell(Cell cell, byte[] tags) CellUtil#createCell(Cell cell, byte[] value, byte[] tags) are deprecated. If CPs want to create a cell with Tags they can use the RegionCoprocessorEnvironment#getCellBuilder() that returns an ExtendedCellBuilder. Using ExtendedCellBuilder the CP can create Cells with Tags. Other helper methods to work on Tags are available as static APIs in Tag interface. --- * [HBASE-19266](https://issues.apache.org/jira/browse/HBASE-19266) | *Minor* | **TestAcidGuarantees should cover adaptive in-memory compaction** Separate the TestAcidGuarantees by the policy: 1) NONE -\> TestAcidGuaranteesWithNoInMemCompaction 2) BASIC -\> TestAcidGuaranteesWithBasicPolicy 3) EAGER -\> TestAcidGuaranteesWithEagerPolicy 4) ADAPTIVE -\> TestAcidGuaranteesWithAdaptivePolicy TestAcidGuaranteesWithEagerPolicy and TestAcidGuaranteesWithAdaptivePolicy are disabled by default as the eager policy may cause a negative memstore size. --- * [HBASE-16868](https://issues.apache.org/jira/browse/HBASE-16868) | *Critical* | **Add a replicate\_all flag to avoid misuse the namespaces and table-cfs config of replication peer** Add a replicate\_all flag to replication peer config. The default value is true, which means all user tables (REPLICATION\_SCOPE != 0 ) will be replicated to peer cluster. How to change a peer from replicating everything to replicating only specific namespaces/tablecfs? Step1. Add a new peer with no namespace/tablecfs config; the replicate\_all flag will be true automatically. Step2. To replicate only some namespaces or tables, set the replicate\_all flag to false first. Step3. Add the specific namespaces or table-cfs config to the replication peer. How to change a peer from replicating specific namespaces/tablecfs to replicating everything? Step1.
Add a new peer with a specific namespace/tablecfs config; the replicate\_all flag will be false automatically. Step2. To replicate all user tables, remove the specific namespace/tablecfs config first. Step3. Set replicate\_all flag to true. How to configure a peer to replicate nothing? Set the replicate\_all flag to false with no namespace/tablecfs config; then no tables will be replicated to the peer cluster. --- * [HBASE-19122](https://issues.apache.org/jira/browse/HBASE-19122) | *Critical* | **preCompact and preFlush can bypass by returning null scanner; shut it down** Remove the ability to 'bypass' preFlush and preCompact by returning a null Scanner. Bypass is disallowed on these methods in hbase2. --- * [HBASE-19200](https://issues.apache.org/jira/browse/HBASE-19200) | *Major* | **make hbase-client only depend on ZKAsyncRegistry and ZNodePaths** ConnectionImplementation now uses asynchronous connections to zookeeper via ZKAsyncRegistry to get cluster id, master address, meta region location, etc. Since ZKAsyncRegistry uses curator framework, this change purges a lot of zookeeper dependencies in hbase-client. Now hbase-client depends only on ZKAsyncRegistry, ZNodePaths and the newly introduced ZKMetadata. --- * [HBASE-19311](https://issues.apache.org/jira/browse/HBASE-19311) | *Major* | **Promote TestAcidGuarantees to LargeTests and start mini cluster once to make it faster** Introduce an AcidGuaranteesTestTool and expose it as the tool instead of TestAcidGuarantees. Now TestAcidGuarantees is just a UT. --- * [HBASE-19293](https://issues.apache.org/jira/browse/HBASE-19293) | *Major* | **Support adding a new replication peer in disabled state** Add a boolean parameter to Admin/AsyncAdmin's addReplicationPeer method which specifies whether the new replication peer's state is enabled or disabled. Meanwhile, you can use the shell cmd to add an enabled/disabled replication peer. The STATE parameter is optional and the default state is enabled.
hbase\> add\_peer '1', CLUSTER\_KEY =\> "server1.cie.com:2181:/hbase", STATE =\> "ENABLED" hbase\> add\_peer '1', CLUSTER\_KEY =\> "server1.cie.com:2181:/hbase", STATE =\> "DISABLED" --- * [HBASE-19123](https://issues.apache.org/jira/browse/HBASE-19123) | *Major* | **Purge 'complete' support from Coprocesor Observers** This issue removes the 'complete' facility that was in ObserverContext. It is no longer possible for a Coprocessor to cut the chain-of-invocation and insist its response prevails. --- * [HBASE-18911](https://issues.apache.org/jira/browse/HBASE-18911) | *Major* | **Unify Admin and AsyncAdmin's methods name** Deprecated 4 methods for Admin interface. Deprecated compactRegionServer(ServerName, boolean). Use compactRegionServer(ServerName) and majorCompactRegionServer(ServerName) instead. Deprecated getRegionLoad(ServerName) method. Use getRegionLoads(ServerName) instead. Deprecated getRegionLoad(ServerName, TableName) method. Use getRegionLoads(ServerName, TableName) instead. Deprecated getQuotaRetriever(QuotaFilter) method. Use getQuota(QuotaFilter) instead. Add 7 methods for Admin interface. ServerName getMaster(); Collection\<ServerName\> getBackupMasters(); Collection\<ServerName\> getRegionServers(); boolean splitSwitch(boolean enabled, boolean synchronous); boolean mergeSwitch(boolean enabled, boolean synchronous); boolean isSplitEnabled(); boolean isMergeEnabled(); --- * [HBASE-18703](https://issues.apache.org/jira/browse/HBASE-18703) | *Critical* | **Inconsistent behavior for preBatchMutate in doMiniBatchMutate and processRowsWithLocks** Two write paths Region.batchMutate() and Region.mutateRows() are unified and inconsistencies are resolved. --- * [HBASE-18964](https://issues.apache.org/jira/browse/HBASE-18964) | *Major* | **Deprecate RowProcessor and processRowsWithLocks() APIs that take RowProcessor as an argument** RowProcessor and Region#processRowsWithLocks() methods that take RowProcessor as an argument are deprecated.
Use Coprocessors if you want to customize handling. --- * [HBASE-19251](https://issues.apache.org/jira/browse/HBASE-19251) | *Major* | **Merge RawAsyncTable and AsyncTable** Merge the RawAsyncTable and AsyncTable interfaces. Use generics to reflect the difference between the observer style scan APIs. For the implementation which does not have a user specified thread pool, the observer is AdvancedScanResultConsumer. For the implementation which needs a user specified thread pool, the observer is ScanResultConsumer. --- * [HBASE-19262](https://issues.apache.org/jira/browse/HBASE-19262) | *Major* | **Revisit checkstyle rules** Change the import order rule so that shaded imports now go at the bottom. Ignore the VisibilityModifier warnings for test code. --- * [HBASE-19187](https://issues.apache.org/jira/browse/HBASE-19187) | *Minor* | **Remove option to create on heap bucket cache** Removing the on heap Bucket cache feature. The config "hbase.bucketcache.ioengine" no longer supports the 'heap' value. Its supported values now are 'offheap', 'file:\', 'files:\' and 'mmap:\' --- * [HBASE-12350](https://issues.apache.org/jira/browse/HBASE-12350) | *Minor* | **Backport error-prone build support to branch-1 and branch-2** This change introduces compile time support for running the error-prone suite of static analyses. Enable with -PerrorProne on the Maven command line. Requires JDK 8 or higher. (Don't enable if building with JDK 7.) --- * [HBASE-14350](https://issues.apache.org/jira/browse/HBASE-14350) | *Blocker* | **Procedure V2 Phase 2: Assignment Manager** (Incomplete) = Incompatibles == Coprocessor Incompatibilities Split/Merge have moved to the Master; it runs them now. This means hooks around Split/Merge are now noops. To intercept Split/Merge phases, CPs need to intercept on MasterObserver.
--- * [HBASE-19189](https://issues.apache.org/jira/browse/HBASE-19189) | *Major* | **Ad-hoc test job for running a subset of tests lots of times** Folks can now test out tests on an arbitrary release branch. Head over to [builds.a.o job "HBase-adhoc-run-tests"](https://builds.apache.org/view/H-L/view/HBase/job/HBase-adhoc-run-tests/), then pick "Build with parameters". Tests are specified as just names e.g. TestLogRollingNoCluster. can also be a glob. e.g. TestHFile* --- * [HBASE-19220](https://issues.apache.org/jira/browse/HBASE-19220) | *Major* | **Async tests time out talking to zk; 'clusterid came back null'** Changed retries from 3 to 30 for zk initial connect for registry. --- * [HBASE-19002](https://issues.apache.org/jira/browse/HBASE-19002) | *Minor* | **Introduce more examples to show how to intercept normal region operations** With the change in Coprocessor APIs, the hbase-examples module has been updated to provide additional examples that show how to write Coprocessors against the new API. --- * [HBASE-18961](https://issues.apache.org/jira/browse/HBASE-18961) | *Major* | **doMiniBatchMutate() is big, split it into smaller methods** HRegion.batchMutate()/ doMiniBatchMutate() is refactored with aim to unify batchMutate() and mutateRows() code paths later. batchMutate() currently handles 2 types of batches: MutationBatchOperations and ReplayBatchOperations. Common base class BatchOperations is augmented with common methods which are overridden in derived classes as needed. doMiniBatchMutate() is implemented using common methods in base class BatchOperations. --- * [HBASE-19103](https://issues.apache.org/jira/browse/HBASE-19103) | *Minor* | **Add BigDecimalComparator for filter** If BigDecimal is stored as value, and you need to add a matched comparator to the value filter when scanning, a BigDecimalComparator can be used. 
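
To show why a dedicated comparator is useful for BigDecimal values, here is a minimal, self-contained sketch that contrasts a raw byte comparison with a numeric comparison of decoded values. It does not use the HBase classes. The encoding (a 4-byte scale followed by the unscaled value) and every name in `BigDecimalCompareSketch` are assumptions made for this example, not a statement of how BigDecimalComparator is implemented.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

class BigDecimalCompareSketch {
  // Assumed encoding for illustration: 4-byte scale followed by the unscaled value.
  static byte[] encode(BigDecimal v) {
    byte[] unscaled = v.unscaledValue().toByteArray();
    return ByteBuffer.allocate(4 + unscaled.length).putInt(v.scale()).put(unscaled).array();
  }

  static BigDecimal decode(byte[] b) {
    ByteBuffer buf = ByteBuffer.wrap(b);
    int scale = buf.getInt();
    byte[] unscaled = new byte[buf.remaining()];
    buf.get(unscaled);
    return new BigDecimal(new BigInteger(unscaled), scale);
  }

  // Numeric comparison: decode the stored bytes, then compare as BigDecimal.
  static int compareNumeric(BigDecimal expected, byte[] stored) {
    return expected.compareTo(decode(stored));
  }

  // Unsigned lexicographic byte comparison, shown only for contrast.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x < y ? -1 : 1;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    BigDecimal small = new BigDecimal("2.5");
    BigDecimal large = new BigDecimal("10");
    System.out.println("numeric: " + compareNumeric(small, encode(large)));
    System.out.println("bytes:   " + compareBytes(encode(small), encode(large)));
  }
}
```

Under this encoding the two orderings disagree for 2.5 and 10, and 1.0 and 1.00 are numerically equal while their bytes differ, which is the kind of mismatch a value-aware comparator avoids.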
--- * [HBASE-19111](https://issues.apache.org/jira/browse/HBASE-19111) | *Critical* | **Add missing CellUtil#isPut(Cell) methods** A new public API method was added to CellUtil "isPut(Cell)" for clients to use to determine if the Cell is for a Put operation. Additionally, other CellUtil API calls which expose Cell-implementation were marked as deprecated and will be removed in a future version. --- * [HBASE-19160](https://issues.apache.org/jira/browse/HBASE-19160) | *Critical* | **Re-expose CellComparator** CellComparator is now InterfaceAudience.Public --- * [HBASE-19131](https://issues.apache.org/jira/browse/HBASE-19131) | *Major* | **Add the ClusterStatus hook and cleanup other hooks which can be replaced by ClusterStatus hook** 1) Add preGetClusterStatus() and postGetClusterStatus() hooks 2) add preGetClusterStatus() to access control check - an admin action --- * [HBASE-19095](https://issues.apache.org/jira/browse/HBASE-19095) | *Major* | **Add CP hooks in RegionObserver for in memory compaction** Add 4 methods in RegionObserver: preMemStoreCompaction preMemStoreCompactionCompactScannerOpen preMemStoreCompactionCompact postMemStoreCompaction preMemStoreCompaction and postMemStoreCompaction will always be called for all in memory compactions. Under eager mode, preMemStoreCompactionCompactScannerOpen will be called before opening store scanner to allow you changing the max versions and TTL, and preMemStoreCompactionCompact will be called after the creation to let you do wrapping. --- * [HBASE-19152](https://issues.apache.org/jira/browse/HBASE-19152) | *Trivial* | **Update refguide 'how to build an RC' and the make\_rc.sh script** The make\_rc.sh script can run an hbase2 build now generating tarballs and pushing up to maven repository. TODO: Sign and checksum, check tarball, push to apache dist..... 
--- * [HBASE-19179](https://issues.apache.org/jira/browse/HBASE-19179) | *Critical* | **Remove hbase-prefix-tree** Purged the hbase-prefix-tree module and all references from the code base. prefix-tree data block encoding was a super cool experimental feature that saw some usage initially but has since languished. If interested in carrying this sweet facility forward, write the dev list and we'll restore this module. --- * [HBASE-19176](https://issues.apache.org/jira/browse/HBASE-19176) | *Major* | **Remove hbase-native-client from branch-2** Removed the hbase-native-client module from branch-2 (it is still in Master). It is not complete. Look for a finished C++ client in the near future. Will restore native client to branch-2 at that point. --- * [HBASE-19144](https://issues.apache.org/jira/browse/HBASE-19144) | *Major* | **[RSgroups] Retry assignments in FAILED\_OPEN state when servers (re)join the cluster** When regionserver placement groups (RSGroups) are active, as servers join the cluster the Master will attempt to reassign regions in FAILED\_OPEN state. --- * [HBASE-18770](https://issues.apache.org/jira/browse/HBASE-18770) | *Critical* | **Remove bypass method in ObserverContext and implement the 'bypass' logic case by case** Removes blanket bypass mechanism (Observer#bypass). Instead, a curated subset of methods are bypassable. Changes Coprocessor ObserverContext 'bypass' semantic. We flip the default so bypass is NOT supported on Observer invocations; only a couple of preXXX methods in RegionObserver allow it: e.g. preGet and prePut but not preFlush, etc. Everywhere else, we throw an Exception if a Coprocessor Observer tries to invoke bypass. Master Observers can no longer stop or change move, split, assign, create table, etc. preBatchMutate can no longer be bypassed (bypass the finer-grained prePut, preDelete, etc.
instead) Ditto on complete, the mechanism that allowed a Coprocessor to rule that all subsequent Coprocessors are skipped in an invocation chain; now, complete is only available to bypassable methods (and Coprocessors will get an exception if they try to 'complete' when it is not allowed). See javadoc for whether a Coprocessor Observer method supports 'bypass'. If no mention, 'bypass' is NOT supported. The below methods have been marked deprecated in hbase2. We would have liked to have removed them because they use IA.Private parameters but they are in use by CoreCoprocessors or are critical to downstreamers and we have no alternatives to provide currently. @Deprecated public boolean prePrepareTimeStampForDeleteVersion(final Mutation mutation, final Cell kv, final byte[] byteNow, final Get get) throws IOException { @Deprecated public boolean preWALRestore(final RegionInfo info, final WALKey logKey, final WALEdit logEdit) throws IOException { @Deprecated public void postWALRestore(final RegionInfo info, final WALKey logKey, final WALEdit logEdit) throws IOException { @Deprecated public DeleteTracker postInstantiateDeleteTracker(DeleteTracker result) throws IOException Metrics are updated now even if the Coprocessor does a bypass; e.g. the put count is updated even if a Coprocessor bypasses the core put operation (we do it this way so there is no need for Coprocessors to have access to our core metrics system). --- * [HBASE-19033](https://issues.apache.org/jira/browse/HBASE-19033) | *Blocker* | **Allow CP users to change versions and TTL before opening StoreScanner** Add back the three methods without a return value: preFlushScannerOpen preCompactScannerOpen preStoreScannerOpen Introduce a ScanOptions interface to let CP users change the max versions and TTL of a ScanInfo. It will be passed as a parameter in the three methods above. Introduce a new example WriteHeavyIncrementObserver which converts increments to puts and aggregates them on get. It uses the above three methods.
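
The "convert increment to put and aggregate on get" idea mentioned for WriteHeavyIncrementObserver can be pictured with a small, self-contained sketch. This is not the hbase-examples code and uses no HBase API: `WriteHeavyCounterSketch` and its in-memory map are hypothetical stand-ins that only illustrate the trade-off of making writes cheap (append a delta without reading) and moving the summation to the read path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; a real coprocessor would store each delta as a cell version.
class WriteHeavyCounterSketch {
  // Each counter key maps to every delta written so far (one "put" per increment).
  private final Map<String, List<Long>> deltas = new HashMap<>();

  // Write path: append the delta without reading the current value.
  void increment(String key, long delta) {
    deltas.computeIfAbsent(key, k -> new ArrayList<>()).add(delta);
  }

  // Read path: aggregate all stored deltas into the current value.
  long get(String key) {
    long sum = 0;
    for (long d : deltas.getOrDefault(key, Collections.emptyList())) {
      sum += d;
    }
    return sum;
  }

  public static void main(String[] args) {
    WriteHeavyCounterSketch c = new WriteHeavyCounterSketch();
    c.increment("clicks", 5);
    c.increment("clicks", -2);
    System.out.println("clicks = " + c.get("clicks"));
  }
}
```

Because every delta must survive until it is aggregated, a scheme like this depends on keeping all versions, which is presumably why the example needs hooks that can change max versions and TTL.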
--- * [HBASE-19110](https://issues.apache.org/jira/browse/HBASE-19110) | *Minor* | **Add default for Server#isStopping & #getFileSystem** Made defaults for Server#isStopping and Server#getFileSystem. Should have done this when I added them (lesson learned, was actually mentioned in a review). --- * [HBASE-19047](https://issues.apache.org/jira/browse/HBASE-19047) | *Critical* | **CP exposed Scanner types should not extend Shipper** RegionObserver#preScannerOpen signature changed RegionScanner preScannerOpen( ObserverContext\<RegionCoprocessorEnvironment\> c, Scan scan, RegionScanner s) -\> void preScannerOpen( ObserverContext\<RegionCoprocessorEnvironment\> c, Scan scan) The pre hook can no longer return a RegionScanner instance. --- * [HBASE-18995](https://issues.apache.org/jira/browse/HBASE-18995) | *Critical* | **Move methods that are for internal usage from CellUtil to Private util class** Split CellUtil into public CellUtil and PrivateCellUtil for Internal use only. --- * [HBASE-18906](https://issues.apache.org/jira/browse/HBASE-18906) | *Critical* | **Provide Region#waitForFlushes API** Provided an API in Region (exposed to CPs): boolean waitForFlushes(long timeout) This call makes the current thread wait until all flushes in this region have finished, up to the specified timeout. The boolean return value indicates whether the flushes really finished or the timeout elapsed: it returns false when the timeout elapsed but the flushes are not over, and true when the flushes are over. --- * [HBASE-18905](https://issues.apache.org/jira/browse/HBASE-18905) | *Major* | **Allow CPs to request flush on Region and know the completion of the requested flush** Add a FlushLifeCycleTracker which is similar to CompactionLifeCycleTracker for tracking flush. Add a requestFlush method in Region interface to let CP users request flush on a region. The operation is asynchronous; you need to use the FlushLifeCycleTracker to track the flush.
The difference from CompactionLifeCycleTracker is that a flush is per region, so we do not use Store as a parameter of the methods. Also, notExecuted means the whole flush has not been executed and afterExecution means the whole flush has finished, so there is no separate completed method. A flush ends with either notExecuted or afterExecution.


---

* [HBASE-19048](https://issues.apache.org/jira/browse/HBASE-19048) | *Major* | **Cleanup MasterObserver hooks which takes IA private params**

Purged InterfaceAudience.Private parameters from methods in MasterObserver.

- preAbortProcedure no longer takes a ProcedureExecutor.
- postGetProcedures no longer takes a list of Procedures.
- postGetLocks no longer takes a list of locks.
- preRequestLock and postRequestLock no longer take a lock type.
- preLockHeartbeat and postLockHeartbeat no longer take a lock procedure.

The implication is that Coprocessors that depended on these params have had to coarsen; for example, the AccessController can no longer check access per Procedure or Lock but instead makes a judgement on general access (you need to be ADMIN to see the list of procedures and locks).


---

* [HBASE-18994](https://issues.apache.org/jira/browse/HBASE-18994) | *Major* | **Decide if META/System tables should use Compacting Memstore or Default Memstore**

Added a new config 'hbase.systemtables.compacting.memstore.type' for the system tables. By default all system tables have 'NONE' as the type, so they use the default memstore.
{code}
\<property\>
  \<name\>hbase.systemtables.compacting.memstore.type\</name\>
  \<value\>NONE\</value\>
\</property\>
{code}


---

* [HBASE-19029](https://issues.apache.org/jira/browse/HBASE-19029) | *Critical* | **Align RPC timout methods in Table and AsyncTableBase**

Deprecate the following methods in Table:

- int getRpcTimeout()
- int getReadRpcTimeout()
- int getWriteRpcTimeout()
- int getOperationTimeout()

Add the following methods to Table:

- long getRpcTimeout(TimeUnit)
- long getReadRpcTimeout(TimeUnit)
- long getWriteRpcTimeout(TimeUnit)
- long getOperationTimeout(TimeUnit)

Add the missing deprecation tag for long getRpcTimeout(TimeUnit unit) in AsyncTableBase.


---

* [HBASE-18410](https://issues.apache.org/jira/browse/HBASE-18410) | *Major* | **FilterList Improvement.**

In this task, we fixed the known bugs in FilterList and refactored the code while keeping the interface compatible.

The primary bug fixes are:

1. For a sub-filter in a FilterList with MUST\_PASS\_ONE, if the sub-filter's previous filterKeyValue() returned NEXT\_COL, we cannot be sure that the next cell will be the first cell in the next column, because FilterList chooses the minimal forward step among sub-filters and may return SKIP. We therefore add an extra check to ensure that the next cell matches the previous return code for each sub-filter.
2. The previous logic for transforming cells in FilterList was incorrect: we should set the previous transform result (rather than the given cell in question) as the initial value of the transform cell before calling filterKeyValue() of FilterList.
3. Handle the ReturnCodes which the previous code did not handle.

As for the refactor, we divided FilterList into two separate sub-classes: FilterListWithOR and FilterListWithAND. FilterListWithOR has been optimised to choose the next minimal step to seek to, rather than SKIPping cells one by one, and FilterListWithAND has been optimised to choose the next maximal key to seek to among the sub-filters in the filter list.
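
The minimal-step versus maximal-step rule described above can be sketched in a few lines. This is a simplified model, not the FilterListWithOR/FilterListWithAND code: the `Step` enum with only three values and the `mergeOr`/`mergeAnd` helpers are assumptions for illustration, and the real ReturnCode set also has INCLUDE variants and seek hints.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FilterStepSketch {
    // Simplified forward steps, declared from the smallest move to the largest.
    // Enum constants compare by declaration order, which the merges rely on.
    enum Step { SKIP, NEXT_COL, NEXT_ROW }

    // OR semantics: a cell passes if ANY sub-filter accepts it, so the list may
    // only advance as far as the most cautious sub-filter allows (minimal step).
    static Step mergeOr(List<Step> subFilterSteps) {
        return Collections.min(subFilterSteps); // throws on an empty list
    }

    // AND semantics: a cell must pass ALL sub-filters, so the list can jump as
    // far as the most aggressive sub-filter asks (maximal step).
    static Step mergeAnd(List<Step> subFilterSteps) {
        return Collections.max(subFilterSteps); // throws on an empty list
    }

    public static void main(String[] args) {
        List<Step> steps = Arrays.asList(Step.NEXT_ROW, Step.SKIP, Step.NEXT_COL);
        System.out.println("OR  -> " + mergeOr(steps));
        System.out.println("AND -> " + mergeAnd(steps));
    }
}
```

This also shows why bug fix 1 was needed: under OR, a sub-filter that asked for NEXT\_COL may still be shown a cell from the same column, because another sub-filter only asked for SKIP.
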
All in all, The code in FilterList is clean and easier to follow now. Note that ReturnCode NEXT\_ROW has been redefined as skipping to next row in current family, not to next row in all family. it’s more reasonable, because ReturnCode is a concept in store level, not in region level. Another bug that needs attention is: filterAllRemaining() in FilterList with MUST\_PASS\_ONE will now return false if the filter list is empty whereas earlier it used to return true for Operator.MUST\_PASS\_ONE. it's more reasonable now. --- * [HBASE-19077](https://issues.apache.org/jira/browse/HBASE-19077) | *Critical* | **Have Region\*CoprocessorEnvironment provide an ImmutableOnlineRegions** Adds getOnlineRegions to the RegionCoprocessorEnvironment (Context) and ditto to RegionServerCoprocessorEnvironment. Allows Coprocessor get list of Regions online on the currently hosting RegionServer. --- * [HBASE-19021](https://issues.apache.org/jira/browse/HBASE-19021) | *Critical* | **Restore a few important missing logics for balancer in 2.0** Re-enabled 'hbase.master.loadbalance.bytable', default 'false'. Draining servers are removed from consideration by blancer.balanceCluster() call. --- * [HBASE-19049](https://issues.apache.org/jira/browse/HBASE-19049) | *Major* | **Update kerby to 1.0.1 GA release** HBase now relies on Kerby version 1.0.1 for its test environment. No downstream facing change is expected. --- * [HBASE-16290](https://issues.apache.org/jira/browse/HBASE-16290) | *Major* | **Dump summary of callQueue content; can help debugging** Patch to print summary of call queues by size and count. This is displayed on the debug dump page of region server UI --- * [HBASE-18846](https://issues.apache.org/jira/browse/HBASE-18846) | *Major* | **Accommodate the hbase-indexer/lily/SEP consumer deploy-type** Makes it so hbase-indexer/lily can move off dependence on internal APIs and instead move to public APIs. Adds being able to disable near-all HRegionServer services. 
This, along with an existing plugin mechanism which allows configuring the RegionServer to host an alternate Connection implementation, makes it so we can put up a cluster of hollowed-out HRegionServers purposed to pose as a Replication Sink for a source HBase Cluster (users do not need to figure out our RPC or our PB encodings, build a distributed service, etc.). In the alternate supplied Connection implementation, hbase-indexer would install its own code to catch the Replication.

Below and attached are sample hbase-server.xml files and alternate Connection implementations. To start up an HRegionServer as a sink, first make sure there is a ZooKeeper ensemble we can talk to. If there is none, just start one:

{code}
./bin/hbase-daemon.sh start zookeeper
{code}

To start up a single RegionServer, put in place the sample hbase-site.xml below and a derivative of the IndexerConnection below on the CLASSPATH, and then start the RegionServer:

{code}
./bin/hbase-daemon.sh start org.apache.hadoop.hbase.regionserver.HRegionServer
{code}

Stdout and Stderr will go into files under the configured logs directory. Browse to localhost:16030 to find the webui (unless disabled).

DETAILS

This patch adds configuration to disable RegionServer internal Services, Managers, Caches, etc., from starting up.

By default a RegionServer starts up an Admin and a Client Service. To disable either or both, use the booleans below:

{code}
hbase.regionserver.admin.service
hbase.regionserver.client.service
{code}

Both default to true.

To make an HRegionServer start up and stay up without expecting to communicate with a master, set the boolean below to true:

{code}
hbase.masterless
{code}

Default is false.

h3.
Sample hbase-site.xml that disables internal HRegionServer Services Below is an example hbase-site.xml that turns off most Services and that then installs an alternate Connection implementation, one that is nulled out in all regards except in being able to return a "Table" that can catch a Replication Stream in its {code}batch(List\ actions, Object[] results){code} method. i.e. what the hbase-indexer wants. I also add the example alternate Connection implementation below (both of these files are also attached to this issue). Expects there to be an up and running zookeeper ensemble. {code} \ \ Standalone clusters and minicluster instances can now configure the session timeout for our embedded ZooKeeper quorum using `hbase.zookeeper.property.minSessionTimeout` and `hbase.zookeeper.property.maxSessionTimeout`. --- * [HBASE-15806](https://issues.apache.org/jira/browse/HBASE-15806) | *Critical* | **An endpoint-based export tool** org.apache.hadoop.hbase.coprocessor.Export Instructs HBase to dump the contents of table to HDFS in a sequence file + replaces MR by endpoint (see org.apache.hadoop.hbase.mapreduce.Export) + no large data to be transfered between hbase server and client + same command line as org.apache.hadoop.hbase.mapreduce.Export - user needs to alter table for deploying ExportEndpoint - user needs to adjust the endpoint timeout for dumping large data - user needs to get the EXECUTE permission --- * [HBASE-18577](https://issues.apache.org/jira/browse/HBASE-18577) | *Critical* | **shaded client includes several non-relocated third party dependencies** The HBase shaded artifacts (hbase-shaded-client and hbase-shaded-server) no longer contain several non-relocated third party dependency classes that were mistakenly included. Downstream users who relied on these classes being present will need to add a runtime dependency onto an appropriate third party artifact. Previously, we erroneously packaged several third party libs without relocating them. 
In some cases these libraries have now been relocated; in some cases they are no longer included at all. Includes: * jaxb * jetty * jersey * codahale metrics (HBase 1.4+ only) * commons-crypto * jets3t * junit * curator (HBase 1.4+) * netty 3 (HBase 1.1) * mokito-junit4 (HBase 1.1) There is now testing to ensure that the shaded artifacts only contain expected relocated content. It can be run via `mvn -Dtest=noUnitTests -pl hbase-shaded/hbase-shaded-check-invariants -am -Prelease verify`. For version 2.0+ this patch removes hadoop-mapreduce-client-core from the set of dependencies included for the hbase-client and hbase-shaded-client artifacts. For 2.0+, the slf4j-log4j12 dependency is now optional for both shaded artifacts. --- * [HBASE-14745](https://issues.apache.org/jira/browse/HBASE-14745) | *Blocker* | **Shade the last few dependencies in hbase-shaded-client** Previously some dependencies in hbase-shaded-client were still leaking into the un-shaded namespace. This should now be fixed. Additionally the rat checking on generated intermediate files from shading should be skipped. --- * [HBASE-18665](https://issues.apache.org/jira/browse/HBASE-18665) | *Critical* | **ReversedScannerCallable invokes getRegionLocations incorrectly** Performing reverse scan on tables used the meta cache incorrectly and fetched data from meta table every time. This fix solves this issue and which results in performance improvement for reverse scans. --- * [HBASE-3935](https://issues.apache.org/jira/browse/HBASE-3935) | *Major* | **HServerLoad.storefileIndexSizeMB should be changed to storefileIndexSizeKB** This patch removed the storefile\_index\_size\_MB in protobuf. It will cause the value of storefile\_index\_size\_MB is zero if user still use hbase-client 1.x. 
--- * [HBASE-18640](https://issues.apache.org/jira/browse/HBASE-18640) | *Major* | **Move mapreduce out of hbase-server into separate hbase-mapreduce module** - Moves all org.apache.hadoop.hbase.mapreduce.\* (except LoadIncrementalHFiles) and org.apache.hadoop.hbase.mapred.\* classes from hbase-server module to new hbase-mapreduce module. - Also moves following tools from hbase-server module to hbase-mapreduce module: CompactionTool, ExportSnapshot, PerformanceEvaluation, LoadTestTool - Very minor breakages in LoadTestTool(LimitedPrivate HBaseInterfaceAudience.TOOLS) --- * [HBASE-18519](https://issues.apache.org/jira/browse/HBASE-18519) | *Major* | **Use builder pattern to create cell** Introduce the CellBuilder helper. 1) Using CellBuilderFactory to get CellBuilder for creating cell with row, column, qualifier, type, and value. 2) For internal use, the ExtendedCellBuilder, which is created by ExtendedCellBuilderFactory, is able to build cell with extra fields - sequence id and tags - --- * [HBASE-18448](https://issues.apache.org/jira/browse/HBASE-18448) | *Minor* | **EndPoint example for refreshing HFiles for stores** Adds a new RefreshHFiles Coprocessor Endpoint example. Includes client and serverside-endpoint that iterates region Stores to call #refreshStoreFiles. --- * [HBASE-18658](https://issues.apache.org/jira/browse/HBASE-18658) | *Major* | **Purge hokey hbase Service implementation; use (internal) Guava Service instead** Removed hbase Service class. It was not fully-formed. Now Guava is relocated, use its Service instead internally; it has nice implementation facility too in AbstractService. --- * [HBASE-15982](https://issues.apache.org/jira/browse/HBASE-15982) | *Blocker* | **Interface ReplicationEndpoint extends Guava's Service** Breaking change to our ReplicationEndpoint and BaseReplicationEndpoint. ReplicationEndpoint implemented Guava 0.12 Service. 
An abstract subclass, BaseReplicationEndpoint, provided default implementations and facility, among other things, by extending Guava's AbstractService class. Both of these HBase classes were marked LimitedPrivate for REPLICATION so these classes were semi-public and made it so Guava 0.12 was part of our API. Having Guava in our API was a mistake. It anchors us and the implementation of the Interface to Guava 0.12. This is untenable given Guava changes and that the Service Interface in particular has had extensive revamp and improvement done. We can't hold to the Guava Interface. It changed. We can't stay on Guava 0.12; implementors and others on our CLASSPATH won't abide being stuck on an old Guava. So we make breaking changes. The unhitching of our Interface from Guava could only be done in a breaking manner. It undoes the LimitedPrivate on BaseReplicationEndpoint while keeping it for the RE Interface. It means consumers will have to copy/paste the AbstractService-based BRE into their own codebase also supplying their own Guava; HBase no longer 'supplies' this (our Guava usage has been internalized, relocated). This patch then adds into RE the basic methods RE needs of the old Guava Service rather than return a Service to start/stop only to go back to the RE instance to do actual work. A few method names had to be changed so could make implementations with Guava Service internally and not have RE method names and types clash). Semantics remained the same otherwise. For example startAsync and stopAsync in Guava are start and stop in RE. --- * [HBASE-18347](https://issues.apache.org/jira/browse/HBASE-18347) | *Major* | **Implement a BufferedMutator for async client** Introduce an AsyncBufferedMutator for batching requests to HBase for a single table. Use AsyncConnection.getBufferedMutator method to get an AsyncBufferedMutator instance. 
---

* [HBASE-18546](https://issues.apache.org/jira/browse/HBASE-18546) | *Critical* | **Always overwrite the TS for Append/Increment unless no existing cells are found**

If no existing cell is found when an Append/Increment is submitted, the custom timestamp is not overridden. Otherwise, the server always overrides the cell's timestamp.


---

* [HBASE-18224](https://issues.apache.org/jira/browse/HBASE-18224) | *Critical* | **Upgrade jetty**

Moved from Jetty 9.3.x to 9.4.x. Jetty now returns a more correct HTTP code when a header is too long (431 instead of 413), and it requires more threads to start up (default raised from 10 to 16).


---

* [HBASE-17442](https://issues.apache.org/jira/browse/HBASE-17442) | *Critical* | **Move most of the replication related classes from hbase-client to hbase-replication package**

Moved the replication implementation's classes from hbase-client to the hbase-replication package.


---

* [HBASE-18653](https://issues.apache.org/jira/browse/HBASE-18653) | *Major* | **Undo hbase2 check against \< hadoop2.6.x; i.e. implement agreed drop of hadoop 2.4 and 2.5 support in hbase2**

Changed the yetus profile for branch-2 so it no longer runs hadoop 2.4.x and 2.5.x build checks.


---

* [HBASE-18630](https://issues.apache.org/jira/browse/HBASE-18630) | *Major* | **Prune dependencies; as is branch-2 has duplicates**

Removed doubled instances of javax.inject and commons-beanutils where the versions were close. Other 'double' includes have different groupids, so we were wary of pruning them, especially when they are transitive includes (from hadoop, jetty, et al.).


---

* [HBASE-18631](https://issues.apache.org/jira/browse/HBASE-18631) | *Minor* | **Allow configuration of ChaosMonkey properties via hbase-site**

This change removes the need for a separate Java properties file to configure the ChaosMonkey included with HBase. These properties can be provided directly in hbase-site.xml. If configuration is provided in both locations, the Java properties file takes precedence.
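
The precedence rule for ChaosMonkey settings can be pictured as a two-step merge: start from the values found in hbase-site.xml, then let the properties file overwrite any key it also defines. The sketch below models both sources as plain `java.util.Properties`; the class name, the `resolve` helper, and the keys `example.monkey.delay.ms` and `example.monkey.batch` are hypothetical and are not real ChaosMonkey property names.

```java
import java.util.Properties;

final class MonkeyConfigPrecedence {
    // Merge two sources so that values from the properties file win
    // whenever both sources define the same key.
    static Properties resolve(Properties fromSiteXml, Properties fromPropertiesFile) {
        Properties merged = new Properties();
        merged.putAll(fromSiteXml);        // lower precedence, applied first
        merged.putAll(fromPropertiesFile); // higher precedence, overwrites
        return merged;
    }

    public static void main(String[] args) {
        Properties site = new Properties();
        site.setProperty("example.monkey.delay.ms", "60000"); // hypothetical key
        site.setProperty("example.monkey.batch", "10");       // hypothetical key

        Properties file = new Properties();
        file.setProperty("example.monkey.delay.ms", "5000");  // overrides the site value

        Properties effective = resolve(site, file);
        System.out.println(effective.getProperty("example.monkey.delay.ms"));
        System.out.println(effective.getProperty("example.monkey.batch"));
    }
}
```

Keys that appear only in hbase-site.xml survive the merge unchanged, which is what makes the properties file optional.
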
--- * [HBASE-18489](https://issues.apache.org/jira/browse/HBASE-18489) | *Major* | **Expose scan cursor in RawScanResultConsumer** Add a 'cursor' method which returns an 'Optional\' in 'RawScanResultConsumer.ScanController'. You can use this method to obtain the scan cursor if available. --- * [HBASE-18511](https://issues.apache.org/jira/browse/HBASE-18511) | *Blocker* | **Default no regions on master** Changes the configuration hbase.balancer.tablesOnMaster from list of table names that the can carry (with 'none' meaning no tables on the master) to instead be a boolean that is set to true if master carries tables/regions and false if it does not. If true, the master acts like any regionserver. If false, then the master carries no tables. This is the default for hbase-2.0.0. Another boolean configuration, hbase.balancer.tablesOnMaster.systemTablesOnly, when set to true, enables hbase.balancer.tablesOnMaster and makes it so the master hosts system tables exclusively (the long-time deploy mode of master branch and branch-2 up until this commit). UPDATE: This is broke. See HBASE-19785. UPDATE2: Master carrying Regions does not work reliably, see HBASE-19828. See HBASE-19831, the issue to fix regions on Master The change of hbase.balancer.tablesOnMaster from String list to boolean and the addition of a simple boolean to enable system-tables on Master was done to constrain what operators might ask for via this master configuration. Stipulating what tables are bound to the Master server verges into regionserver grouping territory, a more robust means of specifying table and server combinations. Operators should use this latter if they want layouts more exotic than those supplied by the provided booleans. --- * [HBASE-18553](https://issues.apache.org/jira/browse/HBASE-18553) | *Major* | **Expose scan cursor for asynchronous scanner** The ResultScanner which is gotten from an AsyncTable will also return cursor results if Scan.isNeedCursorResult is true. 
--- * [HBASE-18598](https://issues.apache.org/jira/browse/HBASE-18598) | *Minor* | **AsyncNonMetaRegionLocator use FIFO algorithm to get a candidate locate request** Introduce FIFO algorithm to get a candidate locate request for AsyncNonMetaRegionLocator. --- * [HBASE-18533](https://issues.apache.org/jira/browse/HBASE-18533) | *Major* | **Expose BucketCache values to be configured** This patch exposes configuration for Bucketcache. These configs are very similar to those for the LRU cache, but are described below: "hbase.bucketcache.single.factor"; /\*\* Single access bucket size \*/ "hbase.bucketcache.multi.factor"; /\*\* Multiple access bucket size \*/ "hbase.bucketcache.memory.factor"; /\*\* In-memory bucket size \*/ "hbase.bucketcache.extrafreefactor"; /\*\* Free this floating point factor of extra blocks when evicting. For example free the number of blocks requested \* (1 + extraFreeFactor) \*/ "hbase.bucketcache.acceptfactor"; /\*\* Acceptable size of cache (no evictions if size \< acceptable) \*/ "hbase.bucketcache.minfactor"; /\*\* Minimum threshold of cache (when evicting, evict until size \< min) \*/ --- * [HBASE-18528](https://issues.apache.org/jira/browse/HBASE-18528) | *Critical* | **DON'T allow user to modify the passed table/column descriptor** **WARNING: No release note provided for this change.** --- * [HBASE-18271](https://issues.apache.org/jira/browse/HBASE-18271) | *Blocker* | **Shade netty** Depend on hbase-thirdparty for our netty instead of directly relying on netty-all. netty is relocated in hbase-thirdparty from io.netty to org.apache.hadoop.hbase.shaded.io.netty. One kink is that netty bundles an .so. Its files also are relocated. So netty can find the .so content, need to specify on command-line a system property telling netty about the shading. 
The .so trick is from https://stackoverflow.com/questions/33825743/rename-files-inside-a-jar-using-some-maven-plugin In essence we need the below defined whenever we run tests or deploy: -Dorg.apache.hadoop.hbase.shaded.io.netty.packagePrefix=org.apache.hadoop.hbase.shaded. (The trailing '.' is required) See toward the end of this issue for how to pass config: https://github.com/netty/netty/issues/6665 The system property has been added to bin/hbase. If starting hbase with other than bin/hbase, add this system property (at least on linux). For devs, going forward, do not reference io.netty. Reference org.apache.hadoop.hbase.io.netty instead. Here is sample: {code} -import io.netty.channel.Channel; -import io.netty.channel.EventLoop; +import org.apache.hadoop.hbase.shaded.io.netty.channel.Channel; +import org.apache.hadoop.hbase.shaded.io.netty.channel.EventLoop; {code} --- * [HBASE-15511](https://issues.apache.org/jira/browse/HBASE-15511) | *Major* | **ClusterStatus should be able to return responses by scope** Provide a new way to get desired ClusterStatus with a set of ClusterStatus.Option, such that the response back to client can be limited. Note that, the constructor way to new a ClusterStatus will be no longer support after 2.0.0, and use ClusterStatus.Builder instead. --- * [HBASE-18551](https://issues.apache.org/jira/browse/HBASE-18551) | *Major* | **[AMv2] UnassignProcedure and crashed regionservers** Unassign will not proceed if it is unable to talk to the remote server. Now it will expire the server it is unable to communicate with and then wait until it is signaled by ServerCrashProcedure that the server's logs have been split. Only then will judge the unassign successful. We do this because a subsequent assign lacking the crashed server context might open a region w/o first splitting logs. 
--- * [HBASE-18469](https://issues.apache.org/jira/browse/HBASE-18469) | *Critical* | **Correct RegionServer metric of totalRequestCount** In HBASE-18469 we introduced a new RegionServer metrics in name of "totalRowActionRequestCount" which counts in all row actions and equals to the sum of "readRequestCount" and "writeRequestCount". Meantime, we have changed "totalRequestCount" to count only once for multi request, while previously we will count in action number of the request. As a result, existing monitoring system on totalRequestCount will still work but see a smaller value, and we strongly recommend to change to use the new metrics to monitor server load. --- * [HBASE-18500](https://issues.apache.org/jira/browse/HBASE-18500) | *Major* | **Performance issue: Don't use BufferedMutator for HTable's put method** Remove the deprecated method get/setWriteBufferSize from Table and remove writeBufferSize from TableBuilder. Remove the BufferedMutatorImpl from HTable. --- * [HBASE-18387](https://issues.apache.org/jira/browse/HBASE-18387) | *Minor* | **[Thrift] Make principal configurable in DemoClient.java** This change allows the demonstration Thrift client to customize the server principal used by the Thrift server for instances secured with Kerberos. --- * [HBASE-17125](https://issues.apache.org/jira/browse/HBASE-17125) | *Critical* | **Inconsistent result when use filter to read data** Marked Scan and Get's setMaxVersions() and setMaxVersions(int) as deprecated. They are easy to misunderstand with column family's max versions, so use readAllVersions() and readVersions(int) instead. --- * [HBASE-18492](https://issues.apache.org/jira/browse/HBASE-18492) | *Major* | **[AMv2] Embed code for selecting highest versioned region server for system table regions in AssignmentManager.processAssignQueue()** Favors new servers over older versions when assigning system table regions (more to follow in this area; i.e. changes in the AM itself). 
--- * [HBASE-18517](https://issues.apache.org/jira/browse/HBASE-18517) | *Major* | **limit max log message width in log4j** Sets a log length max of 1000 characters. --- * [HBASE-18502](https://issues.apache.org/jira/browse/HBASE-18502) | *Critical* | **Change MasterObserver to use TableDescriptor and ColumnFamilyDescriptor** The methods which change to use TableDescriptor/ColumnFamilyDescriptor are shown below. + preCreateTable( ObserverContext,TableDescriptor, HRegionInfo[]) + postCreateTable(ObserverContext ,TableDescriptor, HRegionInfo[]) + preCreateTableAction(ObserverContext, TableDescriptor,HRegionInfo[]) + postCompletedCreateTableAction(ObserverContext,TableDescriptor,HRegionInfo[]) + preModifyTable(ObserverContext,TableName, TableDescriptor) + postModifyTable(ObserverContext,TableName, TableDescriptor) + preModifyTableAction( ObserverContext,TableName,TableDescriptor) + postCompletedModifyTableAction( ObserverContext,TableName,TableDescriptor) + preAddColumnFamily(ObserverContext,TableName, ColumnFamilyDescriptor) + postAddColumnFamily(ObserverContext,TableName, ColumnFamilyDescriptor) + preAddColumnFamilyAction(ObserverContext,TableName,ColumnFamilyDescriptor) + postCompletedAddColumnFamilyAction(ObserverContext,TableName, ColumnFamilyDescriptor) + preModifyColumnFamily(ObserverContext,TableName, ColumnFamilyDescriptor) + preModifyColumnFamilyAction(ObserverContext\,TableName,ColumnFamilyDescriptor) + preCloneSnapshot(ObserverContext\,SnapshotDescription,TableDescriptor) + postCloneSnapshot(ObserverContext\,SnapshotDescription,TableDescripto) + preRestoreSnapshot(ObserverContext\,List\, List\,String) + postGetTableDescriptors(ObserverContext\,List\, List\,String) + preGetTableNames(ObserverContext\,List\, String) + postGetTableNames(ObserverContext\,List\, String) --- * [HBASE-18520](https://issues.apache.org/jira/browse/HBASE-18520) | *Minor* | **Add jmx value to determine true Master Start time** This JIRA adds a JMX value to track when the Master has 
finished initializing. The jmx config is 'masterFinishedInitializationTime' and details the time in millis that the Master is fully usable and ready to serve requests. --- * [HBASE-17056](https://issues.apache.org/jira/browse/HBASE-17056) | *Critical* | **Remove checked in PB generated files** Purge all checked in generated protobuf files (30MB). Generate protobuf files inline with the build. Remove checked-in and patched protobuf. Get it from new hbase-thirdparty instead. Side-effect: Our protobuf went from 3.1.0 to 3.3.1. Build does not take noticeably longer (still about 2.5 minutes to do a mvn clean install -DskipTests). IDEs will probably require a mvn build first else they'll complain about missing (generated) files. --- * [HBASE-18374](https://issues.apache.org/jira/browse/HBASE-18374) | *Major* | **RegionServer Metrics improvements** This change adds the latency metrics checkAndPut, checkAndDelete, putBatch and deleteBatch . Also the previous regionserver "mutate" latency metrics are renamed to "put" metrics. Batch metrics capture the latency of the entire batch containing put/delete whereas put/delete metrics capture latency per operation. Note this change will break existing monitoring based on regionserver "mutate" latency metric. --- * [HBASE-18023](https://issues.apache.org/jira/browse/HBASE-18023) | *Minor* | **Log multi-\* requests for more than threshold number of rows** HBASE-18023 introduces a warning message in the RegionServer log when an RPC is received from a client that has more than 5000 "actions" (where an "action" is a collection of mutations for a specific row) in a single RPC. Misbehaving clients who send large RPCs to RegionServers can be malicious, causing temporary pauses via garbage collection or denial of service via crashes. The threshold of 5000 actions per RPC is defined by the property "hbase.rpc.rows.warning.threshold" in hbase-site.xml. 
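
One client-side way to stay under such a threshold is to split a large batch into several smaller ones before sending. The sketch below is only an illustration of that idea, not HBase client code: the class, the `exceedsThreshold` and `chunk` helpers, and the hard-coded value 5000 (taken from the note above) are assumptions, and a real deployment should use whatever value is configured for "hbase.rpc.rows.warning.threshold".

```java
import java.util.ArrayList;
import java.util.List;

final class RpcBatchSizing {
    // Value quoted in the release note; the server-side setting may differ.
    static final int ASSUMED_THRESHOLD = 5000;

    // Mirrors the "more than N actions" wording: exactly N does not warn.
    static boolean exceedsThreshold(int actionCount, int threshold) {
        return actionCount > threshold;
    }

    // Split a list of actions into consecutive chunks of at most maxPerRpc items.
    static <T> List<List<T>> chunk(List<T> actions, int maxPerRpc) {
        if (maxPerRpc <= 0) {
            throw new IllegalArgumentException("maxPerRpc must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < actions.size(); start += maxPerRpc) {
            int end = Math.min(start + maxPerRpc, actions.size());
            chunks.add(new ArrayList<>(actions.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> actions = new ArrayList<>();
        for (int i = 0; i < 12_000; i++) {
            actions.add(i);
        }
        System.out.println("would warn: " + exceedsThreshold(actions.size(), ASSUMED_THRESHOLD));
        for (List<Integer> part : chunk(actions, ASSUMED_THRESHOLD)) {
            System.out.println("chunk of " + part.size());
        }
    }
}
```
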
--- * [HBASE-15968](https://issues.apache.org/jira/browse/HBASE-15968) | *Major* | **New behavior of versions considering mvcc and ts rather than ts only** This issue resolved two long-term issues in HBase: Puts may be masked by a delete before them. Major compactions change query results. This issue offer a new behavior to fix this issue with a little performance reduction. Set NEW\_VERSION\_BEHAVIOR to true to enable this feature in CF level. See HBASE-15968 for details. Note if you enable this feature, the order of Mutations matters. But replication will disorder the entries by default. So you have to enable serial replication if you have slave clusters. See HBASE-9465 for details. --- * [HBASE-18107](https://issues.apache.org/jira/browse/HBASE-18107) | *Major* | **[AMv2] Remove DispatchMergingRegionsRequest & DispatchMergingRegions** Removes merge region code added into branch-2 but that was not needed after all. Branch-2 replaced dispatchMergingRegions with MergeTableRegionsProcedure. Removed: # dispatchMergingRegions from Connection (was superceded long ago in branch-1). # mergeRegions from RsRpcServices (was not used). --- * [HBASE-15816](https://issues.apache.org/jira/browse/HBASE-15816) | *Major* | **Provide client with ability to set priority on Operations** Added setPriority(int priority) API to Put, Delete, Increment, Append, Get and Scan pojos. So for all these ops, the user can provide a custom priority level. --- * [HBASE-18430](https://issues.apache.org/jira/browse/HBASE-18430) | *Major* | **Typo in "contributing to documentation" page** Pushed to {{master}}. Thanks, Coral! Congratulations on your first Apache HBase commit! --- * [HBASE-17908](https://issues.apache.org/jira/browse/HBASE-17908) | *Critical* | **Upgrade guava** Use relocated guava 22.0 gotten from the new hbase-thirdparty ancillary project. Incompatible change. ReplicationEndpoint and subclasses extend guava Service which changed pretty radically between 12.0 and 22.0. 
Change is kosher because implementations are marked audience private. Still, this will likely cause grief for the likes of the downstream lily indexer. --- * [HBASE-16993](https://issues.apache.org/jira/browse/HBASE-16993) | *Major* | **BucketCache throw java.io.IOException: Invalid HFile block magic when configuring hbase.bucketcache.bucket.sizes** Any value for hbase.bucketcache.bucket.sizes configuration to be multiple of 256. If that is not the case, instantiation of L2 Bucket cache itself will fail throwing IllegalArgumentException. --- * [HBASE-16090](https://issues.apache.org/jira/browse/HBASE-16090) | *Major* | **ResultScanner is not closed in SyncTable#finishRemainingHashRanges()** pushed to 1.3 and 1.2. SyncTable was introduced in 1.2, so skipping 1.1. --- * [HBASE-18332](https://issues.apache.org/jira/browse/HBASE-18332) | *Minor* | **Upgrade asciidoctor-maven-plugin** Committed to master and branch-2. Thanks! --- * [HBASE-18161](https://issues.apache.org/jira/browse/HBASE-18161) | *Minor* | **Incremental Load support for Multiple-Table HFileOutputFormat** In order to use this feature, a user must 
1. Register their tables when configuring their job.
2. Create a composite key of the tablename and original rowkey to send as the mapper output key.

To register their tables (and configure their job for incremental load into multiple tables), a user must call the static MultiHFileOutputFormat.configureIncrementalLoad function with the HBase tables that data will be loaded into.

To create the composite key, call the helper function MultiHFileOutputFormat2.createCompositeKey with the destination tablename and rowkey as arguments, and emit the result as the mapper output key.
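
To make the composite-key idea concrete, the sketch below joins a table name and a row key into one byte array. It is a stand-alone illustration and not the library helper: the class name, the `compositeKey` method, and the ";" separator are assumptions, so in a real job call createCompositeKey itself rather than copying this layout.

```java
import java.nio.charset.StandardCharsets;

final class CompositeKeySketch {
    // Assumed separator between table name and row key; the real helper
    // may use a different delimiter or encoding.
    static final byte[] SEPARATOR = ";".getBytes(StandardCharsets.UTF_8);

    // Concatenate tableName + SEPARATOR + rowKey into a new array.
    static byte[] compositeKey(byte[] tableName, byte[] rowKey) {
        byte[] out = new byte[tableName.length + SEPARATOR.length + rowKey.length];
        System.arraycopy(tableName, 0, out, 0, tableName.length);
        System.arraycopy(SEPARATOR, 0, out, tableName.length, SEPARATOR.length);
        System.arraycopy(rowKey, 0, out, tableName.length + SEPARATOR.length, rowKey.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] key = compositeKey(
            "orders".getBytes(StandardCharsets.UTF_8),
            "row-0001".getBytes(StandardCharsets.UTF_8));
        // A mapper would emit this array as its output key.
        System.out.println(new String(key, StandardCharsets.UTF_8));
    }
}
```

Because the table name leads the key, the output for each destination table sorts together, which is what lets a single job produce HFiles for several tables.
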
Before this JIRA, for HFileOutputFormat2 a configuration for the storage policy was set per Column Family. This was set manually by the user. In this JIRA, this is unchanged when using HFileOutputFormat2. However, when specifically using MultiHFileOutputFormat2, the user now has to manually set the prefix by creating a composite of the table name and the column family. The user can create the new composite value by calling MultiHFileOutputFormat2.createCompositeKey with the tablename and column family as arguments. Changes added through this JIRA are backwards compatible with existing HFileOutputFormat2 apis and functionality. The configuration parameter "hbase.mapreduce.hfileoutputformat.table.name" is now a REQUIRED parameter though it is normally set automatically when configureIncrementalLoad method is called within HFileOutputFormat2 --- * [HBASE-18229](https://issues.apache.org/jira/browse/HBASE-18229) | *Critical* | **create new Async Split API to embrace AM v2** A new splitRegionAsync() API is added in client. The existing splitRegion() and split() API will call the new API so client does not have to change its code. Move HBaseAdmin.splitXXX() logic to master, client splitXXX() API now go to master directly instead of going to RegionServer first. Also added splitSync() API --- * [HBASE-18339](https://issues.apache.org/jira/browse/HBASE-18339) | *Major* | **Update test-patch to use hadoop 3.0.0-alpha4** HBase now defaults to Apache Hadoop 3.0.0-alpha4 when the Hadoop 3 profile is active. 
--- * [HBASE-18267](https://issues.apache.org/jira/browse/HBASE-18267) | *Major* | **The result from the postAppend is ignored** **WARNING: No release note provided for this change.** --- * [HBASE-18307](https://issues.apache.org/jira/browse/HBASE-18307) | *Major* | **Share the same EventLoopGroup for NettyRpcServer, NettyRpcClient and AsyncFSWALProvider at RS side** There are two configuration name changes as the event loop configs will not only effect rpc server but be shared by different components in the same RS instance. 'hbase.rpc.server.nativetransport' -\> 'hbase.netty.nativetransport' 'hbase.netty.rpc.server.worker.count' -\> 'hbase.netty.worker.count' --- * [HBASE-18241](https://issues.apache.org/jira/browse/HBASE-18241) | *Critical* | **Change client.Table, client.Admin, Region, Store, and HBaseTestingUtility to not use HTableDescriptor or HColumnDescriptor** - : removed API + : new API \* : deprecated API --------------------------- Region class - HTableDescriptor getTableDesc() +TableDescriptor getTableDescriptor() Store class - HColumnDescriptor getFamily() + ColumnFamilyDescriptor getColumnFamilyDescriptor() Table class \* HTableDescriptor getTableDescriptor() + TableDescriptor getDescriptor()\| \*Admin class\* \* HTableDescriptor getTableDescriptor(TableName) + List\ listTableDescriptor(TableName)\| \* HTableDescriptor[] getTableDescriptors(List\) \* HTableDescriptor[] getTableDescriptorsByTableName(List\) + List\ listTableDescriptors(List\) \* HTableDescriptor[] listTables() + List\ listTableDescriptors() \* HTableDescriptor[] listTables(Pattern) + List\ listTableDescriptors(Pattern) \* HTableDescriptor[] listTables(String) + List\ listTableDescriptors(String) \* HTableDescriptor[] listTables(Pattern, boolean) + List\ listTableDescriptors(Pattern, boolean) \* HTableDescriptor[] listTables(String, boolean) + List\ listTableDescriptors(String, boolean) \* HTableDescriptor[] deleteTables(String) \* HTableDescriptor[] deleteTables(Pattern) \* 
HTableDescriptor[] enableTables(String) \* HTableDescriptor[] enableTables(Pattern) \* HTableDescriptor[] disableTables(String) \* HTableDescriptor[] disableTables(Pattern) \* void modifyTable(TableName, HTableDescriptor) + void modifyTable(TableDescriptor) \* void modifyTableAsync(TableName, HTableDescriptor) + void modifyTableAsync(TableDescriptor) \* HTableDescriptor[] listTableDescriptorsByNamespace(String) + List\ listTableDescriptorsByNamespace(byte[]) \* void createTable(HTableDescriptor) + void createTable(TableDescriptor) \* void createTable(HTableDescriptor, byte[], byte[], int) + void createTable({color:red}TableDescriptor, byte[], byte[], int) \* void createTable(HTableDescriptor, byte[][]) + void createTable(TableDescriptor, byte[][]) \* Future\ createTableAsync(HTableDescriptor, byte[][]) + Future\ createTableAsync(TableDescriptor, byte[][]) \*HBaseTestingUtility class\* \* Table createTable(HTableDescriptor, byte[][], Configuration) + Table createTable(TableDescriptor, byte[][], Configuration) \* Table createTable(HTableDescriptor, byte[][], byte[][], Configuration) + Table createTable(TableDescriptor, byte[][], byte[][], Configuration) \* public Table createTable(HTableDescriptor, byte[][]) + public Table createTable(TableDescriptor, byte[][]) \* void modifyTableSync(Admin, HTableDescriptor) + void modifyTableSync(Admin, TableDescriptor) \* HRegion createLocalHRegion(HTableDescriptor, byte [], byte []) + HRegion createLocalHRegion(TableDescriptor, byte [], byte []) \* HRegion createLocalHRegion(HRegionInf, HTableDescriptor) + HRegion createLocalHRegion(HRegionInf, TableDescriptor) \* HRegion createLocalHRegion(HRegionInfo, HTableDescriptor, WAL) + HRegion createLocalHRegion(HRegionInfo, TableDescriptor, WAL) \* List createMultiRegionsInMeta(final Configuration, HTableDescriptor, byte [][]) + List createMultiRegionsInMeta(final Configuration, TableDescriptor, byte [][]) \* HRegion createRegionAndWAL(HRegionInfo, Path, Configuration, HTableDescriptor) 
+ HRegion createRegionAndWAL(HRegionInfo, Path, Configuration, TableDescriptor) \* HRegion createRegionAndWAL(HRegionInfo, Pat, Configuration, HTableDescriptor, boolean) + HRegion createRegionAndWAL(HRegionInfo, Pat, Configuration, TableDescriptor, boolean) \* int createPreSplitLoadTestTable(Configuration,HTableDescriptor, HColumnDescriptor) + int createPreSplitLoadTestTable(Configuration,TableDescriptor, ColumnFamilyDescriptor) \* int createPreSplitLoadTestTable(Configuration, HTableDescriptor, HColumnDescriptor, int) + int createPreSplitLoadTestTable(Configuration, TableDescriptor, ColumnFamilyDescriptor, int) \* int createPreSplitLoadTestTable(Configuration, HTableDescriptor, HColumnDescriptor[], int) + int createPreSplitLoadTestTable(Configuration, TableDescriptor, ColumnFamilyDescriptor[], int) \* int createPreSplitLoadTestTable(Configuration,HTableDescriptor, HColumnDescriptor[],SplitAlgorithm, int) + int createPreSplitLoadTestTable(Configuration,TableDescriptor, ColumnFamilyDescriptor[],SplitAlgorithm, int) \* HRegion createTestRegion(String, HColumnDescriptor) + HRegion createTestRegion(String, ColumnFamilyDescriptor) --- * [HBASE-18083](https://issues.apache.org/jira/browse/HBASE-18083) | *Major* | **Make large/small file clean thread number configurable in HFileCleaner** After HBASE-18083 we could configure HFileCleaner to use multiple threads for large/small (archived) hfile cleaning with hbase.regionserver.hfilecleaner.large.thread.count and hbase.regionserver.hfilecleaner.small.thread.count, both default to 1. These properties support online configuration change. --- * [HBASE-17931](https://issues.apache.org/jira/browse/HBASE-17931) | *Blocker* | **Assign system tables to servers with highest version** We usually keep compatibility between old client and new server so we can do rolling upgrade, HBase cluster first, then HBase client. But we don't guarantee new client can access old server. 
In an HBase cluster, we have system tables, and region servers access these tables, so the region servers are themselves HBase clients. If the system tables are hosted on region servers with a lower version, we may get into trouble because region servers with a higher version may not be able to access them. After this patch, we will move all system regions to the region servers with the highest version. So when we do a rolling upgrade across two major or minor versions, we should ALWAYS UPGRADE MASTER FIRST and then upgrade region servers. The new master will handle system tables correctly. --- * [HBASE-6581](https://issues.apache.org/jira/browse/HBASE-6581) | *Major* | **Build with hadoop.profile=3.0** Make us build against hadoop trunk (3.0) --- * [HBASE-16120](https://issues.apache.org/jira/browse/HBASE-16120) | *Minor* | **Add shell test for truncate\_preserve** Add unit tests for truncate\_preserve --- * [HBASE-18240](https://issues.apache.org/jira/browse/HBASE-18240) | *Major* | **Add hbase-thirdparty, a project with hbase utility including an hbase-shaded-thirdparty module with guava, netty, etc.** Adds a new project, hbase-thirdparty, at https://git-wip-us.apache.org/repos/asf/hbase-thirdparty used by core hbase. GroupID org.apache.hbase.thirdparty. Version 1.0.0. This project packages relocated third-party libraries used by Apache HBase such as protobuf, guava, and netty among others. HBase core depends on it. It has three submodules: one to patch and then relocate (shade) protobuf, one to do messy .so renaming (netty), and a remainder module that relocates a bundle of other (unpatched) libs used by hbase. This latter set includes protobuf-util, netty-all, gson, and guava. All shading is done using the same relocation offset of org.apache.hadoop.hbase.shaded; we add this prefix to the relocated thirdparty library class names. See the pom.xml in hbase-thirdparty for the explicit version of each third-party lib included (of note, we update our internal protobuf from 3.1.0 to 3.3.1). 
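To make the relocation rule concrete, here is a small sketch of how a third-party class name maps to its shaded name under the prefix quoted above. The class `ShadedNameSketch` and its methods are hypothetical and exist only for illustration; the actual relocation is done at build time by the shading configuration in hbase-thirdparty.

```java
// Hypothetical helper: shows the naming rule only, not the build-time shading itself.
final class ShadedNameSketch {
    // Relocation prefix quoted in the release note above.
    static final String PREFIX = "org.apache.hadoop.hbase.shaded";

    // Maps an original fully qualified class name to its relocated name.
    static String relocate(String className) {
        return PREFIX + "." + className;
    }

    // True when a class name already lives under the relocation prefix.
    static boolean isRelocated(String className) {
        return className.startsWith(PREFIX + ".");
    }

    public static void main(String[] args) {
        System.out.println(relocate("com.google.common.collect.Lists"));
    }
}
```

Code that previously imported a library class directly would import the prefixed name instead when it needs the copy bundled with HBase.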
--- * [HBASE-15943](https://issues.apache.org/jira/browse/HBASE-15943) | *Major* | **Add page displaying JVM process metrics** Adds new "Process Metrics' tab along the top which leads to new page that dumps mbean -- mostly jvm -- metrics --- * [HBASE-14902](https://issues.apache.org/jira/browse/HBASE-14902) | *Major* | **Revert some of the stringency recently introduced by checkstyle tightening** Changes the checkstyle so that on a continuation line for javadoc, instead of default four spaces, instead now it is two spaces. Also one line statements as in if (true) x =1; now pass checkstyle. --- * [HBASE-17110](https://issues.apache.org/jira/browse/HBASE-17110) | *Major* | **Improve SimpleLoadBalancer to always take server-level balance into account** After HBASE-17110 the bytable strategy for SimpleLoadBalancer will also take server level balance into account --- * [HBASE-17928](https://issues.apache.org/jira/browse/HBASE-17928) | *Major* | **Shell tool to clear compaction queues** Adds clear\_compaction\_queues to the hbase shell. {code} Clear compaction queues on a regionserver. The queue\_name contains short and long. short is shortCompactions's queue,long is longCompactions's queue. Examples: hbase\> clear\_compaction\_queues 'host187.example.com,60020' hbase\> clear\_compaction\_queues 'host187.example.com,60020','long' hbase\> clear\_compaction\_queues 'host187.example.com,60020', ['long','short'] {code} --- * [HBASE-18164](https://issues.apache.org/jira/browse/HBASE-18164) | *Critical* | **Much faster locality cost function and candidate generator** New locality cost function and candidate generator that use caching and incremental computation to allow the stochastic load balancer to consider ~20x more cluster configurations for big clusters. 
--- * [HBASE-18226](https://issues.apache.org/jira/browse/HBASE-18226) | *Major* | **Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer** The following config is added by this JIRA: hbase.regionserver.hostname.disable.master.reversedns This config is for experts: don't set its value unless you really know what you are doing. When set to true, regionserver will use the current node hostname for the servername and HMaster will skip reverse DNS lookup and use the hostname sent by regionserver instead. Note that this config and hbase.regionserver.hostname are mutually exclusive. See https://issues.apache.org/jira/browse/HBASE-18226 for more details. Caution: please make sure rolling upgrade succeeds before turning on this feature. --- * [HBASE-16242](https://issues.apache.org/jira/browse/HBASE-16242) | *Major* | **Upgrade Avro to 1.7.7** Apache HBase now specifies that version 1.7.7 of the Apache Avro library should be pulled in by maven and included in the convenience binary tarball. --- * [HBASE-18213](https://issues.apache.org/jira/browse/HBASE-18213) | *Major* | **Add documentation about the new async client** Add documentation for async client in section '66. Client' in ref guide. --- * [HBASE-17008](https://issues.apache.org/jira/browse/HBASE-17008) | *Critical* | **Examples to make AsyncClient go down easy** Add two examples for async client. AsyncClientExample is a simple example to show you how to use AsyncTable. HttpProxyExample is an example for advance user to show you how to use RawAsyncTable to write a fully asynchronous HTTP proxy server. There is no extra thread pool, all operations are executed inside netty's event loop. --- * [HBASE-18200](https://issues.apache.org/jira/browse/HBASE-18200) | *Major* | **Set hadoop check versions for branch-2 and branch-2.x in pre commit** Allow setting different hadoop check versions for branch-2 and branch-2.x when running pre commit check. 
--- * [HBASE-18187](https://issues.apache.org/jira/browse/HBASE-18187) | *Major* | **Release hbase-2.0.0-alpha1** Pushed the release. For detail: http://apache-hbase.679495.n3.nabble.com/ANNOUNCE-Apache-HBase-2-0-0-alpha-1-is-now-available-for-download-td4088484.html --- * [HBASE-18137](https://issues.apache.org/jira/browse/HBASE-18137) | *Critical* | **Replication gets stuck for empty WALs** 0-length WAL files can potentially cause the replication queue to get stuck. A new config "replication.source.eof.autorecovery" has been added: if set to true (default is false), the 0-length WAL file will be skipped after 1) the max number of retries has been hit, and 2) there are more WAL files in the queue. The risk of enabling this is that there is a chance the 0-length WAL file actually has some data (e.g. block went missing and will come back once a datanode is recovered). --- * [HBASE-18192](https://issues.apache.org/jira/browse/HBASE-18192) | *Blocker* | **Replication drops recovered queues on region server shutdown** If a region server that is processing recovered queue for another previously dead region server is gracefully shut down, it can drop the recovered queue under certain conditions. Running without this fix on a 1.2+ release means possibility of continuing data loss in replication, irrespective of which WALProvider is used. If a single WAL group (or DefaultWALProvider) is used, running without this fix will always cause dataloss in replication whenever a region server processing recovered queues is gracefully shutdown. --- * [HBASE-18109](https://issues.apache.org/jira/browse/HBASE-18109) | *Critical* | **Assign system tables first (priority)** Adds a sort of procedures before submission so system tables are queued first (which will help ensure they go out first). This should be good enough along w/ existing scheduling mechanisms to ensure system/meta are assigned first (See reasoning below). Open new issue if insufficient. 
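The ordering described for HBASE-18109 can be sketched as a stable sort that moves system tables to the front of the submission list. This is an illustration under stated assumptions, not the master's code: the class `SystemFirstSketch` is hypothetical, tables are represented as plain strings, and a table is treated as a system table when its name is in the `hbase` namespace (for example `hbase:meta`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders work items so system tables are queued first.
final class SystemFirstSketch {
    // Assumption: system tables are those in the "hbase" namespace.
    static boolean isSystemTable(String tableName) {
        return tableName.startsWith("hbase:");
    }

    // Returns a copy with system tables first. List.sort is stable, so the
    // relative order inside each group is preserved.
    static List<String> systemTablesFirst(List<String> tables) {
        List<String> sorted = new ArrayList<>(tables);
        // Boolean.FALSE sorts before Boolean.TRUE, so "not system" == false goes first.
        sorted.sort(Comparator.comparing((String t) -> !isSystemTable(t)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(systemTablesFirst(
            Arrays.asList("t2", "hbase:meta", "t1", "hbase:acl")));
    }
}
```

Sorting before submission only affects queue order; it relies on the existing scheduling to actually run the earlier entries first, as the note above says.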
--- * [HBASE-18008](https://issues.apache.org/jira/browse/HBASE-18008) | *Major* | **Any HColumnDescriptor we give out should be immutable** 1) The HColumnDescriptor obtained from Admin, AsyncAdmin, and Table is immutable. 2) HColumnDescriptor has been marked as "Deprecated" and users should substitute ColumnFamilyDescriptor for HColumnDescriptor. 3) ColumnFamilyDescriptor is constructed through ColumnFamilyDescriptorBuilder and it contains all of the read-only methods from HColumnDescriptor. 4) The value to which IS\_MOB/MOB\_THRESHOLD is mapped is stored as String rather than Boolean/Long. MOB is a new feature in 2.0, so this change should be acceptable. --- * [HBASE-18149](https://issues.apache.org/jira/browse/HBASE-18149) | *Major* | **The setting rules for table-scope attributes and family-scope attributes should keep consistent** If the value of a table-scope attribute is false, you do not need to enclose 'false' in single quotes. Both COMPACTION\_ENABLED =\> false and COMPACTION\_ENABLED =\> 'false' will take effect. --- * [HBASE-17849](https://issues.apache.org/jira/browse/HBASE-17849) | *Major* | **PE tool random read is not totally random** When randomRead or randomSeekScan is used with the PE tool, we now allow using both --size and --rows. The --size option specifies the total size of the data (the range) on which the reads should be performed, and --rows specifies the number of rows to be read by each client within that range. --- * [HBASE-15576](https://issues.apache.org/jira/browse/HBASE-15576) | *Major* | **Scanning cursor to prevent blocking long time on ResultScanner.next()** If you do not want a scan to block for too long because of heartbeats and partial results, you can use Scan#setNeedCursorResult(true) to get a special result within the scanner timeout that tells you which row the server is currently scanning. See its javadoc for more details. 
--- * [HBASE-16549](https://issues.apache.org/jira/browse/HBASE-16549) | *Major* | **Procedure v2 - Add new AM metrics** The following AMv2 procedures are modified to override the onSubmit() and onFinish() hooks provided by HBASE-17888 to do metrics calculations when procedures are submitted and finished: \* AssignProcedure \* UnassignProcedure \* MergeTableRegionProcedure \* SplitTableRegionProcedure \* ServerCrashProcedure The following metrics are collected for each of the above procedures during the lifetime of a process: \* Total number of requests submitted for a type of procedure \* Histogram of runtime in milliseconds for successfully completed procedures \* Total number of failed procedures As we are moving away from Hadoop's metric2, the hbase-metrics-api module is used for newly added metrics. --- * [HBASE-9393](https://issues.apache.org/jira/browse/HBASE-9393) | *Critical* | **Hbase does not closing a closed socket resulting in many CLOSE\_WAIT** To handle this issue, the client needs Hadoop client version 2.6.4 or 2.7.0+, because the CanUnBuffer interface, which was added as part of HDFS-7694, is available only in those versions. --- * [HBASE-18038](https://issues.apache.org/jira/browse/HBASE-18038) | *Critical* | **Rename StoreFile to HStoreFile and add a StoreFile interface for CP** StoreFile is now changed to an interface. This is an incompatible change. The coprocessors which implement RegionObserver may need to modify their code. --- * [HBASE-16196](https://issues.apache.org/jira/browse/HBASE-16196) | *Critical* | **Update jruby to a newer version.** The bundled JRuby 1.6.8 has been updated to version 9.1.9.0. This represents a change from Ruby 1.8 to Ruby 2.3.3, which introduces non-compatible language changes for user scripts. This JRuby version update required an update to joni-2.1.11 and jcodings-1.0.18, used for regular expression matching, as well as several transitive dependency updates that should not be user-visible. 
--- * [HBASE-14614](https://issues.apache.org/jira/browse/HBASE-14614) | *Major* | **Procedure v2: Core Assignment Manager** Replaces the AssignmentManager with a new procedurev2-based AssignmentManager h1. AMv2 Puts AssignmentManager up on top of the ProcedureV2 state machine with persistence engine. Each assignment atom is now a Procedure implementation; e.g. an AssignProcedure and an UnassignProcedure. Molecules of aggregated Procedures are used to do more involved assignment steps: e.g. the move region procedure is made of an Unassign followed by an Assign subprocedure. AMv2 is 1500 lines. Old AM was near 4000. Functionality has been moved out to Procedures. In-memory states of regions and servers has been cleaned up stored in new RegionStates implementation. RegionStateStore takes care of publishing final region state out to the hbase:meta table. New RemoteProcedureDispatcher/RSProcedureDispatcher runs the Procedure-based assignments ‘remotely’. Knows about ‘servers’. Does aggregation of assignments by time on a time/count basis so can send procedures in batches rather than one per RPC. Procedure status comes back on the back of the RegionServer heartbeat reporting online regions. The response is passed to the AMv2 to ‘process’. It will check against the in-memory state. If there is a mismatch, it fences out the RegionServer on the assumption that something went wrong on the RS side.Timeouts trigger retries. The Procedure machine ensures only one operation at a time on any one region/table using locking and smarts about what is serial and what can be run concurrently. New accounting of RegionServer version will be used running rolling restarts. ‘States’ -- OPENING, CLOSING, etc. -- are now in-memory in-the-master only serialized out to the ProcedureV2 WAL. They are no longer persisted to ZooKeeper. h2. Assign Detail The Assign starts by pushing the "assign" operation to the AssignmentManager and then will go into a “waiting" state. 
The AM will batch the "assign" requests and ask the Balancer where to put the region (the various policies will be respected: retain, round-robin, random). Once the AM and the balancer have found a place for the region, the procedure will be resumed and an "open region" request will be placed in the Remote Dispatcher queue, and the procedure once again will go into a "waiting state". The Remote Dispatcher will batch the various requests for that server and they will be sent to the RS for execution. The RS will complete the open operation by calling master.reportRegionStateTransition(). The AM will intercept the transition report, and notify the procedure. The procedure will finish the assignment by publishing its new state to hbase:meta or it will retry the assignment. h2. Unassign Detail The Unassign starts by placing a "close region" request in the Remote Dispatcher queue, and the procedure will then go into a "waiting state". The Remote Dispatcher will batch the various requests for that server and they will be sent to the RS for execution. The RS will complete the close operation by calling master.reportRegionStateTransition(). The AM will intercept the transition report, and notify the procedure. The procedure will finish the unassign by publishing its new state to hbase:meta or it will retry the unassign. h1. New Configs \* "hbase.procedure.remote.dispatcher.threadpool.size" with default 128 \* "hbase.procedure.remote.dispatcher.delay.msec" with default 150ms \* "hbase.procedure.remote.dispatcher.max.queue.size" with default 32 \* "hbase.regionserver.rpc.startup.waittime" with default 60 seconds. h1. TODO As of this writing. Put up a model diagram. \* Handle region migration \* Handle meta assignment first \* Handle sys table assignment first (e.g. acl, namespace) \* Handle table priorities \* Do we report the same AM metrics as we used to? We do it all in here now. 
INCOMPATIBLE A known incompatible is that because splits and merges are now run from the master, Coprocessors that used to watch for merge/split from a RegionObserver now no longer work; to watch split/merges, you need to have an observer on the Master instead. --- * [HBASE-3462](https://issues.apache.org/jira/browse/HBASE-3462) | *Major* | **Fix table.jsp in regards to splitting a region/table with an optional splitkey** UI pages for splitting/merging now operate by taking a row key prefix from the user rather than a full region name. --- * [HBASE-18129](https://issues.apache.org/jira/browse/HBASE-18129) | *Major* | **truncate\_preserve fails when the truncate method doesn't exists on the master** The command truncate\_preserve will be fine when the truncate method doesn't exist on the master --- * [HBASE-18122](https://issues.apache.org/jira/browse/HBASE-18122) | *Major* | **Scanner id should include ServerName of region server** The scanner id is not from 1 anymore. The first 32 bits are MurmurHash32 of ServerName string "host,port,ts". The ServerName contains both host, port, and start timestamp so it can prevent collision. The lowest 32bit is generated by atomic int. --- * [HBASE-17997](https://issues.apache.org/jira/browse/HBASE-17997) | *Major* | **In dev environment, add jruby-complete jar to classpath only when jruby is needed** When JRUBY\_HOME is specified, if the command is "hbase shell" or "hbase org.jruby.Main", CLASSPATH and HBASE\_OPTS will be updated according to JRUBY\_HOME specified \* Jar under JRUBY\_HOME is added to CLASSPATH \* The following will be added into HBASE\_OPTS -Djruby.home=$JRUBY\_HOME -Djruby.lib=$JRUBY\_HOME/lib That is, as long as JRUBY\_HOME is specified, JRUBY\_HOME specified will take precedence. 
\* In dev env, the jar recorded in cached\_classpath\_jruby.txt will be ignored \* In non dev env, jruby-complete jar packaged with HBase will be ignored --- * [HBASE-15616](https://issues.apache.org/jira/browse/HBASE-15616) | *Major* | **Allow null qualifier for all table operations** After this issue, all table operations will support null qualifier, such as put/get/scan/increment/append/checkAndMutate/checkAndPut/checkAndDelete. --- * [HBASE-18035](https://issues.apache.org/jira/browse/HBASE-18035) | *Critical* | **Meta replica does not give any primaryOperationTimeout to primary meta region** When a client is configured to use meta replicas, it sends the scan request to all meta replicas at almost the same time. Since a meta replica may contain stale data, if the result from one of the replicas comes back first, the client may get wrong region locations. To fix this, "hbase.client.meta.replica.scan.timeout" is introduced: a client will always send the request to the primary meta region first and wait the configured timeout for a reply. If no result is received, it will send the request to the replica meta regions. The unit for "hbase.client.meta.replica.scan.timeout" is microseconds, and the default value is 1000000 (1 second). --- * [HBASE-11013](https://issues.apache.org/jira/browse/HBASE-11013) | *Major* | **Clone Snapshots on Secure Cluster Should provide option to apply Retained User Permissions** While creating a snapshot, HBase will save the permissions of the original table into the .snapshotinfo file (backward compatible), which is in the snapshot root directory. For the clone\_snapshot/restore\_snapshot commands, we provide an additional option (RESTORE\_ACL) to decide whether to grant the permissions of the original table to the newly created table. 
--- * [HBASE-18018](https://issues.apache.org/jira/browse/HBASE-18018) | *Major* | **Support abort for all procedures by default** The default behavior for abort() method of StateMachineProcedure class is changed to support aborting all procedures irrespective of if procedure supports rollback or not. --- * [HBASE-16851](https://issues.apache.org/jira/browse/HBASE-16851) | *Major* | **User-facing documentation for the In-Memory Compaction feature** Two blog posts on Apache HBase blog: user manual and programmer manual. Ref. guide draft published: https://docs.google.com/document/d/1Xi1jh\_30NKnjE3wSR-XF5JQixtyT6H\_CdFTaVi78LKw/edit --- * [HBASE-17343](https://issues.apache.org/jira/browse/HBASE-17343) | *Blocker* | **Make Compacting Memstore default in 2.0 with BASIC as the default type** This JIRA changes the default MemStore to be CompactingMemStore instead of DefaultMemStore. In-memory compaction of CompactingMemStore demonstrated sizable improvement in HBase’s write amplification and read/write performance. CompactingMemStore achieves these gains through smart use of RAM. The algorithm periodically re-organizes the in-memory data in efficient data structures and reduces redundancies. The HBase server’s memory footprint therefore periodically expands and contracts. The outcome is longer lifetime of data in memory, less I/O, and overall faster performance. More details about the algorithm and its use appear in the Apache HBase Blog: https://blogs.apache.org/hbase/ How To Use: The in-memory compaction level can be configured both globally and per column family. The supported levels are none (DefaultMemStore), basic, and eager. By default, all tables apply basic in-memory compaction. 
This global configuration can be overridden in hbase-site.xml, as follows: \<property\> \<name\>hbase.hregion.compacting.memstore.type\</name\> \<value\>\<none\|basic\|eager\>\</value\> \</property\> The level can also be configured in the HBase shell per column family, as follows: create '\<table\>', {NAME =\> '\<cfname\>', IN\_MEMORY\_COMPACTION =\> '\<NONE\|BASIC\|EAGER\>'} --- * [HBASE-17786](https://issues.apache.org/jira/browse/HBASE-17786) | *Major* | **Create LoadBalancer perf-tests (test balancer algorithm decoupled from workload)** $ bin/hbase org.apache.hadoop.hbase.master.balancer.LoadBalancerPerformanceEvaluation -help usage: hbase org.apache.hadoop.hbase.master.balancer.LoadBalancerPerformanceEvaluation \<options\> Options: -regions \<arg\> Number of regions to consider by load balancer. Default: 1000000 -servers \<arg\> Number of servers to consider by load balancer. Default: 1000 -load\_balancer \<arg\> Type of Load Balancer to use. Default: org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer --- * [HBASE-17887](https://issues.apache.org/jira/browse/HBASE-17887) | *Blocker* | **Row-level consistency is broken for read** Now we pass the list of memstoreScanners to the StoreScanner along with the new files to ensure that the StoreScanner sees the latest memstore after flush. --- * [HBASE-15296](https://issues.apache.org/jira/browse/HBASE-15296) | *Major* | **Break out writer and reader from StoreFile** The table information page in the Master UI now includes a schema section that describes the column families defined for that table as well as any column family specific properties that are set. --- * [HBASE-17472](https://issues.apache.org/jira/browse/HBASE-17472) | *Major* | **Correct the semantic of permission grant** Before this patch, later-granted permissions overrode previously granted permissions, and the previously granted permissions were lost. This issue redefines the grant semantics: for the master branch, later-granted permissions are merged with previously granted permissions. 
For branch-1.4, grant keeps the override behavior for compatibility, and a grant with a mergeExistingPermission flag is provided. --- * [HBASE-17583](https://issues.apache.org/jira/browse/HBASE-17583) | *Major* | **Add inclusive/exclusive support for startRow and endRow of scan for sync client** Now you can include or exclude the startRow and stopRow for a scan. The new methods to specify startRow and stopRow are withStartRow and withStopRow. The old methods to specify startRow and stopRow (including constructors) are marked as deprecated, because previously, if startRow and stopRow were equal, we treated the scan as a get scan and included the stopRow implicitly. This is strange now that inclusiveness can be set explicitly, so we add new methods and deprecate the old ones. The deprecated methods will be removed in the future. --- * [HBASE-9702](https://issues.apache.org/jira/browse/HBASE-9702) | *Major* | **Change unittests that use "table" or "testtable" to use method names.** Changes all tests to use the TestName JUnit Rule everywhere rather than hardcode table/region/store names. --- * [HBASE-17280](https://issues.apache.org/jira/browse/HBASE-17280) | *Minor* | **Add mechanism to control hbase cleaner behavior** The HBase cleaner chore process cleans up old WAL files and archived HFiles. Cleaner operation can affect query performance when running heavy workloads, so consider disabling the cleaner during peak hours. The cleaner has the following HBase shell commands: - cleaner\_chore\_enabled: Queries whether the cleaner chore is enabled/disabled. - cleaner\_chore\_run: Manually runs the cleaner to remove files. - cleaner\_chore\_switch: Enables or disables the cleaner and returns the previous state of the cleaner. For example, cleaner\_chore\_switch true enables the cleaner. 
Following APIs are added in Admin: - setCleanerChoreRunning(boolean on): Enable/Disable the cleaner chore - runCleanerChore(): Ask for cleaner chore to run - isCleanerChoreEnabled(): Query whether cleaner chore is enabled/ disabled. --- * [HBASE-17599](https://issues.apache.org/jira/browse/HBASE-17599) | *Major* | **Use mayHaveMoreCellsInRow instead of isPartial** The word 'isPartial' is ambiguous so we introduce a new method 'mayHaveMoreCellsInRow' to replace it. And the old meaning of 'isPartial' is not the same with 'mayHaveMoreCellsInRow' as for batched scan, if the number of returned cells equals to the batch, isPartial will be false. After this change the meaning of 'isPartial' will be same with 'mayHaveMoreCellsInRow'. This is an incompatible change but it is not likely to break a lot of things as for batched scan the old 'isPartial' is just a redundant information, i.e, if the number of returned cells reaches the batch limit. You have already know the number of returned cells and the value of batch. --- * [HBASE-17437](https://issues.apache.org/jira/browse/HBASE-17437) | *Major* | **Support specifying a WAL directory outside of the root directory** This patch adds support for specifying a WAL directory outside of the HBase root directory. Multiple configuration variables were added to accomplish this: hbase.wal.dir: used to configure where the root WAL directory is located. Could be on a different FileSystem than the root directory. WAL directory can not be set to a subdirectory of the root directory. The default value of this is the root directory if unset. hbase.rootdir.perms: Configures FileSystem permissions to set on the root directory. This is '700' by default. hbase.wal.dir.perms: Configures FileSystem permissions to set on the WAL directory FileSystem. This is '700' by default. 
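The two rules stated for hbase.wal.dir (it defaults to the root directory when unset, and it must not be a subdirectory of the root directory) can be sketched as a standalone check. This is only one way such a check could look, not HBase's own validation code: the class `WalDirSketch`, its method names, and the example URIs are assumptions made for illustration.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper: mirrors the hbase.wal.dir rules described above.
final class WalDirSketch {
    // Falls back to the root directory when hbase.wal.dir is unset (null here).
    static URI effectiveWalDir(URI rootDir, URI walDirOrNull) {
        return walDirOrNull == null ? rootDir : walDirOrNull;
    }

    // True when walDir is on the same filesystem as rootDir and strictly below it.
    static boolean isSubdirectoryOfRoot(URI rootDir, URI walDir) {
        if (!Objects.equals(rootDir.getScheme(), walDir.getScheme())
                || !Objects.equals(rootDir.getAuthority(), walDir.getAuthority())) {
            return false; // different FileSystem, so it cannot be nested
        }
        String root = stripTrailingSlashes(rootDir.normalize().getPath());
        String wal = stripTrailingSlashes(walDir.normalize().getPath());
        return wal.startsWith(root + "/");
    }

    private static String stripTrailingSlashes(String path) {
        String p = path;
        while (p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    public static void main(String[] args) {
        URI root = URI.create("hdfs://nn:8020/hbase");
        System.out.println(isSubdirectoryOfRoot(root, URI.create("hdfs://nn:8020/hbase/wals")));
        System.out.println(isSubdirectoryOfRoot(root, URI.create("hdfs://nn:8020/hbase-wal")));
    }
}
```

Comparing on `root + "/"` rather than on the bare prefix keeps a sibling directory such as `/hbase-wal` from being mistaken for a child of `/hbase`.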
--- * [HBASE-17350](https://issues.apache.org/jira/browse/HBASE-17350) | *Critical* | **Fixup of regionserver group-based assignment** A few bug fixes and tweaks to the rsgroup feature. Renamed shell command move\_rsgroup\_servers as move\_servers\_rsgroup. Renamed shell command move\_rsgroup\_tables as move\_tables\_rsgroup. Made the 'default' group more 'dynamic'; i.e. dead servers no longer show in the 'default' group. --- * [HBASE-17578](https://issues.apache.org/jira/browse/HBASE-17578) | *Major* | **Thrift per-method metrics should still update in the case of exceptions** In prior versions, the HBase Thrift handlers failed to increment per-method metrics when an exception was encountered. These metrics will now always be incremented, whether an exception is encountered or not. This change also adds exception-type metrics, similar to those exposed in regionservers, for individual exceptions which are received by the Thrift handlers. --- * [HBASE-17508](https://issues.apache.org/jira/browse/HBASE-17508) | *Major* | **Unify the implementation of small scan and regular scan for sync client** The scan.setSmall method is now deprecated. Consider using scan.setLimit and scan.setReadType instead. The scanner is also now opened lazily, when you call scanner.next. This is an incompatible change which delays the table existence check and permission check. --- * [HBASE-16981](https://issues.apache.org/jira/browse/HBASE-16981) | *Major* | **Expand Mob Compaction Partition policy from daily to weekly, monthly** Mob compaction partition policy can be set by hbase\> create 't1', {NAME =\> 'f1', IS\_MOB =\> true, MOB\_THRESHOLD =\> 1000000, MOB\_COMPACT\_PARTITION\_POLICY =\> 'weekly'} or hbase\> alter 't1', {NAME =\> 'f1', IS\_MOB =\> true, MOB\_THRESHOLD =\> 1000000, MOB\_COMPACT\_PARTITION\_POLICY =\> 'monthly'} Available MOB\_COMPACT\_PARTITION\_POLICY options are "daily", "weekly" and "monthly"; the default is "daily".
When it is "weekly" policy, the mob compaction will try to compact files within one calendar week into one for a specific partition, similar for "daily" and "monthly". With "weekly" policy, one mob file normally is compacted twice during its lifetime (that is first on daily basis and then all such daily based compacted files belonging to a week at the weekly interval), for one region, there normally are 52 files for one year. With "Monthly" policy, one mob file normally is compacted 3 times during its lifetime (First daily and then weekly followed by monthly at end of every month) and normally there are 12 files for one year. --- * [HBASE-17197](https://issues.apache.org/jira/browse/HBASE-17197) | *Major* | **hfile does not work in 2.0** The -f argument is no longer required specifying target file; just pass the file as an argument. --- * [HBASE-16812](https://issues.apache.org/jira/browse/HBASE-16812) | *Minor* | **Clean up the locks in MOB** In MOB-enabled column family, the lock in the major compaction is removed. All the delete markers are retained in the major compaction, and a MOB reference tag is appended to each of the retained delete markers. --- * [HBASE-12894](https://issues.apache.org/jira/browse/HBASE-12894) | *Critical* | **Upgrade Jetty to 9.2.6** Upgrades Jetty to 9.x from 6.x (Jetty9 is in different namespace from Jetty6). Also updated Jersey to 2.x and Servlet to 3.x. --- * [HBASE-17566](https://issues.apache.org/jira/browse/HBASE-17566) | *Major* | **Jetty upgrade fixes** Fix inability at finding static content post push of parent issue moving us to jetty9. --- * [HBASE-9774](https://issues.apache.org/jira/browse/HBASE-9774) | *Major* | **HBase native metrics and metric collection for coprocessors** This issue adds two new modules, hbase-metrics and hbase-metrics-api which define and implement the "new" metric system used internally within HBase. 
These two modules (and some other code in the hbase-hadoop2-compat module) are referred to as the "HBase metrics framework", which is HBase-specific and independent of any other metrics library (including Hadoop metrics2 and Dropwizard metrics). HBase Metrics API (hbase-metrics-api) contains the interface that HBase exposes internally and to third party code (including coprocessors). It is a thin abstraction over the actual implementation for backwards compatibility guarantees. The metrics API in this hbase-metrics-api module is inspired by the Dropwizard metrics 3.1 API; however, the API is completely independent. The hbase-metrics module contains the implementation of the "HBase Metrics API", including MetricRegistry, Counter, Histogram, etc. These are highly concurrent implementations of the Metric interfaces. Metrics in HBase are grouped into different sets (like WAL, RPC, RegionServer, etc). Each group of metrics should be tracked via a MetricRegistry specific to that group. Historically, HBase has been using Hadoop's Metrics2 framework [3] for collecting and reporting the metrics internally. However, due to the difficulty of dealing with the Metrics2 framework, HBase is moving away from Hadoop's metrics implementation to its custom implementation. The move will happen incrementally, and during that time both Hadoop Metrics2-based metrics and hbase-metrics module based classes will be in the source code. All new implementations for metrics SHOULD use the new API and framework. This jira also introduces the metrics API to coprocessor implementations. Coprocessor writers can export custom metrics using the API and have those collected via metrics2 sinks, as well as exported via JMX in regionserver metrics.
More documentation available at: hbase-metrics-api/README.txt --- * [HBASE-17491](https://issues.apache.org/jira/browse/HBASE-17491) | *Major* | **Remove all setters from HTable interface and introduce a TableBuilder to build Table instance** After HBASE-17491 all setter methods in HTable are marked as deprecated, moved into TableBuilder, and will be removed later. --- * [HBASE-17067](https://issues.apache.org/jira/browse/HBASE-17067) | *Major* | **Procedure v2 - remove tryAcquire\*Lock and use wait/wake to make framework event based** Make the framework more 'lively'; undo 'suspend' notion in Procedure, rely on eventing mechanism instead. Lets us remove no longer needed synchronizations. Framework can now do more ops per second. --- * [HBASE-16698](https://issues.apache.org/jira/browse/HBASE-16698) | *Major* | **Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload** Assign sequenceid to an edit before we go on the ringbuffer; undoes contention on WALKey latch. Adds a new config "hbase.hregion.mvcc.preassign" which defaults to true: i.e. this speedup is enabled. User could set this per-table level, like: create 'table',{NAME=\>'f1',CONFIGURATION=\>{'hbase.hregion.mvcc.preassign'=\>'false'}} --- * [HBASE-17488](https://issues.apache.org/jira/browse/HBASE-17488) | *Trivial* | **WALEdit should be lazily instantiated** prevent creating unused objects in the WALEdit's construction. +If the cp#preBatchMutate returns true, the WALEdit is useless. So we should create the WALEdit after step 2. +The cells came from cp should be counted because they are added into the WALEdit . The use case is the local index of phoenix +If the mutation contains the SKIP\_WAL property, its cells aren't added into the WALEdit. So these cells shouldn't be counted. 
--- * [HBASE-16831](https://issues.apache.org/jira/browse/HBASE-16831) | *Minor* | **Procedure V2 - Remove org.apache.hadoop.hbase.zookeeper.lock** Purges code that did zk-hosted locks for table ops (we do procedure-based locks now) --- * [HBASE-16867](https://issues.apache.org/jira/browse/HBASE-16867) | *Major* | **Procedure V2 - Check ACLs for remote HBaseLock** Add checking ACL when taking locks. --- * [HBASE-16786](https://issues.apache.org/jira/browse/HBASE-16786) | *Major* | **Procedure V2 - Move ZK-lock's uses to Procedure framework locks (LockProcedure)** Move locking to be procedure (Pv2) rather than zookeeper based. All locking moved over to new infrastructure including MOBing locking. --- * [HBASE-17470](https://issues.apache.org/jira/browse/HBASE-17470) | *Major* | **Remove merge region code from region server** In 1.x branches, Admin.mergeRegions calls MASTER via dispatchMergingRegions RPC; when executing dispatchMergingRegions RPC, MASTER calls RS via MergeRegions to complete the merge in RS-side. With HBASE-16119, the merge logic moves to master-side. This JIRA cleans up unused RPCs (dispatchMergingRegions and MergeRegions) , removes dangerous tools such as Merge and HMerge, and deletes unused RegionServer-side merge region logic in 2.0 release. --- * [HBASE-16744](https://issues.apache.org/jira/browse/HBASE-16744) | *Major* | **Procedure V2 - Lock procedures to allow clients to acquire locks on tables/namespaces/regions** Lock for HBase Entity either a Table, a Namespace, or Regions. These are remote locks which live on master, and need periodic heartbeats to keep them alive. (Once we request the lock, internally an heartbeat thread will be started). If master doesn't receive the heartbeat in time, it'll release the lock and make it available to other users. Use {@link LockServiceClient} to build instances. Then call {@link #requestLock()}. 
{@link #requestLock} will contact master to queue the lock and start the heartbeat thread, which checks the lock's status periodically and, once the lock is acquired, sends heartbeats to the master. Use {@link #await} or {@link #await(long, TimeUnit)} to wait for the lock to be acquired. Always call {@link #unlock()} irrespective of whether lock was acquired or not. If the lock was acquired, it'll be released. If it was not acquired, it is possible that master grants the lock in future and the heartbeat thread keeps it alive forever by sending heartbeats. Calling {@link #unlock()} will stop the heartbeat thread and cancel the lock queued on master. There are 4 ways in which these remote locks may be released/can be lost: \* Call {@link #unlock}. \* Lock times out on master: Can happen because of network issues, GC pauses, etc. Worker thread will call the given abortable as soon as it detects such a situation. \* Fail to contact master: If worker thread cannot contact master and thus fails to send heartbeat before the timeout expires, it assumes that lock is lost and calls the abortable. \* Worker thread is interrupted. Use example: EntityLock lock = lockServiceClient.\*Lock(...., "exampled lock", abortable); lock.requestLock(); .... ....can do other initializations here since lock is 'asynchronous'... .... if (lock.await(timeout)) { ....logic requiring mutual exclusion } lock.unlock(); --- * [HBASE-14061](https://issues.apache.org/jira/browse/HBASE-14061) | *Major* | **Support CF-level Storage Policy** After HBASE-14061 we support setting the storage policy for HFiles through the "hbase.hstore.block.storage.policy" configuration, and we support a CF-level setting that overrides the setting from the configuration file.
Currently supported storage policies include ALL\_SSD/ONE\_SSD/HOT/WARM/COLD; refer to http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html for more details. For example, to create a table with two families, "f1" with the "ALL\_SSD" storage policy and "f2" with "ONE\_SSD", we could use the command below in hbase shell: create 'table',{NAME=\>'f1',STORAGE\_POLICY=\>'ALL\_SSD'},{NAME=\>'f2',STORAGE\_POLICY=\>'ONE\_SSD'} We could also set the configuration in table attribute like all other configurations: create 'table',{NAME=\>'f1',CONFIGURATION=\>{'hbase.hstore.block.storage.policy'=\>'ONE\_SSD'}} --- * [HBASE-17337](https://issues.apache.org/jira/browse/HBASE-17337) | *Major* | **list replication peers request should be routed through master** List replication peers request will be routed through master. --- * [HBASE-15172](https://issues.apache.org/jira/browse/HBASE-15172) | *Major* | **Support setting storage policy in bulkload** After HBASE-15172/HBASE-19016 we could set storage policy through "hbase.hstore.block.storage.policy" property for bulkload, or "hbase.hstore.block.storage.policy.\" for a specified family. Supported storage policies include: ALL\_SSD, ONE\_SSD, HOT, WARM, COLD, etc. --- * [HBASE-17336](https://issues.apache.org/jira/browse/HBASE-17336) | *Major* | **get/update replication peer config requests should be routed through master** Get/update replication peer config requests will be routed through master. --- * [HBASE-17320](https://issues.apache.org/jira/browse/HBASE-17320) | *Major* | **Add inclusive/exclusive support for startRow and endRow of scan** You can now specify the inclusiveness of startRow and stopRow for a scan using the new methods withStartRow(byte[] startRow, boolean inclusive) and withStopRow(byte[] stopRow, boolean inclusive).
The old setStartRow and setStopRow methods, and the constructors, are marked as deprecated because of a strange behavior: the stopRow is included implicitly if startRow equals stopRow. This was used to support get scans in the past. Use withStartRow and withStopRow instead. For developers, ConnectionUtils.createClosestRowBefore is also marked as deprecated, as the row returned by this method is only very close to the current row, not the closest. Avoid using this method in the future. --- * [HBASE-17314](https://issues.apache.org/jira/browse/HBASE-17314) | *Major* | **Limit total buffered size for all replication sources** Adds a conf "replication.total.buffer.quota" to limit the total size of buffered entries across all replication peers. It prevents the server from running out of memory when there are many peers. Default value is 256MB. --- * [HBASE-17174](https://issues.apache.org/jira/browse/HBASE-17174) | *Minor* | **Refactor the AsyncProcess, BufferedMutatorImpl, and HTable** + cleanup some unused code + allow sharing a pool between BufferedMutatorImpl instances + setting "hbase.client.request.controller.impl" to the name of the alternate RequestController (traffic control) implementation class in Configuration + The default RequestController implementation is SimpleRequestController + setting "hbase.client.log.detail.period.ms" to call logger on a period when waiting for tasks to complete --- * [HBASE-17335](https://issues.apache.org/jira/browse/HBASE-17335) | *Major* | **enable/disable replication peer requests should be routed through master** Enable/Disable replication peer requests will be routed through master. --- * [HBASE-5401](https://issues.apache.org/jira/browse/HBASE-5401) | *Major* | **PerformanceEvaluation generates 10x the number of expected mappers** Changes how many tasks PE runs when clients are mapreduce. Now tasks == client count. Previously we hardcoded ten tasks per client instance.
--- * [HBASE-11392](https://issues.apache.org/jira/browse/HBASE-11392) | *Critical* | **add/remove peer requests should be routed through master** Add/Remove replication peer requests will be routed through master. ReplicationAdmin is also marked as deprecated. --- * [HBASE-15924](https://issues.apache.org/jira/browse/HBASE-15924) | *Major* | **Enhance hbase services autorestart capability to hbase-daemon.sh** HBase services can now be started with the "autostart/autorestart" feature enabled in a controlled fashion: "--autostart-window-size" defines the window period and "--autostart-window-retry-limit" defines the number of times the hbase services may be restarted after being killed/terminated abnormally within the provided window period. The following cases are supported with "autostart/autorestart": a) --autostart-window-size=0 and --autostart-window-retry-limit=0 indicates infinite window size and no retry limit b) not providing the args defaults to a) c) --autostart-window-size=0 and --autostart-window-retry-limit=\<y\> indicates the autostart process is to bail out if the retry limit is exceeded, irrespective of window period d) --autostart-window-size=\<x\> and --autostart-window-retry-limit=\<y\> indicates the autostart process is to bail out if the retry limit "y" is exceeded for the last window period "x". --- * [HBASE-17331](https://issues.apache.org/jira/browse/HBASE-17331) | *Minor* | **Avoid busy waiting in ThrottledInputStream** For each read(), the old ThrottledInputStream slept/woke/checked many times to control the throughput. After this patch, ThrottledInputStream sleeps/wakes/checks only once, which reduces CPU usage. --- * [HBASE-17296](https://issues.apache.org/jira/browse/HBASE-17296) | *Major* | **Provide per peer throttling for replication** Provides per peer throttling for replication. Adds a bandwidth upper limit to ReplicationPeerConfig and a new shell cmd set\_peer\_bandwidth to update the bandwidth as needed.
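The idea behind the two throttling entries above (HBASE-17331 and HBASE-17296) can be illustrated by computing the required pause once, rather than polling in a loop. The following is a minimal sketch under stated assumptions, not the actual ThrottledInputStream or replication code: `ThrottleSketch` and `sleepMillis` are hypothetical names, the cap is assumed to be a simple bytes-per-second limit, and a non-positive limit is assumed to mean "unlimited".

```java
/** Illustrative only: ThrottleSketch is a hypothetical name, not an HBase class. */
final class ThrottleSketch {

    /**
     * Computes, in one step, how long to sleep so that bytesSoFar spread over
     * (elapsedMs + sleep) stays within maxBytesPerSec. Returns 0 when the
     * transfer is already within the limit.
     */
    static long sleepMillis(long bytesSoFar, long elapsedMs, long maxBytesPerSec) {
        if (maxBytesPerSec <= 0) {
            return 0; // assumption: non-positive limit means no throttling
        }
        // Minimum wall-clock time the transfer should have taken at the capped rate.
        long requiredMs = (long) Math.ceil(bytesSoFar * 1000.0 / maxBytesPerSec);
        return Math.max(0, requiredMs - elapsedMs);
    }
}
```

A caller would invoke this once per read and sleep for the returned duration, instead of waking repeatedly to re-check the rate.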
--- * [HBASE-17277](https://issues.apache.org/jira/browse/HBASE-17277) | *Major* | **Allow alternate BufferedMutator implementation** Specify the name of an alternate BufferedMutator implementation by either: \* Setting "hbase.client.bufferedmutator.classname" to the name of the alternate implementation class in Configuration \* Or, by setting BufferedMutatorParams#implementationClassName and passing the amended BufferedMutatorParams when calling Connection#getBufferedMutator. --- * [HBASE-17294](https://issues.apache.org/jira/browse/HBASE-17294) | *Major* | **External Configuration for Memory Compaction** This patch provides a single external knob to control memstore compaction. It also makes in-memory compaction with the BASIC policy our default (AFTERWORD: in-memory compaction as default was undone in HBASE-17333 because of test failures; it will be re-enabled in a later, dedicated issue). Possible memstore compaction policies are: (1) None - no memory compaction; when the size threshold is exceeded, data is flushed to disk. (2) Basic - applies optimizations which modify the index to a more compacted representation. This is beneficial in all access patterns. The smaller the cells are, the greater the benefit of this policy. This is the default policy. (3) Eager - in addition to compacting the index representation as the basic policy does, the eager policy eliminates duplication while the data is still in memory (much like the on-disk compaction does after the data is flushed to disk). This policy is most useful for applications with high data churn or small working sets.
Memory compaction policy can be set at the column family level at table creation time: {code} create ‘\’, {NAME =\> ‘\’, IN\_MEMORY\_COMPACTION =\> ‘\’} {code} or as a property at the global configuration level by setting the property in hbase-site.xml, with BASIC being the default value: {code} \ \hbase.hregion.compacting.memstore.type\ \\\ \ {code} The values used in this property can change as memstore compaction policies evolve over time. --- * [HBASE-16336](https://issues.apache.org/jira/browse/HBASE-16336) | *Major* | **Removing peers seems to be leaving spare queues** Adds a ReplicationZKNodeCleaner that periodically checks for and deletes useless replication queue zk nodes belonging to peers that no longer exist. --- * [HBASE-17272](https://issues.apache.org/jira/browse/HBASE-17272) | *Major* | **Doc how to run Standalone HBase over an HDFS instance; all daemons in one JVM but persisting to an HDFS instance** Adds section at http://hbase.apache.org/book.html#standalone.over.hdfs on how to make standalone persist to an hdfs instance (where standalone is all daemons in the one jvm). --- * [HBASE-16700](https://issues.apache.org/jira/browse/HBASE-16700) | *Minor* | **Allow for coprocessor whitelisting** Provides ability to restrict table coprocessors based on HDFS path whitelist. (Particularly useful for allowing Phoenix coprocessors but not arbitrary user created coprocessors.) --- * [HBASE-17221](https://issues.apache.org/jira/browse/HBASE-17221) | *Major* | **Abstract out an interface for RpcServer.Call** Provides an interface RpcCall on the server side. RpcServer.Call is now marked as @InterfaceAudience.Private and implements the interface RpcCall. --- * [HBASE-16119](https://issues.apache.org/jira/browse/HBASE-16119) | *Major* | **Procedure v2 - Reimplement merge** The merge region logic is controlled by master in 2.0.0 (in 1.x, the core merge region logic is in the region server side).
The coprocessors related to merge region on the RS side are no-ops in 2.0.0 and later releases. Therefore, this is an incompatible change. Users need to move the CP logic to the new master CP and register it. A new mergeRegionsAsync() API is added in client. The existing mergeRegions() API will call the new API so client does not have to change its code. --- * [HBASE-17112](https://issues.apache.org/jira/browse/HBASE-17112) | *Major* | **Prevent setting timestamp of delta operations the same as previous value's** Before this issue, two concurrent Increments/Appends done in the same millisecond, or the RS's clock going back, could produce two results with the same TS, which is not friendly to versioning and gives wrong results in the slave cluster if replication is disordered. After this issue, the result of Increment/Append will always have an increasing TS, so there is no inconsistency in replication for these operations. There is a rare case, however: if there is a Delete in the same millisecond, the later result cannot be masked by this Delete. This can be fixed once we have new semantics under which a previous Delete never masks a later Put even if its timestamp is higher. --- * [HBASE-17181](https://issues.apache.org/jira/browse/HBASE-17181) | *Minor* | **Let HBase thrift2 support TThreadedSelectorServer** Add TThreadedSelectorServer support for HBase Thrift2 --- * [HBASE-17178](https://issues.apache.org/jira/browse/HBASE-17178) | *Major* | **Add region balance throttling** Adds region balance throttling. The master executes one region balance plan per balance interval, where the interval equals the max balancing time divided by the number of region balance plans. Also introduces a new config, hbase.master.balancer.maxRitPercent, to protect availability. If this is set to 0.01, at most 1% of regions are in transition while balancing, so the cluster's availability is at least 99% during balancing.
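The arithmetic behind region balance throttling can be sketched as follows. This is an illustration of the two quantities described in the note, not the master's actual code: `BalanceThrottleSketch` and its method names are hypothetical, and rounding the region-in-transition cap up to a whole region is an assumption made here.

```java
/** Illustrative only: hypothetical names, not part of the HBase master. */
final class BalanceThrottleSketch {

    /** Balance interval: the max balancing time divided by the number of region plans. */
    static long intervalMillis(long maxBalancingTimeMs, int planCount) {
        if (planCount <= 0) {
            return 0; // nothing to balance
        }
        return maxBalancingTimeMs / planCount;
    }

    /**
     * Upper bound on regions in transition for a given maxRitPercent (a fraction such
     * as 0.01 for 1%). Assumption: the bound is rounded up to a whole region.
     */
    static int maxRegionsInTransition(int totalRegions, double maxRitPercent) {
        return (int) Math.ceil(totalRegions * maxRitPercent);
    }

    public static void main(String[] args) {
        // Example values only: 10 minutes of balancing time spread over 20 plans.
        System.out.println(intervalMillis(600_000L, 20) + " ms between plans");
        System.out.println(maxRegionsInTransition(1000, 0.25) + " regions may be in transition");
    }
}
```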
--- * [HBASE-15786](https://issues.apache.org/jira/browse/HBASE-15786) | *Major* | **Create DBB backed MSLAB pool** Added a new config hbase.regionserver.offheap.global.memstore.size, with which one can specify the global off heap limit that all memstores can use. When this config is in use, MSLAB should be turned ON, and the entire size will be used for the MSLAB pool. The pool will then create off heap chunks, and the system will behave as if it is working with off heap memstores. When this config has a valid value and MSLAB is turned OFF, the system will just ignore the offheap size and continue to use global max heap space % for memstores and work with on heap memstores. --- * [HBASE-17132](https://issues.apache.org/jira/browse/HBASE-17132) | *Major* | **Cleanup deprecated code for WAL** Remove HLogKey and related classes and methods. Remove SequenceFile based log reader and writer. WALObserver and RegionObserver are changed so this is an incompatible change. --- * [HBASE-16169](https://issues.apache.org/jira/browse/HBASE-16169) | *Major* | **Make RegionSizeCalculator scalable** Added a couple of APIs to Admin.java: Returns region load map of all regions hosted on a region server Map\ getRegionLoad(ServerName sn) throws IOException; Returns region load map of all regions of a table hosted on a region server Map\ getRegionLoad(ServerName sn, TableName tableName) throws IOException Added an API to region server: public GetRegionLoadResponse getRegionLoad(RpcController controller, GetRegionLoadRequest request) throws ServiceException; Primary intention is to use this API for RegionSizeCalculator and not rely on Master for ClusterStatus. On large clusters, ClusterStatus() can take a long time. If Master is down/busy, then some of the jobs timeout/fail. Other possible uses: 1. If there is a lighter version of GetClusterStatus API (i.e. without the ServerLoad for each RS), then custom maintenance tools can be better. In the current world ClusterStatus is heavy.
With the new APIs, each API's payload is smaller and distributed. So custom tools can call getRegionLoad() when needed, it will be more accurate. This helps with large clusters. For tools that don't need RegionLoad, the lighter version of API is fine enough. 2. Another use case is a tool like RSTop - since we can see selective metrics at RegionLevel (possibly even deltas between each RPC to the server). --- * [HBASE-15788](https://issues.apache.org/jira/browse/HBASE-15788) | *Major* | **Use Offheap ByteBuffers from BufferPool to read RPC requests.** Using the ByteBuffers from ByteBufferPool to read the request bytes at server. When the size of the request is smaller than 1/6th size of a BB in the pool, we will not use that but read into an on demand created, proper sized on heap ByteBuffer. --- * [HBASE-17046](https://issues.apache.org/jira/browse/HBASE-17046) | *Major* | **Add 1.1 doc to hbase.apache.org** Adds a 1.1. item to our 'Documentation and API' tab. Gives access to 1.1 APIs, XRef, etc. 
--- * [HBASE-16962](https://issues.apache.org/jira/browse/HBASE-16962) | *Major* | **Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API** The following RegionObserver methods are deprecated InternalScanner preFlushScannerOpen(final ObserverContext\ c, final Store store, final KeyValueScanner memstoreScanner, final InternalScanner s) throws IOException; InternalScanner preCompactScannerOpen(final ObserverContext\ c, final Store store, List\ scanners, final ScanType scanType, final long earliestPutTs, final InternalScanner s, CompactionRequest request) Instead, use the following methods: InternalScanner preFlushScannerOpen(final ObserverContext\ c, final Store store, final KeyValueScanner memstoreScanner, final InternalScanner s, final long readPoint) throws IOException; InternalScanner preCompactScannerOpen(final ObserverContext\ c, final Store store, List\ scanners, final ScanType scanType, final long earliestPutTs, final InternalScanner s, final CompactionRequest request, final long readPoint) throws IOException --- * [HBASE-17017](https://issues.apache.org/jira/browse/HBASE-17017) | *Major* | **Remove the current per-region latency histogram metrics** Removes per-region level (get size, get time, scan size and scan time histogram) metrics that was exposed before. Per-region histogram metrics with 1000+ regions causes millions of objects to be allocated on heap. The patch introduces getCount and scanCount as counters rather than histograms. Other per-region level metrics are kept as they are. --- * [HBASE-16955](https://issues.apache.org/jira/browse/HBASE-16955) | *Major* | **Fixup precommit protoc check to do new distributed protos and pb 3.1.0 build** Test that environment no longer has to have protoc (2.5 and 3.1) available. Needed small adjustment in yetus protoc build but otherwise all works. 
--- * [HBASE-17050](https://issues.apache.org/jira/browse/HBASE-17050) | *Minor* | **Upgrade Apache CLI version from 1.2 to 1.3.1** Upgrade Apache CLI version from 1.2 to 1.3.1. A few of the important changes included in this update: - HelpFormatter now prints command-line options in the same order as they have been added. Fixes CLI-212. - Standard help text now shows mandatory arguments also for the first option. Fixes CLI-186. - A new parser is available: DefaultParser. It combines the features of the GnuParser and the PosixParser. It also provides additional features like partial matching for the long options, and long options without separator (i.e like the JVM memory settings: -Xmx512m). This new parser deprecates the previous ones. Fixes CLI-161,CLI-167,CLI-181. For full list of changes: https://commons.apache.org/proper/commons-cli/changes-report.html#a1.3 --- * [HBASE-15513](https://issues.apache.org/jira/browse/HBASE-15513) | *Major* | **hbase.hregion.memstore.chunkpool.maxsize is 0.0 by default** MSLAB chunk pool is on by default in hbase-2.0.0. --- * [HBASE-16972](https://issues.apache.org/jira/browse/HBASE-16972) | *Major* | **Log more details for Scan#next request when responseTooSlow** **WARNING: No release note provided for this change.** --- * [HBASE-17014](https://issues.apache.org/jira/browse/HBASE-17014) | *Minor* | **Add clearly marked starting and shutdown log messages for all services.** Delimit START, STOP, and ABORT messages with '\*\*\*\*\*' to mark them clearly. --- * [HBASE-16765](https://issues.apache.org/jira/browse/HBASE-16765) | *Critical* | **New SteppingRegionSplitPolicy, avoid too aggressive spread of regions for small tables.** Introduces a new split policy: SteppingSplitPolicy. This uses a simple step function to split a region at (by default) 2 x flushSize when no other region of the same table is seen on the region server, or at max-file-size when one or more other regions of the same table are seen.
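The step function described for SteppingSplitPolicy can be written down in a few lines. This is a sketch of the rule as stated in the note, not the policy class itself: `SteppingSplitSketch` and `splitThreshold` are hypothetical names, and the region count is assumed to include the region being checked.

```java
/** Illustrative only: hypothetical names, not the HBase split policy class. */
final class SteppingSplitSketch {

    /**
     * Size at which a region is split. regionsOfTableOnServer is assumed to count the
     * region being checked, so a value of 1 means no other region of the same table
     * is hosted on this region server.
     */
    static long splitThreshold(int regionsOfTableOnServer, long flushSize, long maxFileSize) {
        return regionsOfTableOnServer == 1 ? 2 * flushSize : maxFileSize;
    }

    public static void main(String[] args) {
        long flushSize = 128L * 1024 * 1024;          // example value: 128 MB
        long maxFileSize = 10L * 1024 * 1024 * 1024;  // example value: 10 GB
        System.out.println(splitThreshold(1, flushSize, maxFileSize)); // only region of its table here
        System.out.println(splitThreshold(3, flushSize, maxFileSize)); // other regions of the table present
    }
}
```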
In HBase 2.0 this is going to be the default. In previous versions it can be configured. --- * [HBASE-16608](https://issues.apache.org/jira/browse/HBASE-16608) | *Major* | **Introducing the ability to merge ImmutableSegments without copy-compaction or SQM usage** The index-compaction and data-compaction variants of CompactingMemStore are introduced. In both types the active (mutable) segment is periodically flushed in memory and added as an immutable segment to the compaction pipeline. A CompactingMemStore of index-compaction type merges all immutable segments of the compacting pipeline into one. The merging of N segments is explained below. A CompactingMemStore of data-compaction type compacts all immutable segments of the compacting pipeline into one. After the merge/compaction the old segments in the compacting pipeline are replaced with the one new segment. Before explaining the process of merging N old segments into a new one, note that the segment structure includes an ordered index that allows traversing the cell data efficiently. The merge copies the ordered indexes of the old segments into one ordered index of the new segment. No data is copied, no cells are filtered. Alternatively, in the process of compacting N old segments into a new one, both data and index are copied. The old cells are filtered, meaning that upon compaction unused versions of the cells are not copied, so the new segment has less data than all the old ones. This issue introduces only the merging ability and simplifies the user intervention for switching between types. The previous CompactingMemStore structure was added by HBASE-16420 and HBASE-16421. Future refinements of the policy or of merging/compacting will come in HBASE-16417. In order to create a table with CompactingMemStore as a MemStore one should use: create ‘\’, {NAME =\> ‘\’, IN\_MEMORY\_COMPACTION =\> true} IN\_MEMORY\_COMPACTION default is false, so a table created as follows will have the known DefaultMemStore as a MemStore.
create ‘\’, {NAME =\> ‘\’} The default type of CompactingMemStore is index-compaction. In order to change it to data-compaction one should add to the hbase-site.xml \<property\> \<name\>hbase.hregion.compacting.memstore.type\</name\> \<value\>data-compaction\</value\> \</property\> in addition to creating the table as follows: create ‘\’, {NAME =\> ‘\’, IN\_MEMORY\_COMPACTION =\> true} --- * [HBASE-16747](https://issues.apache.org/jira/browse/HBASE-16747) | *Major* | **Track memstore data size and heap overhead separately** Marked as an incompatible change because the behavior of the region flush decision changes. The default flush size of 128 MB per region was tracked against both the actual data bytes size + the overhead of these cells in memstore memory (overhead because of Cell java objects and CSLM entry). As part of this jira we keep track of cell data size only at the region level. So a 128 MB flush size means 128 MB of cell data bytes (key+ value+..). Globally we track cell data size and heap overhead separately and consider both for forced flushes. We still do not allow the memstores to over-consume heap memory; that is unchanged. Only the way of tracking has changed. --- * [HBASE-16974](https://issues.apache.org/jira/browse/HBASE-16974) | *Minor* | **Update os-maven-plugin to 1.4.1.final+ for building shade file on RHEL/CentOS** Upgrade os-maven-plugin mvn extension which figures the os we are running on from 1.4 to 1.5. --- * [HBASE-16952](https://issues.apache.org/jira/browse/HBASE-16952) | *Major* | **Replace hadoop-maven-plugins with protobuf-maven-plugin for building protos** Simplifies .proto manipulations. One step only now -- no need to keep pom.xml listing up to date with the protobuf protos directory content -- and no need to preinstall protoc; mvn does it all for you now. --- * [HBASE-14551](https://issues.apache.org/jira/browse/HBASE-14551) | *Minor* | **Procedure v2 - Reimplement split** Moved the Split Region logic to Master and most of split region coprocessor is in master now.
Dependent projects such as Phoenix need to change accordingly. --- * [HBASE-15789](https://issues.apache.org/jira/browse/HBASE-15789) | *Major* | **PB related changes to work with offheap** This issue adds a patch to our checked-in internal, shaded protobuf, but it also adds a general means of applying patches to our version of protobuf. Patches found in the new src/main/patches directory are all applied as the last task when you run a build with the -Pcompile-protobuf profile under the hbase-protocol-shaded module. This commit also includes our first patch to protobuf; it adds ByteInput to mimic pb3.1's ByteOutput (src/main/patches/HBASE-15789\_V2.patch attached here). --- * [HBASE-16930](https://issues.apache.org/jira/browse/HBASE-16930) | *Major* | **AssignmentManager#checkWals() function can recur infinitely** Fixed potential infinite recursion in AssignmentManager.checkWals(). --- * [HBASE-16463](https://issues.apache.org/jira/browse/HBASE-16463) | *Major* | **Improve transparent table/CF encryption with Commons Crypto** Improve transparent table/CF encryption with Commons Crypto. The change introduces a new optional CryptoCipherProvider (CommonsCryptoAES) for transparent table/CF encryption. Encryption can then be accelerated by hardware support in modern CPUs (AES-NI). This feature can be enabled by setting the configuration "hbase.crypto.cipherprovider" to "org.apache.hadoop.hbase.io.crypto.CryptoCipherProvider" in hbase-site.xml. For detailed information about transparent table/CF encryption including configuration examples see the Security section of the HBase manual. --- * [HBASE-16414](https://issues.apache.org/jira/browse/HBASE-16414) | *Major* | **Improve performance for RPC encryption with Apache Common Crypto** With secure RPC and encryption enabled, introduce Apache Commons Crypto to do the encryption/decryption; it supports both JCE Cipher and OpenSSL Cipher.
Adds new configs "hbase.rpc.crypto.encryption.aes.enabled", which defaults to false, and "hbase.rpc.crypto.encryption.aes.cipher.class", which defaults to "org.apache.commons.crypto.cipher.JceCipher" to support JCE Cipher; it can also be set to "org.apache.hadoop.crypto.OpensslCipher" to support OpenSSL Cipher. --- * [HBASE-16721](https://issues.apache.org/jira/browse/HBASE-16721) | *Critical* | **Concurrency issue in WAL unflushed seqId tracking** Fixed a bug in sequenceId tracking for the WALs that caused WAL files to accumulate without being deleted due to a rare race condition. --- * [HBASE-16834](https://issues.apache.org/jira/browse/HBASE-16834) | *Major* | **Add AsyncConnection support for ConnectionFactory** Add createAsyncConnection method to ConnectionFactory for creating AsyncConnection. The default implementation is org.apache.hadoop.hbase.client.AsyncConnectionImpl. You can use 'hbase.client.async.connection.impl' to plug in your own AsyncConnection implementation. --- * [HBASE-16729](https://issues.apache.org/jira/browse/HBASE-16729) | *Trivial* | **Define the behavior of (default) empty FilterList** An empty filter list now behaves as if no filter had been added. This is a behavioral change for those who rely on an empty filter list. --- * [HBASE-16799](https://issues.apache.org/jira/browse/HBASE-16799) | *Major* | **CP exposed Store should not expose unwanted APIs** The following APIs are removed from the CP-exposed Store interface: upsert(Iterable\<Cell\> cells, long readpoint); add(Cell cell); add(Iterable\<Cell\> cells); replayCompactionMarker(CompactionDescriptor compaction, boolean pickCompactionFiles, boolean removeFiles); assertBulkLoadHFileOk(Path srcPath); bulkLoadHFile(String srcPathStr, long sequenceId); bulkLoadHFile(StoreFileInfo fileInfo) --- * [HBASE-15921](https://issues.apache.org/jira/browse/HBASE-15921) | *Major* | **Add first AsyncTable impl and create TableImpl based on it** Add AsyncConnection, AsyncTable and AsyncTableRegionLocator.
For now, AsyncTable supports only get, put and delete, and the implementation of AsyncTableRegionLocator is actually synchronous. --- * [HBASE-16664](https://issues.apache.org/jira/browse/HBASE-16664) | *Major* | **Timeout logic in AsyncProcess is broken** This issue fixes three bugs: 1. The rpcTimeout configuration did not work for a single rpc call in AP. 2. The operationTimeout configuration did not work for multi-requests (batch, put) in AP. 3. setRpcTimeout and setOperationTimeout in HTable did not work for AP and BufferedMutator. --- * [HBASE-16661](https://issues.apache.org/jira/browse/HBASE-16661) | *Minor* | **Add last major compaction age to per-region metrics** This adds a new per-region metric named "lastMajorCompactionAge" for tracking time since the last major compaction ran on a given region. If a major compaction has never run, the age will be equal to the current timestamp. --- * [HBASE-16117](https://issues.apache.org/jira/browse/HBASE-16117) | *Major* | **Fix Connection leak in mapred.TableOutputFormat** (This change will be irrelevant after HBASE-16774 lands). There is a subtle change in error handling when a connection is not able to connect to ZK. Attempts to create a connection when ZK is not up will now fail immediately instead of silently creating the connection and then failing on a subsequent HBaseAdmin call. --- * [HBASE-15984](https://issues.apache.org/jira/browse/HBASE-15984) | *Critical* | **Given failure to parse a given WAL that was closed cleanly, replay the WAL.** In some particular deployments, the Replication code believes it has reached EOF for a WAL prior to successfully parsing all bytes known to exist in a cleanly closed file. If an EOF is detected due to parsing or other errors while there are still unparsed bytes before the end-of-file trailer, we now reset the WAL to the very beginning and attempt a clean read-through.
Because we will retry these failures indefinitely, two additional changes are made to help with diagnostics: \* On each retry attempt, a log message like the one below is emitted at the WARN level: Processing end of WAL file '{}'. At position {}, which is too far away from reported file length {}. Restarting WAL reading (see HBASE-15983 for details). \* Additional metrics measure the use of this recovery mechanism. They are described in the reference guide. --- * [HBASE-16753](https://issues.apache.org/jira/browse/HBASE-16753) | *Minor* | **There is a mismatch between suggested Java version in hbase-env.sh** Updates the comments and default values in a few scripts and docs to reflect our Java 1.8+ requirement. --- * [HBASE-16567](https://issues.apache.org/jira/browse/HBASE-16567) | *Critical* | **Upgrade to protobuf-3.1.x** Core is now up on protobuf 3.1.0 (Coprocessor Endpoints and REST are still on protobuf 2.5.0). --- * [HBASE-15638](https://issues.apache.org/jira/browse/HBASE-15638) | *Critical* | **Shade protobuf** Shade/relocate and include the protobuf we use internally. See the protobuf chapter in the refguide for more on how we use protobuf in hbase-2.0.0 and going forward. See https://docs.google.com/document/d/1H4NgLXQ9Y9KejwobddCqaVMEDCGbyDcXtdF5iAfDIEk/edit# for how we arrived at this approach. See http://mail-archives.apache.org/mod\_mbox/hbase-dev/201610.mbox/%3C07850EDD-7230-431B-9AB0-C5C91B105EEC%40gmail.com%3E for discussion around merging this change and of how we might revert if an alternative to this awkward patch presents itself; e.g. a hadoop with CLASSPATH isolation (and a means of dealing with Spark's use of protobuf 2.5.0, etc.) --- * [HBASE-16264](https://issues.apache.org/jira/browse/HBASE-16264) | *Critical* | **Figure how to deal with endpoints and shaded pb** Shade/relocate the protobuf hbase uses internally. All of core now refers to the new module added in this patch, hbase-protocol-shaded.
Coprocessor Endpoints carry on with references to the original hbase-protocol module. See the new chapter on protobufs in the book for how to proceed going forward. --- * [HBASE-16672](https://issues.apache.org/jira/browse/HBASE-16672) | *Major* | **Add option for bulk load to always copy hfile(s) instead of renaming** This issue adds a config, always.copy.files, to LoadIncrementalHFiles. When set to true, source hfiles are copied rather than renamed, so they are kept after the bulk load is done. The default value is false. --- * [HBASE-16660](https://issues.apache.org/jira/browse/HBASE-16660) | *Critical* | **ArrayIndexOutOfBounds during the majorCompactionCheck in DateTieredCompaction** "Please do not use DateTieredCompaction with Major Compaction unless you have a version with this. Otherwise your cluster will not compact any store files and you can end up running out of file descriptors." @churro morales --- * [HBASE-16257](https://issues.apache.org/jira/browse/HBASE-16257) | *Blocker* | **Move staging dir to be under hbase root dir** The HBase property 'hbase.bulkload.staging.dir' is deprecated and is ignored from HBase 2.0. The staging directory now defaults to hbase.rootdir/staging automatically, with the correct permissions. --- * [HBASE-16650](https://issues.apache.org/jira/browse/HBASE-16650) | *Major* | **Wrong usage of BlockCache eviction stat for heap memory tuning** Changed tracking of the evictedBlocks count NOT to include evictions of blocks for a removed HFile. HFiles get removed after compaction. --- * [HBASE-16294](https://issues.apache.org/jira/browse/HBASE-16294) | *Minor* | **hbck reporting "No HDFS region dir found" for replicas** Fixed the warning message that hbck displayed when the region directory was not found for non-default/non-primary replicas. --- * [HBASE-16540](https://issues.apache.org/jira/browse/HBASE-16540) | *Major* | **Scan should do additional validation on start and stop row** Scan#setStartRow() and Scan#setStopRow() now validate the argument passed for each row key.
If the length of the byte[] passed exceeds Short.MAX\_VALUE, an IllegalArgumentException will be thrown. --- * [HBASE-7612](https://issues.apache.org/jira/browse/HBASE-7612) | *Trivial* | **[JDK8] Replace use of high-scale-lib counters with intrinsic facilities** org.apache.hadoop.hbase.util.Counter is deprecated now and will be removed in 3.0. Use LongAdder instead. --- * [HBASE-16447](https://issues.apache.org/jira/browse/HBASE-16447) | *Critical* | **Replication by namespaces config in peer** Support replication by namespaces config in peer. 1. Setting a namespace in the peer config means that all tables in this namespace will be replicated. 2. If the namespaces config is null, then the table-cfs config decides which tables' edits can be replicated. If the table-cfs config is null, then the namespaces config decides which tables' edits can be replicated. 3. If you have already set a namespace in the peer config, then you can't add any table of this namespace to the peer config. If you have already set a table in the peer config, then you can't add this table's namespace to the peer config. --- * [HBASE-16598](https://issues.apache.org/jira/browse/HBASE-16598) | *Major* | **Enable zookeeper useMulti always and clean up in HBase code** Deprecate the configuration property 'hbase.zookeeper.useMulti'. useMulti will always be enabled. ZooKeeper 3.4.x or newer is required. Internal: ZKUtil#multiOrSequential(ZooKeeperWatcher zkw, List\<ZKUtilOp\> ops, boolean runSequentialOnMultiFailure) no longer checks 'hbase.zookeeper.useMulti' and always uses multi. It can still fall back to sequential operations if runSequentialOnMultiFailure is true and, on calling multi, we get a ZooKeeper exception that can be handled by a sequential call.
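
The fallback rule in the HBASE-16598 note can be sketched as below. This is a minimal, self-contained illustration, not the ZKUtil implementation: `BatchClient` and `RecoverableBatchException` are hypothetical stand-ins for the ZooKeeper client and for the subset of ZooKeeper exceptions that a sequential call can handle, and each operation is modeled as a plain `Runnable`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class MultiOrSequentialSketch {
    /** Hypothetical: a batch failure that a one-by-one retry can handle. */
    static class RecoverableBatchException extends RuntimeException {}

    /** Hypothetical client: multi() applies every op atomically or throws. */
    interface BatchClient {
        void multi(List<Runnable> ops);
    }

    static void multiOrSequential(BatchClient client, List<Runnable> ops,
                                  boolean runSequentialOnMultiFailure) {
        try {
            client.multi(ops);            // multi is always attempted first
        } catch (RecoverableBatchException e) {
            if (!runSequentialOnMultiFailure) {
                throw e;                  // the caller did not allow the fallback
            }
            for (Runnable op : ops) {     // fallback: apply the ops one at a time
                op.run();
            }
        }
    }

    /** Runs two counter increments against a client whose multi() always fails. */
    static int applyWithFailingMulti(boolean runSequentialOnMultiFailure) {
        AtomicInteger applied = new AtomicInteger();
        Runnable increment = applied::incrementAndGet;
        List<Runnable> ops = Arrays.asList(increment, increment);
        BatchClient failing = batch -> { throw new RecoverableBatchException(); };
        try {
            multiOrSequential(failing, ops, runSequentialOnMultiFailure);
        } catch (RecoverableBatchException e) {
            return -1;                    // failure propagated, nothing was applied
        }
        return applied.get();
    }

    public static void main(String[] args) {
        System.out.println(applyWithFailingMulti(true));
        System.out.println(applyWithFailingMulti(false));
    }
}
```

With the flag set, both operations are applied sequentially after the batch fails; without it, the failure reaches the caller and nothing is applied.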
--- * [HBASE-16388](https://issues.apache.org/jira/browse/HBASE-16388) | *Major* | **Prevent client threads being blocked by only one slow region server** Add a new configuration, hbase.client.perserver.requests.threshold, to limit the max number of concurrent requests to one region server. If the user still creates new requests after reaching the limit, the client throws ServerTooBusyException and does not send the request to the server. This is a client-side feature; it prevents the client's threads from being blocked by one slow region server, which would otherwise make the availability of the client much lower than the availability of the region servers. For completeness, here is the extract on the new config from hbase-default.xml: Property: hbase.client.perserver.requests.threshold Default: 2147483647 Description: The max number of concurrent pending requests for one server in all client threads (process level). Exceeding requests will be thrown ServerTooBusyException immediately to prevent user's threads being occupied and blocked by only one slow region server. If you use a fix number of threads to access HBase in a synchronous way, set this to a suitable value which is related to the number of threads will help you. See https://issues.apache.org/jira/browse/HBASE-16388 for details. --- * [HBASE-15297](https://issues.apache.org/jira/browse/HBASE-15297) | *Minor* | **error message is wrong when a wrong namspace is specified in grant in hbase shell** The security admin instance available within the HBase shell now returns "false" from the namespace\_exists? method for non-existent namespaces rather than raising a wrapped NamespaceNotFoundException. As a side effect, when the "grant" and "revoke" commands in the HBase shell are invoked with a non-existent namespace, the resulting error message now properly refers to said namespace rather than to the user.
--- * [HBASE-16086](https://issues.apache.org/jira/browse/HBASE-16086) | *Major* | **TableCfWALEntryFilter and ScopeWALEntryFilter should not redundantly iterate over cells.** push to branch-1.3+ --- * [HBASE-16340](https://issues.apache.org/jira/browse/HBASE-16340) | *Critical* | **ensure no Xerces jars included** HBase no longer includes Xerces implementation jars that were previously included via transitive dependencies. Downstream users relying on HBase for these artifacts will need to update their dependencies. --- * [HBASE-16213](https://issues.apache.org/jira/browse/HBASE-16213) | *Major* | **A new HFileBlock structure for fast random get** HBASE-16213 introduced a new DataBlockEncoding in name of ROW\_INDEX\_V1, which could improve random read (get) performance especially when the average record size (key-value size per row) is small. To use this feature, please set DATA\_BLOCK\_ENCODING to ROW\_INDEX\_V1 for CF of newly created table, or change existing CF with below command: alter 'table\_name',{NAME =\> 'cf', DATA\_BLOCK\_ENCODING =\> 'ROW\_INDEX\_V1'}. Please note that if we turn this DBE on, HFile block will be bigger than NONE encoding because it adds some meta infos for binary search: /\*\* \* Store cells following every row's start offset, so we can binary search to a row's cells. \* \* Format: \* flat cells \* integer: number of rows \* integer: row0's offset \* integer: row1's offset \* .... \* integer: dataSize \* \*/ Seek in row when random reading is one of the main consumers of CPU. This helps. See slide #7 here https://www.slideshare.net/HBaseCon/lift-the-ceiling-of-hbase-throughputs?qid=597ee2fa-8125-4faa-bb3b-2bf1ba9ccafb&v=&b=&from\_search=6 --- * [HBASE-16409](https://issues.apache.org/jira/browse/HBASE-16409) | *Minor* | **Row key for bad row should be properly delimited in VerifyReplication** --delimiter= option is added to verifyrep. The delimiter would wrap bad rows in log output. 
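
To show why wrapping helps, here is a small sketch of a bad-row message with and without a delimiter. It is an illustration only: the message layout and the `describeBadRow` helper are assumptions, not the verifyrep source, and the row key is treated as printable UTF-8 for simplicity.

```java
import java.nio.charset.StandardCharsets;

class BadRowDelimiterSketch {
    /** Hypothetical helper: wraps the row key in the delimiter on both sides. */
    static String describeBadRow(String reason, byte[] rowKey, String delimiter) {
        String key = new String(rowKey, StandardCharsets.UTF_8);
        return reason + ", rowkey=" + delimiter + key + delimiter;
    }

    public static void main(String[] args) {
        // A row key that ends in a space, which is easy to miss in a log line.
        byte[] rowKey = "row1 ".getBytes(StandardCharsets.UTF_8);
        System.out.println(describeBadRow("ONLY_IN_SOURCE_TABLE_ROWS", rowKey, ""));
        System.out.println(describeBadRow("ONLY_IN_SOURCE_TABLE_ROWS", rowKey, "|"));
    }
}
```

With an empty delimiter the trailing space is invisible in the log; with `|` the key boundaries are explicit.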
--- * [HBASE-14921](https://issues.apache.org/jira/browse/HBASE-14921) | *Major* | **Inmemory Compaction Optimizations; Segment Structure** A long, working issue that discussed Segment formats, introducing CellArrayMap (delivered as the patch attached to this issue) and CellChunkMap (to be delivered later in HBASE-16421 but see patch v02 for an embryonic form named CellBlockSerialized); when to copy Segment data (and when not to); and then what to include at flush time (the suffix Segment or all Segments). Designs that evolved as discussion went on are attached. Outstanding issues turned up here, not including a CellChunkMap implementation, are listed below but are to be addressed in follow-ons (See HBASE-16417): 1. Flattening without compaction causes many small segments in the pipeline, and they are not flushed all together. 2. The issue of compaction prediction cost. --- * [HBASE-16450](https://issues.apache.org/jira/browse/HBASE-16450) | *Major* | **Shell tool to dump replication queues** New tool to dump existing replication peers, configurations and queues when using HBase Replication. The tool provides two flags: --distributed This flag will poll each RS for information about the replication queues being processed on this RS. By default this is not enabled and the information about the replication queues and configuration will be obtained from ZooKeeper. --hdfs When --distributed is used, this flag will attempt to calculate the total size of the WAL files used by the replication queues. Since it is possible that multiple peers are configured, this value can be overestimated. --- * [HBASE-16422](https://issues.apache.org/jira/browse/HBASE-16422) | *Major* | **Tighten our guarantees on compatibility across patch versions** Adds the change below to our compat guarantees: {code} -\* Example: A user using a newly deprecated api does not need to modify application code with hbase api calls until the next major version.
+\* New APIs introduced in a patch version will only be added in a source compatible way footnote:[See 'Source Compatibility' https://blogs.oracle.com/darcy/entry/kinds\_of\_compatibility]: i.e. code that implements public APIs will continue to compile. {code} --- * [HBASE-7621](https://issues.apache.org/jira/browse/HBASE-7621) | *Major* | **REST client (RemoteHTable) doesn't support binary row keys** RemoteHTable now supports binary row keys with any character or byte by properly encoding request URLs. This is both a behavioral change from earlier versions and an important fix for protocol correctness. --- * [HBASE-12721](https://issues.apache.org/jira/browse/HBASE-12721) | *Major* | **Create Docker container cluster infrastructure to enable better testing** Downstream users wishing to test HBase in a "distributed" fashion (multiple "nodes" running as separate containers on the same host) can now do so in an automated fashion while leveraging Docker for process isolation via the clusterdock project. For details see the README.md in the dev-support/apache\_hbase\_topology folder. --- * [HBASE-16267](https://issues.apache.org/jira/browse/HBASE-16267) | *Critical* | **Remove commons-httpclient dependency from hbase-rest module** This issue upgrades httpclient to 4.5.2 and httpcore to 4.4.4, which are the versions used by hadoop-2. This is to handle the following CVEs. https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-5262 : http/conn/ssl/SSLConnectionSocketFactory.java in Apache HttpComponents HttpClient before 4.3.6 ignores the http.socket.timeout configuration setting during an SSL handshake, which allows remote attackers to cause a denial of service (HTTPS call hang) via unspecified vectors.
https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2012-6153 https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2012-5783 Apache Commons HttpClient 3.x, as used in Amazon Flexible Payments Service (FPS) merchant Java SDK and other products, does not verify that the server hostname matches a domain name in the subject's Common Name (CN) or subjectAltName field of the X.509 certificate, which allows man-in-the-middle attackers to spoof SSL servers via an arbitrary valid certificate. Downstream users who are exposed to commons-httpclient via the HBase classpath will have to similarly update their dependency. --- * [HBASE-16308](https://issues.apache.org/jira/browse/HBASE-16308) | *Major* | **Contain protobuf references** Undo protobuf references through the codebase so protobuf references are contained rather than spread about the codebase. For example, moved protobuff-ing up into the various Callables rather than repeat on each method invocation cleaning up boilerplate around rpc calls. Having a few protobuf reference locations only simplifies the parent issue shading project. --- * [HBASE-16321](https://issues.apache.org/jira/browse/HBASE-16321) | *Blocker* | **Ensure findbugs jsr305 jar isn't present** HBase now ensures the jsr305 implementation from the findbugs project is not included in its binary artifacts or the compile / runtime dependencies of its user facing modules. Downstream users that rely on this jar will need to update their dependencies. --- * [HBASE-8386](https://issues.apache.org/jira/browse/HBASE-8386) | *Major* | **deprecate TableMapReduce.addDependencyJars(Configuration, class\ ...)** The MapReduce helper function \`TableMapReduce.addDependencyJars(Configuration, class\ ...)\` has been deprecated since it is easy to use incorrectly. Most users should rely on addDependencyJars(Job) instead. 
--- * [HBASE-16287](https://issues.apache.org/jira/browse/HBASE-16287) | *Major* | **LruBlockCache size should not exceed acceptableSize too many** To keep the blockcache size from exceeding the acceptable size by too much, we add a configuration, "hbase.lru.blockcache.hard.capacity.limit.factor", that decides whether a block may be put into the LruBlockCache. This factor defaults to 1.2. If blockcache size \>= factor\*acceptableSize, we refuse to put the block into the cache. --- * [HBASE-16355](https://issues.apache.org/jira/browse/HBASE-16355) | *Major* | **hbase-client dependency on hbase-common test-jar should be test scope** The HBase client artifact previously incorrectly included the hbase-common test jar as a runtime dependency. With this change, that dependency has been moved to test scope. Downstream users are not expected to be impacted, unless they relied on the transitive dependency for these HBase internal test classes. --- * [HBASE-16317](https://issues.apache.org/jira/browse/HBASE-16317) | *Blocker* | **revert all ESAPI changes** This issue reverts fixes designed to prevent malicious content from rendering in HBase's UIs. Specifically, these changes shipped in 1.1.4+ and 1.2.0+. They were removed due to licensing issues discovered in the dependencies they introduced. Their implementation and those dependencies have been removed from HBase! Removal of these dependencies is against the strict definition of our version compatibility guidelines. However, inclusion of non-Apache approved licenses cannot be tolerated. Implementation of these fixes using an Apache-appropriate means is tracked in HBASE-16328. --- * [HBASE-16288](https://issues.apache.org/jira/browse/HBASE-16288) | *Critical* | **HFile intermediate block level indexes might recurse forever creating multi TB files** A new hfile configuration, "hfile.index.block.min.entries", which defaults to 16, sets the minimum number of entries an hfile index block must have.
The configuration which determines how large the index block can be at most (hfile.index.block.max.size) is ignored as long as we have fewer than hfile.index.block.min.entries entries. This ensures that a multi-level index does not build up too many levels. --- * [HBASE-16186](https://issues.apache.org/jira/browse/HBASE-16186) | *Major* | **Fix AssignmentManager MBean name** The AssignmentManager MBean was named AssignmentManger (note misspelling). This patch fixed the misspelling. --- * [HBASE-16289](https://issues.apache.org/jira/browse/HBASE-16289) | *Critical* | **AsyncProcess stuck messages need to print region/server** Adds logging of region and server, which helps debugging. Logging now looks like this: {code} 2016-06-23 17:07:18,759 INFO [Thread-1] client.AsyncProcess$AsyncRequestFutureImpl(1601): #1, waiting for 1 actions to finish on table: DUMMY\_TABLE 2016-06-23 17:07:18,759 INFO [Thread-1] client.AsyncProcess(1720): Left over 1 task(s) are processed on server(s): [s1:1,1,1] 2016-06-23 17:07:18,759 INFO [Thread-1] client.AsyncProcess(1728): Regions against which left over task(s) are processed: [DUMMY\_TABLE,DUMMY\_BYTES\_1,1.3fd12ea80b4df621fb15497ba75f7368.,DUMMY\_TABLE,DUMMY\_BYTES\_2,2.924207e242e313d2e5491c625e0a296e.] {code} --- * [HBASE-14743](https://issues.apache.org/jira/browse/HBASE-14743) | *Minor* | **Add metrics around HeapMemoryManager** New memory metrics reveal what is happening in both the MemStores and the BlockCache in a RegionServer. Through these metrics, users/operators can see 1). Current size of MemStores and BlockCache in bytes. 2). Occurrences of Memstore minor and major flushes. (named unblocked flush and blocked flush respectively, shown in histogram) 3). Dynamic changes in size between MemStores and BlockCache. (with Increase/Decrease as prefix, shown in histogram). And a counter for no changes, named DoNothingCounter. 4). Occurrences of the memory usage alarm (used more than 95% by default) in RegionServer.
(named AboveHeapOccupancyLowWatermarkCounter) --- * [HBASE-13701](https://issues.apache.org/jira/browse/HBASE-13701) | *Major* | **Consolidate SecureBulkLoadEndpoint into HBase core as default for bulk load** SecureBulkLoadEndpoint  has been integrated into HBase core as default bulk load mechanism. It is no longer needed to install it as a coprocessor endpoint. The new server is backward compatible, accommodating non-secure old client and secure old client requesting SecureBulkLoadEndpoint service. SecureBulkLoadEndpoint is deprecated. The backward compatibility support may be removed in future releases. --- * [HBASE-16244](https://issues.apache.org/jira/browse/HBASE-16244) | *Major* | **LocalHBaseCluster start timeout should be configurable** When LocalHBaseCluster is started from the command line the Master would give up after 30 seconds due to a hardcoded timeout meant for unit tests. This change allows the timeout to be configured via hbase-site as well as sets it to 5 minutes when LocalHBaseCluster is started from the command line. --- * [HBASE-16052](https://issues.apache.org/jira/browse/HBASE-16052) | *Major* | **Improve HBaseFsck Scalability** HBASE-16052 improves the performance and scalability of HBaseFsck, especially for large clusters with a small number of large tables. Searching for lingering reference files is now a multi-threaded operation. Loading HDFS region directory information is now multi-threaded at the region-level instead of the table-level to maximize concurrency. A performance bug in HBaseFsck that resulted in redundant I/O and RPCs was fixed by introducing a FileStatusFilter that filters FileStatus objects directly. --- * [HBASE-16144](https://issues.apache.org/jira/browse/HBASE-16144) | *Major* | **Replication queue's lock will live forever if RS acquiring the lock has died prematurely** If zk based replication queue is used and useMulti is false, we will schedule a chore to clean up the orphan replication queue lock on zk. 
--- * [HBASE-3727](https://issues.apache.org/jira/browse/HBASE-3727) | *Minor* | **MultiHFileOutputFormat** MultiHFileOutputFormat supports output of HFiles from multiple tables. It will output directories and hfiles as follows: --table1 --family1 --family2 --Hfiles --table2 --family3 --hfiles --family4 Each family directory and its hfiles match the output of HFileOutputFormat2. --- * [HBASE-16231](https://issues.apache.org/jira/browse/HBASE-16231) | *Major* | **Integration tests should support client keytab login for secure clusters** Prior to this change, the integration test clients (IntegrationTest\*) relied on the Kerberos credential cache for authentication against secured clusters. This could lead to the tests failing due to authentication failures when the tickets in the credential cache expired. With this change, the integration test clients will make use of the configuration properties for "hbase.client.keytab.file" and "hbase.client.kerberos.principal", when available. This will perform a login from the configured keytab file and automatically refresh the credentials in the background for the process lifetime. --- * [HBASE-13823](https://issues.apache.org/jira/browse/HBASE-13823) | *Major* | **Procedure V2: unnecessaery operations on AssignmentManager#recoverTableInDisablingState() and recoverTableInEnablingState()** For a cluster upgraded from 1.0.x or older releases, master startup will not continue an in-progress enable/disable table process. If an orphaned znode with ENABLING/DISABLING state exists in the cluster, run hbck or fix the issue manually. For a new cluster, or a cluster upgraded from 1.1.x or a newer release, there is no issue to worry about. --- * [HBASE-16095](https://issues.apache.org/jira/browse/HBASE-16095) | *Major* | **Add priority to TableDescriptor and priority region open thread pool** Adds a PRIORITY property to the HTableDescriptor. PRIORITY should be in the same range as the RpcScheduler defines it (HConstants.XXX\_QOS).
Table priorities are only used for region opening for now. There can be other uses later (like RpcScheduling). Regions of high priority tables (priority \>= HIGH\_QOS) are opened from a different thread pool than the regular region open thread pool. However, table priorities are not used as a global order for region assigning or opening. --- * [HBASE-16081](https://issues.apache.org/jira/browse/HBASE-16081) | *Blocker* | **Replication remove\_peer gets stuck and blocks WAL rolling** When a replication endpoint is sent a shutdown request by the replication source in situations like removing a peer, we now try to gracefully shut it down by draining the items already sent for replication to the peer cluster. If the drain does not complete in the specified time (hbase.rpc.timeout \* replication.source.maxterminationmultiplier), the regionserver is aborted to avoid blocking the WAL roll. --- * [HBASE-16087](https://issues.apache.org/jira/browse/HBASE-16087) | *Major* | **Replication shouldn't start on a master if if only hosts system tables** Masters will no longer start any replication threads if they are hosting only system tables. To change this, add something to the config for tables on master that doesn't start with "hbase:". (Replicating system tables is currently unsupported and can open up security holes, so do this at your own peril.) --- * [HBASE-14548](https://issues.apache.org/jira/browse/HBASE-14548) | *Major* | **Expand how table coprocessor jar and dependency path can be specified** Allow a directory containing the jars, or a path with wildcards, to be specified, such as: hdfs://namenode:port/user/hadoop-user/ or hdfs://namenode:port/user/hadoop-user/\*.jar Please note that if a directory is specified, all jar files (.jar) directly in the directory are added, but files in the subtree rooted in the directory are not searched. Do not include any wildcard if you want to specify a directory.
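
The directory and wildcard rules in the HBASE-14548 note can be sketched as follows. This is an illustration under stated assumptions, not HBase's loader: it uses the local filesystem through `java.nio.file` as a stand-in for HDFS, and `resolveJars` and `makeSampleDir` are hypothetical helper names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CoprocessorJarPathSketch {
    /** Returns the jar file names selected by a directory spec or a wildcard spec. */
    static List<String> resolveJars(String spec) {
        int slash = spec.lastIndexOf('/');
        String last = spec.substring(slash + 1);
        Path dir;
        String glob;
        if (last.contains("*") || last.contains("?")) {
            dir = Paths.get(spec.substring(0, slash + 1)); // wildcard: match inside the parent
            glob = last;
        } else {
            dir = Paths.get(spec);                         // directory: every jar directly inside
            glob = "*.jar";
        }
        List<String> jars = new ArrayList<>();
        // newDirectoryStream lists direct children only, so subdirectories are not searched.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    jars.add(entry.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(jars);
        return jars;
    }

    /** Creates a temp directory holding a.jar, b.jar, notes.txt and sub/c.jar. */
    static String makeSampleDir() {
        try {
            Path dir = Files.createTempDirectory("cp-jars");
            Files.createFile(dir.resolve("a.jar"));
            Files.createFile(dir.resolve("b.jar"));
            Files.createFile(dir.resolve("notes.txt"));
            Files.createFile(Files.createDirectory(dir.resolve("sub")).resolve("c.jar"));
            return dir.toString().replace('\\', '/');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String dir = makeSampleDir();
        System.out.println(resolveJars(dir));              // directory spec
        System.out.println(resolveJars(dir + "/a*.jar"));  // wildcard spec
    }
}
```

In this sketch the directory spec picks up `a.jar` and `b.jar` but not `sub/c.jar`, and the wildcard spec picks up only the names that match the pattern.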
--- * [HBASE-15925](https://issues.apache.org/jira/browse/HBASE-15925) | *Blocker* | **compat-module maven variable not evaluated** Downstream users of HBase dependencies that do not properly activate Maven profiles should now see a correct transitive dependency on the default hadoop-compatibility-module. --- * [HBASE-16140](https://issues.apache.org/jira/browse/HBASE-16140) | *Major* | **bump owasp.esapi from 2.1.0 to 2.1.0.1** The dependency owasp.esapi had a compatible change from 2.1.0 to 2.1.0.1. As a result, the transitive dependency commons-fileupload had a change from 1.2 to 1.3.1, which has some minor class changes that impact binary compatibility. Interested users should check the release notes of commons-fileupload to see if any of the incompatible changes impact them. http://commons.apache.org/proper/commons-fileupload/changes-report.html --- * [HBASE-16147](https://issues.apache.org/jira/browse/HBASE-16147) | *Major* | **Shell command for getting compaction state** compaction\_state shell command would return compaction state in String form: NONE, MINOR, MAJOR, MAJOR\_AND\_MINOR --- * [HBASE-14878](https://issues.apache.org/jira/browse/HBASE-14878) | *Major* | **maven archetype: client application with shaded jars** Adds new hbase-shaded-client archetype; also corrects an omission found in hbase-archetypes/README.md in the section headed "How to add a new archetype". --- * [HBASE-14877](https://issues.apache.org/jira/browse/HBASE-14877) | *Major* | **maven archetype: client application** This patch introduces a new infrastructure for creation and maintenance of Maven archetypes in the context of the hbase project, and it also introduces the first archetype, which end-users may utilize to generate a simple hbase-client dependent project. NOTE that this patch should introduce two new WARNINGs ("Using platform encoding ... to copy filtered resources") into the hbase install process. 
These warnings are hard-wired into the maven-archetype-plugin:create-from-project goal. See hbase/hbase-archetypes/README.md, footnote [6] for details. After applying the patch, see hbase/hbase-archetypes/README.md for details regarding the new archetype infrastructure introduced by this patch. (The README text is also conveniently positioned at the top of the patch itself.) Here is the opening paragraph of the README.md file: ================= The hbase-archetypes subproject of hbase provides an infrastructure for creation and maintenance of Maven archetypes pertinent to HBase. Upon deployment to the archetype catalog of the central Maven repository, these archetypes may be used by end-user developers to autogenerate completely configured Maven projects (including fully-functioning sample code) through invocation of the archetype:generate goal of the maven-archetype-plugin. ======== The README.md file also contains several paragraphs under the heading, "Notes for contributors and committers to the HBase project", which explains the layout of 'hbase-archetypes', and how archetypes are created and installed into the local Maven repository, ready for deployment to the central Maven repository. It also outlines how new archetypes may be developed and added to the collection in the future. --- * [HBASE-15977](https://issues.apache.org/jira/browse/HBASE-15977) | *Major* | **Failed variable substitution on home page** Done. Thanks, Dima, Andrew! --- * [HBASE-5291](https://issues.apache.org/jira/browse/HBASE-5291) | *Major* | **Add Kerberos HTTP SPNEGO authentication support to HBase web consoles** HBase Web UIs can be secured from general public access using SPNEGO to require a valid Kerberos ticket. Setting 'hbase.security.authentication.ui' to 'kerberos' in hbase-site.xml is a global switch to have all Web UIs allow only authenticated clients via Kerberos. 
'hbase.security.authentication.spnego.kerberos.principal' and 'hbase.security.authentication.spnego.kerberos.keytab' are two other required properties in hbase-site.xml: the Kerberos principal and keytab that the server uses to log in. The primary in the Kerberos principal must be 'HTTP' as required by the SPNEGO mechanism, e.g. 'HTTP/host.domain.com@DOMAIN.COM'. --- * [HBASE-15950](https://issues.apache.org/jira/browse/HBASE-15950) | *Major* | **Fix memstore size estimates to be more tighter** The estimates of heap usage by the memstore objects (KeyValue, object and array header sizes, etc) have been made more accurate for heap sizes up to 32G (using CompressedOops), resulting in them dropping by 10-50% in practice. This also results in fewer flushes and compactions due to "fatter" flushes. YMMV. As a result, the actual heap usage of the memstore before being flushed may increase by up to 100%. If configured memory limits for the region server had been tuned based on observed usage, this change could result in worse GC behavior or even OutOfMemory errors. Set the environment property (not hbase-site.xml) "hbase.memorylayout.use.unsafe" to false to disable. --- * [HBASE-16023](https://issues.apache.org/jira/browse/HBASE-16023) | *Major* | **Fastpath for the FIFO rpcscheduler** Adds a 'fastpath' when using the default FIFO rpc scheduler ('fifo'). Does direct handoff from Reader thread to Handler if there is one ready and willing. Shines best under a high random read workload (YCSB workloadc for instance). --- * [HBASE-15971](https://issues.apache.org/jira/browse/HBASE-15971) | *Critical* | **Regression: Random Read/WorkloadC slower in 1.x than 0.98** Change the default rpc scheduler from 'deadline' to 'fifo' so it is the same as in branch 0.98. 'deadline' was of questionable benefit and came with a high scheduling cost. To re-enable 'deadline', set hbase.ipc.server.callqueue.type to 'deadline' in your hbase-site.xml.
--- * [HBASE-15525](https://issues.apache.org/jira/browse/HBASE-15525) | *Critical* | **OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts** Added a new ByteBufferPool which pools N ByteBuffers. By default it makes off heap ByteBuffers when getBuffer() is called. The size of each buffer defaults to 64KB. This can be configured using 'hbase.ipc.server.reservoir.initial.buffer.size'. The max number of buffers which can be pooled defaults to twice the number of handler threads in RS. This can be configured with key 'hbase.ipc.server.reservoir.initial.max'. When responding to read requests from a client that supports Codec, we will create CellBlocks and directly return them as PB payload. To make this block, we will use N ByteBuffers from the pool as per the total size of the response cells. The default size of 64 KB for the buffer is in line with the number of bytes written to the RPC layer in one shot (that is also 64KB). When, at some point in time, the caller is not able to get a free buffer from the pool (it returns null then), it will make an on heap buffer of the same size (as that of the buffers in the pool) and use that to create the cell block. --- * [HBASE-15994](https://issues.apache.org/jira/browse/HBASE-15994) | *Major* | **Allow selection of RpcSchedulers** Adds a FifoRpcSchedulerFactory so you can try the FifoRpcScheduler by setting "hbase.region.server.rpc.scheduler.factory.class" --- * [HBASE-15989](https://issues.apache.org/jira/browse/HBASE-15989) | *Major* | **Remove hbase.online.schema.update.enable** Removes the "hbase.online.schema.update.enable" property. From now on, every operation that alters the schema (e.g. modifyTable, addFamily, removeFamily, ...) will use the online schema update. There is no need to disable/enable the table.
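For the pooling behavior described under HBASE-15525 above, the following is a minimal sketch of the idea, not the HBase ByteBufferPool class itself. The class name `BufferPoolSketch` and its methods other than `getBuffer` are hypothetical; the only behavior taken from the note is that off heap buffers are made on demand up to a cap, that the pool returns null when exhausted, and that the caller then falls back to an on heap buffer of the same size.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in used for illustration; not the HBase implementation.
final class BufferPoolSketch {
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();
    private final AtomicInteger created = new AtomicInteger();
    private final int bufferSize;
    private final int maxPooled;

    BufferPoolSketch(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    /** Returns a pooled off heap buffer, or null when the pool has none to give. */
    ByteBuffer getBuffer() {
        ByteBuffer pooled = free.poll();
        if (pooled != null) {
            pooled.clear();
            return pooled;
        }
        // Create buffers lazily until the cap is reached.
        while (true) {
            int count = created.get();
            if (count >= maxPooled) {
                return null;
            }
            if (created.compareAndSet(count, count + 1)) {
                return ByteBuffer.allocateDirect(bufferSize);
            }
        }
    }

    /** Hands a buffer back so a later getBuffer() call can reuse it. */
    void putBack(ByteBuffer buffer) {
        free.offer(buffer);
    }

    /** Caller-side fallback: an on heap buffer of the same size when the pool returns null. */
    ByteBuffer getOrAllocateOnHeap() {
        ByteBuffer pooled = getBuffer();
        return pooled != null ? pooled : ByteBuffer.allocate(bufferSize);
    }
}
```

With a cap of two 64KB buffers, a third concurrent request would get an on heap buffer rather than block, which is the trade-off the note describes for RPC bursts.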
--- * [HBASE-15981](https://issues.apache.org/jira/browse/HBASE-15981) | *Minor* | **Stripe and Date-tiered compactions inaccurately suggest disabling table in docs** Removes reference to disabling table in docs for stripe and date-tiered compactions --- * [HBASE-15931](https://issues.apache.org/jira/browse/HBASE-15931) | *Critical* | **Add log for long-running tasks in AsyncProcess** After HBASE-15931, we will log more details for long-running tasks in AsyncProcess#waitForMaximumCurrentTasks every 10 seconds, including: 1. The table name will be included in the tasks status log. 2. The regionserver(s) on which the tasks are running will be logged when fewer than hbase.client.threshold.log.details tasks are left (10 by default). 3. The regions against which the tasks are running will be logged when fewer than 2 tasks are left. --- * [HBASE-15907](https://issues.apache.org/jira/browse/HBASE-15907) | *Major* | **Missing documentation of create table split options** Documentation changes only: added a section to Shell tricks and a cross reference from the region splitting section. --- * [HBASE-15915](https://issues.apache.org/jira/browse/HBASE-15915) | *Major* | **Set timeouts on hanging tests** Use @ClassRule to set timeout on test case level (instead of @Rule which sets timeout for the test methods). CategoryBasedTimeout.forClass(..) determines the timeout value based on category annotation (small/medium/large) on the test case.
--- * [HBASE-15875](https://issues.apache.org/jira/browse/HBASE-15875) | *Major* | **Remove HTable references and HTableInterface** **WARNING: No release note provided for this change.** --- * [HBASE-15610](https://issues.apache.org/jira/browse/HBASE-15610) | *Blocker* | **Remove deprecated HConnection for 2.0 thus removing all PB references for 2.0** **WARNING: No release note provided for this change.** --- * [HBASE-15890](https://issues.apache.org/jira/browse/HBASE-15890) | *Major* | **Allow thrift to set/unset "cacheBlocks" for Scans** Adds cacheBlocks to Scan --- * [HBASE-15876](https://issues.apache.org/jira/browse/HBASE-15876) | *Blocker* | **Remove doBulkLoad(Path hfofDir, final HTable table) though it has not been through a full deprecation cycle** Removes a doBulkLoad method though it has not been through a full deprecation cycle (but it is 'damaged' because it has a parameter that has been properly deprecated). Use the alternative {code}public void doBulkLoad(Path hfofDir, final Admin admin, Table table, RegionLocator regionLocator){code} See http://mail-archives.apache.org/mod\_mbox/hbase-dev/201605.mbox/%3CCAMUu0w-ZiLoLBLO3D76=n3AjUr=VMtTUeYA28weLHYeq8+e3bQ@mail.gmail.com%3E for NOTICE on this 'premature' removal. --- * [HBASE-15228](https://issues.apache.org/jira/browse/HBASE-15228) | *Major* | **Add the methods to RegionObserver to trigger start/complete restoring WALs** Added two hooks around WAL restore. preReplayWALs(final ObserverContext\<? extends RegionCoprocessorEnvironment\> ctx, HRegionInfo info, Path edits) and postReplayWALs(final ObserverContext\<? extends RegionCoprocessorEnvironment\> ctx, HRegionInfo info, Path edits) will be called at the start and end of the restore of a WAL file. The other hook around WAL restore (preWALRestore) will be called before the restore of every entry within the WAL file.
--- * [HBASE-15856](https://issues.apache.org/jira/browse/HBASE-15856) | *Critical* | **Cached Connection instances can wind up with addresses never resolved** During periods where DNS resolution was not available or not working correctly, we could previously cache unresolved hostnames forever, in some cases preventing further connections to these hosts even when DNS service was restored. With this change, unresolved hostnames will no longer be cached, and will instead throw an UnknownHostException during connection setup. --- * [HBASE-15593](https://issues.apache.org/jira/browse/HBASE-15593) | *Major* | **Time limit of scanning should be offered by client** Adds a new configuration, hbase.ipc.min.client.request.timeout: the minimum allowable timeout (in milliseconds) in the rpc request's header. This configuration exists to prevent the rpc service from treating a request as timed out immediately. --- * [HBASE-15784](https://issues.apache.org/jira/browse/HBASE-15784) | *Major* | **Misuse core/maxPoolSize of LinkedBlockingQueue in ThreadPoolExecutor** The core pool size and max pool size of ThreadPoolExecutor should be the same when LinkedBlockingQueue is used. Thus the configurations hbase.hconnection.threads.max, hbase.hconnection.meta.lookup.threads.max, hbase.region.replica.replication.threads.max and hbase.multihconnection.threads.max are used as the number of the core threads, and the related configurations \*.thread.core are not used any more. --- * [HBASE-15651](https://issues.apache.org/jira/browse/HBASE-15651) | *Major* | **Add report-flakies.py to use jenkins api to get failing tests** To find the recent set of flakies, run the script added by this patch. Pass -h to get usage information: {code} $ ./dev-support/report-flakies.py -h {code} If you get the below: {code} $ python ./dev-support/report-flakies.py Traceback (most recent call last): File "./dev-support/report-flakies.py", line 25, in \<module\> import requests ImportError: No module named requests {code} ...
install the requests module: {code} $ sudo pip install requests {code} --- * [HBASE-15780](https://issues.apache.org/jira/browse/HBASE-15780) | *Critical* | **Expose AuthUtil as IA.Public** Downstream users with long lived applications that need to communicate with secure HBase instances can now rely on the AuthUtil class to handle authenticating via keytab. For more information, see the javadoc for the org.apache.hadoop.hbase.AuthUtil class. --- * [HBASE-15811](https://issues.apache.org/jira/browse/HBASE-15811) | *Blocker* | **Batch Get after batch Put does not fetch all Cells** We were not waiting on all executors in a batch to complete which meant a read-your-own-writes could sometimes fail -- especially if client is loaded; i.e. putting to multiple machines in a cluster. The test for no-more-executors was damaged by the 0.99/0.98.4 fix "HBASE-11403 Fix race conditions around Object#notify" --- * [HBASE-15801](https://issues.apache.org/jira/browse/HBASE-15801) | *Major* | **Upgrade checkstyle for all branches** All active branches now use maven-checkstyle-plugin 2.17 and checkstyle 6.18. --- * [HBASE-15236](https://issues.apache.org/jira/browse/HBASE-15236) | *Major* | **Inconsistent cell reads over multiple bulk-loaded HFiles** This jira fixes that following bug: During bulkloading, if there are multiple hfiles corresponding to same region, and if they have same timestamps (which may have been set using importtsv.timestamp) and duplicate keys across them, then get and scan may return values coming from different hfiles. 
--- * [HBASE-15740](https://issues.apache.org/jira/browse/HBASE-15740) | *Major* | **Replication source.shippedKBs metric is undercounting because it is in KB** Removed Replication source.shippedKBs metric in favor of source.shippedBytes --- * [HBASE-15773](https://issues.apache.org/jira/browse/HBASE-15773) | *Major* | **CellCounter improvements** The CellCounter map reduce job now supports additional configuration options on the Scan instance it creates, using the org.apache.hadoop.hbase.mapreduce.TableInputFormat defined property names. For a full list of the options, run ./hbase org.apache.hadoop.hbase.mapreduce.CellCounter with no arguments. CellCounter also no longer creates job counters for per-rowkey and per-rowkey/qualifier cell counts. For most tables, these counters would cause the job to fail due to mapreduce job counter limits. --- * [HBASE-15759](https://issues.apache.org/jira/browse/HBASE-15759) | *Minor* | **RegionObserver.preStoreScannerOpen() doesn't have acces to current readpoint** The following RegionObserver method is deprecated and will no longer be called in hbase 2.0: public KeyValueScanner preStoreScannerOpen(final ObserverContext\<RegionCoprocessorEnvironment\> c, final Store store, final Scan scan, final NavigableSet\<byte[]\> targetCols, final KeyValueScanner s) throws IOException { Instead, override this method: public KeyValueScanner preStoreScannerOpen(final ObserverContext\<RegionCoprocessorEnvironment\> c, final Store store, final Scan scan, final NavigableSet\<byte[]\> targetCols, final KeyValueScanner s, final long readPt) throws IOException { --- * [HBASE-15743](https://issues.apache.org/jira/browse/HBASE-15743) | *Major* | **Add Transparent Data Encryption support for FanOutOneBlockAsyncDFSOutput** Now the AsyncFSWAL can write data to an encryption zone on HDFS. --- * [HBASE-15767](https://issues.apache.org/jira/browse/HBASE-15767) | *Major* | **Upgrade httpclient dependency** HBase now relies on version 4.3.6 of the Apache Commons HTTPClient library.
Downstream users who are exposed to it via the HBase classpath will have to similarly update their dependency. --- * [HBASE-15575](https://issues.apache.org/jira/browse/HBASE-15575) | *Minor* | **Rename table DDL \*Handler methods in MasterObserver to more meaningful names** **WARNING: No release note provided for this change.** --- * [HBASE-15720](https://issues.apache.org/jira/browse/HBASE-15720) | *Major* | **Print row locks at the debug dump page** Adds a section to the debug dump page listing current row locks held. --- * [HBASE-15703](https://issues.apache.org/jira/browse/HBASE-15703) | *Critical* | **Deadline scheduler needs to return to the client info about skipped calls, not just drop them** With previous deadline mode of RPC scheduling (the implementation in SimpleRpcScheduler, which is basically a FIFO except that long-running scans are de-prioritized) and FIFO-based RPC scheduler clients are getting CallQueueTooBigException when RPC call queue is full. With this patch and when hbase.ipc.server.callqueue.type property is set to "codel" mode, clients will also be getting CallDroppedException, which means that the request was discarded by the server as it considers itself to be overloaded and starts to drop requests to avoid going down under the load. The clients will retry upon receiving this exception. It doesn't clear MetaCache with region locations. --- * [HBASE-15281](https://issues.apache.org/jira/browse/HBASE-15281) | *Major* | **Allow the FileSystem inside HFileSystem to be wrapped** This patch adds new configuration property - hbase.fs.wrapper. If provided, it should be fully qualified class name of the class used as a pluggable wrapper for HFileSystem. This may be useful for specific debugging/tracing needs. 
--- * [HBASE-15551](https://issues.apache.org/jira/browse/HBASE-15551) | *Minor* | **Make call queue too big exception use servername** Fixes an issue where the CallQueueTooBig exception returned to the client could print useless address info (like 0.0.0.0) if the RPC server is listening on something other than the host name, making troubleshooting inconvenient. --- * [HBASE-15711](https://issues.apache.org/jira/browse/HBASE-15711) | *Major* | **Add client side property to allow logging details for batch errors** HBASE-15711 introduces a new client side property, hbase.client.log.batcherrors.details, to allow logging the full stacktrace of exceptions for batch errors. It is disabled by default; setting the property to true enables it. --- * [HBASE-15686](https://issues.apache.org/jira/browse/HBASE-15686) | *Major* | **Add override mechanism for the exempt classes when dynamically loading table coprocessor** A new coprocessor table descriptor attribute, hbase.coprocessor.classloader.included.classes, has been added. Users can specify class name prefixes (semicolon separated) which should be loaded by CoprocessorClassLoader through this attribute using the following syntax: {code} hbase\> alter 't1', 'coprocessor'=\>'hdfs:///foo.jar\|com.foo.FooRegionObserver\|1001\|arg1=1,arg2=2' {code} --- * [HBASE-15645](https://issues.apache.org/jira/browse/HBASE-15645) | *Critical* | **hbase.rpc.timeout is not used in operations of HTable** Fixes a regression where the hbase.rpc.timeout configuration was ignored in branch-1.0+. Adds new methods setOperationTimeout, getOperationTimeout, setRpcTimeout, and getRpcTimeout to Table. In branch-1.3+ they are public interfaces and in 1.0-1.2 they are labeled as @InterfaceAudience.Private.
Adds hbase.client.operation.timeout to hbase-default.xml with default of 1200000 --- * [HBASE-15477](https://issues.apache.org/jira/browse/HBASE-15477) | *Major* | **Do not save 'next block header' when we cache hfileblocks** Fix over-persisting in blockcache; no longer save the block PLUS the header of the next block (33 bytes) when writing the cache. Also removes support for hfileblock v1; hfile block v1 was used writing hfile v1. hfile v1 was the default in hbase before hbase-0.92. hbase.96 would not start unless all v1 hfiles had been compacted out of the cluster. --- * [HBASE-15628](https://issues.apache.org/jira/browse/HBASE-15628) | *Major* | **Implement an AsyncOutputStream which can work with any FileSystem implementation** Introduce an AsyncFSOutput interface which is an abstraction of the original FanOutOneBlockAsyncDFSOutput. Now you can create AsyncFSOutput on any FileSystem using the method AsyncFSOutputHelper.createOutput. The returned AsyncFSOutput will be FanOutOneBlockAsyncDFSOutput if the given FileSystem is a DistributedFileSystem. --- * [HBASE-15392](https://issues.apache.org/jira/browse/HBASE-15392) | *Major* | **Single Cell Get reads two HFileBlocks** When an explicit Get with a one or more columns specified, we at a minimum, were overseeking, reading until we tripped over the next row, regardless, and only then returning. If the next row was in-block, we'd just do too much seeking but if the next row was in the next (or in the next block beyond that), we would keep seeking and loading blocks until we found the next row before we'd return. There remains one case where we will still 'overread'. It is when the row end aligns with the end of the block. In this case we will load the next block just to find that there are no more cells in the current row. See HBASE-15457. 
--- * [HBASE-15671](https://issues.apache.org/jira/browse/HBASE-15671) | *Major* | **Add per-table metrics on memstore, storefile and regionsize** Adds storeFileSize, memstoreSize and tableSize to the per-table metrics. --- * [HBASE-15366](https://issues.apache.org/jira/browse/HBASE-15366) | *Major* | **Add doc, trace-level logging, and test around hfileblock** No functional change. Added javadoc, comments, and extra trace-level logging to make clear what is happening around the reading and caching of hfile blocks. --- * [HBASE-15368](https://issues.apache.org/jira/browse/HBASE-15368) | *Major* | **Add pluggable window support** Use 'hbase.hstore.compaction.date.tiered.window.factory.class' to specify the window implementation you like for date tiered compaction. Now the only and default implementation is org.apache.hadoop.hbase.regionserver.compactions.ExponentialCompactionWindowFactory. {code} \<property\> \<name\>hbase.hstore.compaction.date.tiered.window.factory.class\</name\> \<value\>org.apache.hadoop.hbase.regionserver.compactions.ExponentialCompactionWindowFactory\</value\> \</property\> {code} --- * [HBASE-15518](https://issues.apache.org/jira/browse/HBASE-15518) | *Major* | **Add Per-Table metrics back** Adds per-table metrics aggregated from per-region metrics in region server metrics. New metrics are available under JMX section "Hadoop:service=HBase,name=RegionServer,sub=Tables" and they are available via hadoop metrics2 collectors. --- * [HBASE-15640](https://issues.apache.org/jira/browse/HBASE-15640) | *Major* | **L1 cache doesn't give fair warning that it is showing partial stats only when it hits limit** The blockcache UI tab would stop refreshing at 100k blocks (configurable, see "hbase.ui.blockcache.by.file.max"), which isn't very many blocks when doing a big cache, giving a misleading picture of the content of L1 and/or L2 cache. Up the default limit to 1M blocks (UI takes a while but just a few seconds counting over 1M blocks).
Also, when beyond the limit, give the user a noticeable WARNING in the UI. --- * [HBASE-15386](https://issues.apache.org/jira/browse/HBASE-15386) | *Major* | **PREFETCH\_BLOCKS\_ON\_OPEN in HColumnDescriptor is ignored** This was a non-issue. The PREFETCH\_... flag actually works. While here, though, made the following additions. Changes the prefetch TRACE-level loggings to include the word 'Prefetch' in them so you know what they are about. Changes the cryptic logging of the CacheConfig#toString to have some preamble saying why and what column family is responsible (helps figure out what is going on). Adds a test that verifies setting the flag on HColumnDescriptor actually works. --- * [HBASE-13372](https://issues.apache.org/jira/browse/HBASE-13372) | *Major* | **Unit tests for SplitTransaction and RegionMergeTransaction listeners** HBASE-13372 Add unit tests for SplitTransaction and RegionMergeTransaction listeners --- * [HBASE-15187](https://issues.apache.org/jira/browse/HBASE-15187) | *Major* | **Integrate CSRF prevention filter to REST gateway** Protection against CSRF attack can be turned on with the config parameter hbase.rest.csrf.enabled (default value is false). The custom header to be sent can be changed via the config parameter hbase.rest.csrf.custom.header, whose default value is "X-XSRF-HEADER". The config parameter hbase.rest.csrf.methods.to.ignore controls which HTTP methods are not subject to the custom header check. The config parameter hbase.rest-csrf.browser-useragents-regex is a comma-separated list of regular expressions used to match against an HTTP request's User-Agent header when protection against cross-site request forgery (CSRF) is enabled for the REST server by setting hbase.rest.csrf.enabled to true.
The implementation came from hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/http/RestCsrfPreventionFilter.java We should periodically update the RestCsrfPreventionFilter.java in hbase codebase to include fixes to the hadoop implementation. --- * [HBASE-15481](https://issues.apache.org/jira/browse/HBASE-15481) | *Trivial* | **Add pre/post roll to WALObserver** WALObserver coprocessors now can receive notifications of WAL rolling via the new methods `preWALRoll` and `postWALRoll`. This change is incompatible due to the addition of these methods to the `WALObserver` interface. Downstream users are encouraged to instead extend the `BaseWALObserver` class, which remains compatible through this change. --- * [HBASE-15507](https://issues.apache.org/jira/browse/HBASE-15507) | *Major* | **Online modification of enabled ReplicationPeerConfig** Added update\_peer\_config to the HBase shell and ReplicationAdmin, and provided a callback for custom replication endpoints to be notified of changes to their configuration and peer data --- * [HBASE-15537](https://issues.apache.org/jira/browse/HBASE-15537) | *Major* | **Make multi WAL work with WALs other than FSHLog** Add the delegate config for multiwal back. Now you can use 'hbase.wal.regiongrouping.delegate.provider' to specify the wal provider you want to use for multiwal. For example: {code} \<property\> \<name\>hbase.wal.regiongrouping.delegate.provider\</name\> \<value\>asyncfs\</value\> \</property\> {code} The default value is filesystem, which is the alias of DefaultWALProvider, i.e., the FSHLog. --- * [HBASE-15400](https://issues.apache.org/jira/browse/HBASE-15400) | *Major* | **Use DateTieredCompactor for Date Tiered Compaction** With this patch combined with HBASE-15389, when we compact, we can output multiple files along the current window boundaries. There are two use cases: 1. Major compaction: We want to output date tiered store files with data older than max age archived in chunks of the window size on the higher tier.
Once a window is old enough, we don't combine the windows to promote to the next tier any further. So files in these windows retain the same timespan as they were minor-compacted last time, which is the window size of the highest tier. Major compaction will touch these files and we want to maintain the same layout. This way, TTL and archiving will be simpler and more efficient. 2. Bulk load files and the old file generated by major compaction before upgrading to DTCP. This will change the way to enable date tiered compaction. To turn it on: hbase.hstore.engine.class: org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine With tiered compaction all servers in the cluster will promote windows to higher tier at the same time, so using a compaction throttle is recommended: hbase.regionserver.throughput.controller:org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController hbase.hstore.compaction.throughput.higher.bound and hbase.hstore.compaction.throughput.lower.bound need to be set for desired throughput range as uncompressed rates. Because there will most likely be more store files around, we need to adjust the configuration so that flush won't be blocked and compaction will be properly throttled: hbase.hstore.blockingStoreFiles: change to 50 if using all default parameters when turning on date tiered compaction. Use 1.5~2 x projected file count if changing the parameters, Projected file count = windows per tier x tier count + incoming window min + files older than max age Because major compaction is turned on now, we also need to adjust the configuration for max file to compact according to the larger file count: hbase.hstore.compaction.max: set to the same number as hbase.hstore.blockingStoreFiles. 
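The sizing guidance above is plain arithmetic, so a worked sketch may help. The formula is the one quoted in the note; the class name, method names, and all input numbers below are hypothetical and only chosen to show the calculation.

```java
// Illustration of the sizing rule quoted above; not part of HBase.
final class DateTieredFileCountSketch {

    /** Projected file count = windows per tier x tier count + incoming window min + files older than max age. */
    static int projectedFileCount(int windowsPerTier, int tierCount,
                                  int incomingWindowMin, int filesOlderThanMaxAge) {
        return windowsPerTier * tierCount + incomingWindowMin + filesOlderThanMaxAge;
    }

    /** Suggested range for hbase.hstore.blockingStoreFiles: 1.5x to 2x the projected count. */
    static int[] blockingStoreFilesRange(int projected) {
        return new int[] { (int) Math.ceil(1.5 * projected), 2 * projected };
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 4 windows per tier, 4 tiers, incoming window min 6, 1 old file.
        int projected = projectedFileCount(4, 4, 6, 1);
        int[] range = blockingStoreFilesRange(projected);
        System.out.println("projected=" + projected
            + " blockingStoreFiles between " + range[0] + " and " + range[1]);
    }
}
```

With these made-up inputs the projected count is 4 x 4 + 6 + 1 = 23, so the note's 1.5~2x rule suggests a hbase.hstore.blockingStoreFiles value between 35 and 46, with hbase.hstore.compaction.max set to the same number.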
For more details, please refer to the design spec at https://docs.google.com/document/d/1\_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG\_uy8/edit# --- * [HBASE-15592](https://issues.apache.org/jira/browse/HBASE-15592) | *Major* | **Print Procedure WAL content** Use hbase org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALPrettyPrinter to print the content of a Procedure WAL. e.g. hbase org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALPrettyPrinter -f /hbase/MasterProcWALs/state-00000000000000002571.log --- * [HBASE-15396](https://issues.apache.org/jira/browse/HBASE-15396) | *Minor* | **Enhance mapreduce.TableSplit to add encoded region name** To aid troubleshooting of MapReduce jobs that rely on the HBase provided input format, splits now include the encoded region name they cover. --- * [HBASE-15568](https://issues.apache.org/jira/browse/HBASE-15568) | *Major* | **Procedure V2 - Remove CreateTableHandler in HBase Apache 2.0 release** **WARNING: No release note provided for this change.** --- * [HBASE-15521](https://issues.apache.org/jira/browse/HBASE-15521) | *Major* | **Procedure V2 - RestoreSnapshot and CloneSnapshot** **WARNING: No release note provided for this change.** --- * [HBASE-15538](https://issues.apache.org/jira/browse/HBASE-15538) | *Major* | **Implement secure async protobuf wal writer** Add the following config in hbase-site.xml if you want to use secure protobuf wal writer together with AsyncFSWAL {code} \<property\> \<name\>hbase.regionserver.hlog.async.writer.impl\</name\> \<value\>org.apache.hadoop.hbase.regionserver.wal.SecureAsyncProtobufLogWriter\</value\> \</property\> {code} --- * [HBASE-11393](https://issues.apache.org/jira/browse/HBASE-11393) | *Major* | **Replication TableCfs should be a PB object rather than a string** **WARNING: No release note provided for this change.** --- * [HBASE-15265](https://issues.apache.org/jira/browse/HBASE-15265) | *Major* | **Implement an asynchronous FSHLog** To enable, set the WALProvider as follows: {code} \<property\> \<name\>hbase.wal.provider\</name\> \<value\>asyncfs\</value\> \</property\>
{code} To check which provider is active, look for the log line: LOG.info("Instantiating WALProvider of type " + clazz); --- * [HBASE-14256](https://issues.apache.org/jira/browse/HBASE-14256) | *Major* | **Flush task message may be confusing when region is recovered** HBASE-14256 Correct confusing flush task message --- * [HBASE-15212](https://issues.apache.org/jira/browse/HBASE-15212) | *Major* | **RPCServer should enforce max request size** Adds a configuration parameter "hbase.ipc.max.request.size" which defaults to 256MB to protect the server against very large incoming RPC requests. All requests larger than this size will be immediately rejected before allocating any resources (memory allocation, etc). --- * [HBASE-15412](https://issues.apache.org/jira/browse/HBASE-15412) | *Major* | **Add average region size metric** Adds a new metric called "averageRegionSize" that is emitted as a regionserver metric. Metric description: Average region size over the region server including memstore and storefile sizes --- * [HBASE-15479](https://issues.apache.org/jira/browse/HBASE-15479) | *Major* | **No more garbage or beware of autoboxing** This fix decreases the client's memory allocation during writes by more than 50%. --- * [HBASE-15322](https://issues.apache.org/jira/browse/HBASE-15322) | *Critical* | **Operations using Unsafe path broken for platforms not having sun.misc.Unsafe** **WARNING: No release note provided for this change.** --- * [HBASE-12940](https://issues.apache.org/jira/browse/HBASE-12940) | *Major* | **Expose listPeerConfigs and getPeerConfig to the HBase shell** Adds get\_peer\_config and list\_peer\_configs to the hbase shell. --- * [HBASE-15430](https://issues.apache.org/jira/browse/HBASE-15430) | *Critical* | **Failed taking snapshot - Manifest proto-message too large** Failed taking snapshot - Manifest proto-message too large.
add property ("snapshot.manifest.size.limit") to change max size of proto-message --- * [HBASE-15323](https://issues.apache.org/jira/browse/HBASE-15323) | *Major* | **Hbase Rest CheckAndDeleteAPi should be able to delete more cells** Fixed an issue in REST server checkAndDelete operation where the remaining cells other than the to-be-checked column are also applied in the Delete operation. Also fixed an issue in RemoteHTable where the Delete object was not passed correctly to the REST server side. --- * [HBASE-15377](https://issues.apache.org/jira/browse/HBASE-15377) | *Major* | **Per-RS Get metric is time based, per-region metric is size-based** Per-region metrics related to Get histograms are changed from being response size based into being latency based similar to the per-regionserver metrics of the same name. Added GetSize histogram metrics at the per-regionserver and per-region level for the response sizes. --- * [HBASE-6721](https://issues.apache.org/jira/browse/HBASE-6721) | *Major* | **RegionServer Group based Assignment** [ADVANCED USERS ONLY] This patch adds a new experimental module hbase-rsgroup. It is an advanced feature for partitioning regionservers into distinctive groups for strict isolation, and should only be used by users who are sophisticated enough to understand the full implications and have a sufficient background in managing HBase clusters. RSGroups can be defined and managed with shell commands or corresponding Java APIs. A server can be added to a group with hostname and port pair, and tables can be moved to this group so that only regionservers in the same rsgroup can host the regions of the table. RegionServers and tables can only belong to 1 group at a time. By default, all tables and regionservers belong to the "default" group. System tables can also be put into a group using the regular APIs. A custom balancer implementation tracks assignments per rsgroup and makes sure to move regions to the relevant regionservers in that group. 
The group information is stored in a regular HBase table, and a zookeeper-based read-only cache is used at the cluster bootstrap time. To enable, add the following to your hbase-site.xml and restart your Master: \<property\> \<name\>hbase.coprocessor.master.classes\</name\> \<value\>org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint\</value\> \</property\> \<property\> \<name\>hbase.master.loadbalancer.class\</name\> \<value\>org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer\</value\> \</property\> Then use the shell 'rsgroup' commands to create and manipulate regionserver groups: e.g. to add a group and then add a server to it, do as follows: hbase(main):008:0\> add\_rsgroup 'my\_group' Took 0.5610 seconds This adds a group to the 'hbase:rsgroup' system table. Add a server (hostname + port) to the group using the 'move\_rsgroup\_servers' command as follows: hbase(main):010:0\> move\_rsgroup\_servers 'my\_group',['k.att.net:51129'] --- * [HBASE-15435](https://issues.apache.org/jira/browse/HBASE-15435) | *Major* | **Add WAL (in bytes) written metric** Adds a new metric named "writtenBytes" as a per-regionserver metric. Metric Description: Size (in bytes) of the data written to the WAL. --- * [HBASE-13963](https://issues.apache.org/jira/browse/HBASE-13963) | *Critical* | **avoid leaking jdk.tools** HBase now ensures that the JDK tools jar used during the build process is not exposed to downstream clients as a transitive dependency of hbase-annotations. If you need to have the JDK tools jar in your classpath, you should add a system dependency on it. See the hbase-annotations pom for an example of the necessary pom additions. --- * [HBASE-15271](https://issues.apache.org/jira/browse/HBASE-15271) | *Major* | **Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures** When using the bulk load helper provided by the hbase-spark module, output files will now be written into temporary files and only made available when the executor has successfully completed.
Previously, failed executors would leave their files in place in a way that would be picked up by a bulk load command. This caused retried failures to include spurious copies of some cells. --- * [HBASE-15364](https://issues.apache.org/jira/browse/HBASE-15364) | *Major* | **Fix unescaped \< characters in Javadoc** HBASE-15364 Fix unescaped \< and \> characters in Javadoc --- * [HBASE-15243](https://issues.apache.org/jira/browse/HBASE-15243) | *Major* | **Utilize the lowest seek value when all Filters in MUST\_PASS\_ONE FilterList return SEEK\_NEXT\_USING\_HINT** When all filters in a MUST\_PASS\_ONE FilterList return a SEEK\_NEXT\_USING\_HINT code, we return SEEK\_NEXT\_USING\_HINT from FilterList#filterKeyValue() to utilize the lowest seek value. --- * [HBASE-15354](https://issues.apache.org/jira/browse/HBASE-15354) | *Major* | **Use same criteria for clearing meta cache for all operations** This patch fixes some cases where the MetaCache (region location cache) was unnecessarily dropped on the client. On the master branch, RegionServerCallable and RegionServerAdminCallable now pass the actual exception down to Connection#updateCachedLocation, so we can check there whether the exception is "meta-clearing" or not. On branch-1, branch-1.2 and branch-1.3 we now check whether the exception is meta-clearing in AsyncProcess (this check was already there on master, but not on the earlier branches). --- * [HBASE-15376](https://issues.apache.org/jira/browse/HBASE-15376) | *Major* | **ScanNext metric is size-based while every other per-operation metric is time based** Removed the ScanNext histogram metrics at the regionserver and per-region levels since their semantics are not compatible with other similar metrics (size histogram vs latency histogram). Instead, this patch adds ScanTime and ScanSize histogram metrics at the regionserver and per-region level.
--- * [HBASE-15338](https://issues.apache.org/jira/browse/HBASE-15338) | *Minor* | **Add a option to disable the data block cache for testing the performance of underlying file system** Adds a new config, hbase.block.data.cacheonread, which is a global switch for caching data blocks on read. The default value of this switch is true, and data blocks will be cached on read if the block cache is enabled for the family and the cacheBlocks flag is set to true for get and scan operations. If this global switch is set to false, data blocks won't be cached even if the block cache is enabled for the family and the cacheBlocks flag of Gets or Scans is set to true. Bloom blocks and index blocks are always cached if the block cache of the regionserver is enabled. One use of this switch is performance testing of the extreme case where every data block read misses the cache and all data blocks are read from the underlying file system. --- * [HBASE-15136](https://issues.apache.org/jira/browse/HBASE-15136) | *Critical* | **Explore different queuing behaviors while busy** Previously the RPC request scheduler in HBase had two modes it could operate in: - simple FIFO - "partial" deadline, where deadline constraints are only imposed on long-running scan requests. This patch adds a new type of scheduler to HBase, based on the research around the controlled delay (CoDel) algorithm [1], used in networking to combat bufferbloat, as well as some analysis on generalizing it to generic request queues [2]. The purpose of that work is to prevent long-standing call queues caused by a discrepancy between request rate and available throughput, caused by kernel/disk IO/networking stalls. The new RPC scheduler can be enabled by setting hbase.ipc.server.callqueue.type=codel in configuration.
Several additional params allow configuring the algorithm's behavior: hbase.ipc.server.callqueue.codel.target.delay, hbase.ipc.server.callqueue.codel.interval, and hbase.ipc.server.callqueue.codel.lifo.threshold. [1] Controlling Queue Delay / A modern AQM is just one piece of the solution to bufferbloat. http://queue.acm.org/detail.cfm?id=2209336 [2] Fail at Scale / Reliability in the face of rapid change. http://queue.acm.org/detail.cfm?id=2839461 --- * [HBASE-15181](https://issues.apache.org/jira/browse/HBASE-15181) | *Major* | **A simple implementation of date based tiered compaction** Date tiered compaction policy is a date-aware store file layout that is beneficial for time-range scans for time-series data. When it performs well: reads for limited time ranges, especially scans of recent data. When it doesn't perform as well: random gets without a time range; frequent deletes and updates; out-of-order data writes, especially writes with timestamps in the future; bulk loads of historical data. Recommended configuration: To turn on Date Tiered Compaction (it is not recommended to turn it on for the whole cluster because that will put the meta table on it too and random gets on the meta table will be impacted): hbase.hstore.compaction.compaction.policy: org.apache.hadoop.hbase.regionserver.compactions.DateTieredCompactionPolicy Parameters for Date Tiered Compaction: hbase.hstore.compaction.date.tiered.max.storefile.age.millis: Files with max-timestamp smaller than this will no longer be compacted. Default at Long.MAX\_VALUE. hbase.hstore.compaction.date.tiered.base.window.millis: base window size in milliseconds. Default at 6 hours. hbase.hstore.compaction.date.tiered.windows.per.tier: number of windows per tier. Default at 4. hbase.hstore.compaction.date.tiered.incoming.window.min: minimal number of files to compact in the incoming window. Set it to the expected number of files in the window to avoid wasteful compaction. Default at 6.
hbase.hstore.compaction.date.tiered.window.policy.class: the policy to select store files within the same time window. It doesn’t apply to the incoming window. Default at exploring compaction. This is to avoid wasteful compaction. With tiered compaction all servers in the cluster will promote windows to a higher tier at the same time, so using a compaction throttle is recommended: hbase.regionserver.throughput.controller:org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController Because there will most likely be more store files around, we need to adjust the configuration so that flush won't be blocked and compaction will be properly throttled: hbase.hstore.blockingStoreFiles: change to 50 if using all default parameters when turning on date tiered compaction. Use 1.5~2 x projected file count if changing the parameters. Projected file count = windows per tier x tier count + incoming window min + files older than max age. For more details, please refer to the design spec at https://docs.google.com/document/d/1\_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG\_uy8/edit# --- * [HBASE-15290](https://issues.apache.org/jira/browse/HBASE-15290) | *Major* | **Hbase Rest CheckAndAPI should save other cells along with compared cell** Fixed the REST server checkAndPut operation so that the remaining cells, other than the to-be-checked column, are also applied in the put operation. --- * [HBASE-15264](https://issues.apache.org/jira/browse/HBASE-15264) | *Major* | **Implement a fan out HDFS OutputStream** Implement a fan-out asynchronous DFSOutputStream for implementing the new WAL writer. --- * [HBASE-13259](https://issues.apache.org/jira/browse/HBASE-13259) | *Critical* | **mmap() based BucketCache IOEngine** mmap() based bucket cache can be configured by specifying the property {code} \<property\> \<name\>hbase.bucketcache.ioengine\</name\> \<value\> mmap://filepath \</value\> \</property\> {code} This mode of bucket cache is ideal when your file based bucket cache size is smaller than the available RAM.
When the cache is bigger than the available RAM, kernel page faults will make this cache perform worse, particularly for scans. --- * [HBASE-11927](https://issues.apache.org/jira/browse/HBASE-11927) | *Major* | **Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)** Checksumming is CPU intensive. HBase computes additional checksums for HFiles (HDFS does checksums too) and stores them inline with file data. During reading, these checksums are verified to ensure data is not corrupted. This patch tries to use Hadoop Native Library for checksum computation, if it’s available, otherwise falls back to standard Java libraries. Instructions to load NHL in HBase can be found here (http://hbase.apache.org/book.html#hadoop.native.lib). Default checksum algorithm has been changed from CRC32 to CRC32C primarily because of two reasons: 1) CRC32C has better error detection properties, and 2) New Intel processors have a dedicated instruction for crc32c computation (SSE4.2 instruction set)\*. This change is fully backward compatible. Also, users should not see any differences except a decrease in CPU usage. To keep old settings, set configuration ‘hbase.hstore.checksum.algorithm’ to ‘CRC32’. \* On Linux, run 'cat /proc/cpuinfo' and look for sse4\_2 in the list of flags to see if your processor supports SSE4.2. --- * [HBASE-15219](https://issues.apache.org/jira/browse/HBASE-15219) | *Critical* | **Canary tool does not return non-zero exit code when one of regions is in stuck state** A new flag is added to the Canary tool: -treatFailureAsError. When this flag is specified, a read / write failure results in a Canary tool exit code of 5. --- * [HBASE-14949](https://issues.apache.org/jira/browse/HBASE-14949) | *Major* | **Resolve name conflict when splitting if there are duplicated WAL entries** Now we can write duplicated WAL entries into different WAL files.
This feature is required by the replication consistency fix and the new implementation of the WAL writer. --- * [HBASE-15100](https://issues.apache.org/jira/browse/HBASE-15100) | *Blocker* | **Master WALProcs still never clean up** The constructor for o.a.h.hbase.ProcedureInfo was mistakenly labeled IA.Public in previous releases and has now changed to IA.Private. Downstream users are safe to consume ProcedureInfo objects returned from HBase public interfaces, but should not expect to be able to reliably create new instances themselves. The method ProcedureInfo.setNonceKey has been removed, because it should not have been exposed to clients. --- * [HBASE-14355](https://issues.apache.org/jira/browse/HBASE-14355) | *Major* | **Scan different TimeRange for each column family** Adds the ability to scan each column family with a different time range. Adds new methods setColumnFamilyTimeRange and getColumnFamilyTimeRange to Scan. --- * [HBASE-14460](https://issues.apache.org/jira/browse/HBASE-14460) | *Critical* | **[Perf Regression] Merge of MVCC and SequenceId (HBASE-8763) slowed Increments, CheckAndPuts, batch operations** This release note tries to tell the general story. Dive into sub-tasks for more specific release noting. Increments, appends, checkAnd\* have been slow since hbase-1.0.0. The unification of mvcc and sequence id done by HBASE-8763 was responsible. A ‘fast-path’ workaround was added by HBASE-15031 “Fix merge of MVCC and SequenceID performance regression in branch-1.0 for Increments”. It became available in 1.0.3 and 1.1.3. To enable the fast path, set "hbase.increment.fast.but.narrow.consistency" and then do a rolling restart. The workaround was for increments only (appends, checkAndPut, etc., were not addressed; see the HBASE-15031 release note for more detail). Subsequently, the regression was properly identified and fixed in HBASE-15213 and the fix applied to branch-1.0 and branch-1.1.
As it happens, hbase-1.2.0 does not suffer from the performance regression (though the thought was that it did -- and so it got the fast-path patch too via HBASE-15092) nor does the master branch. HBASE-15213 identified that HBASE-12751 (as a side effect) had cured the regression. hbase-1.0.4 (if it is ever released -- 1.0 has been end-of-lifed) and hbase-1.1.4 will have the HBASE-15213 fix. If you are suffering from the increment regression and you are on 1.0.3 or 1.1.3, you can enable the workaround to get back your increment performance but you should upgrade. --- * [HBASE-15046](https://issues.apache.org/jira/browse/HBASE-15046) | *Major* | **Perf test doing all mutation steps under row lock** In here we perf tested a realignment of the write pipeline and mvcc handling. The thought was that this work was a prerequisite for a general fix of HBASE-14460 (it turns out the realignment of the write path was not needed to fix the increment perf regression). The perf testing here made it so we were able to simplify writing. HBASE-15158 was just committed. This work is done. --- * [HBASE-15158](https://issues.apache.org/jira/browse/HBASE-15158) | *Major* | **Change order in which we do write pipeline operations; do all under row locks!** Changed the write pipeline order; made it more rational and easier to reason about by doing all updates to WAL, MemStore, and mvcc while the read/write rowlock is held, where before we'd release the lock after WAL append and then do sync and mvcc. --- * [HBASE-15157](https://issues.apache.org/jira/browse/HBASE-15157) | *Major* | **Add \*PerformanceTest for Append, CheckAnd\*** Add append, increment, checkAndMutate, checkAndPut, and checkAndDelete tests to PerformanceEvaluation tool. Below are excerpts from new usage from PE: ....
Command: append Append on each row; clients overlap on keyspace so some concurrent operations checkAndDelete CheckAndDelete on each row; clients overlap on keyspace so some concurrent operations checkAndMutate CheckAndMutate on each row; clients overlap on keyspace so some concurrent operations checkAndPut CheckAndPut on each row; clients overlap on keyspace so some concurrent operations filterScan Run scan test using a filter to find a specific row based on it's value (make sure to use --rows=20) increment Increment on each row; clients overlap on keyspace so some concurrent operations randomRead Run random read test .... Examples: ... To run 10 clients doing increments over ten rows: $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred increment 10 Removed IncrementPerformanceTest. It is not as configurable as the additions made here. --- * [HBASE-15218](https://issues.apache.org/jira/browse/HBASE-15218) | *Blocker* | **On RS crash and replay of WAL, loosing all Tags in Cells** This issue fixes - In case of normal WAL (not encrypted) we were losing all cell tags on WAL replay after an RS crash - In case of encrypted WAL we were not even persisting Cell tags in WAL. Tags from all unflushed (to HFile) Cells would get lost even after WAL replay recovery was done. As we use tags for Cell level security, this fixes 2 security issues - Cell level visibility labels security breach: a visibility-restricted cell could become globally readable - Cell level ACL availability issue: a user who is authorized at cell level to read a cell could not read it, which amounts to data loss for that user. --- * [HBASE-15129](https://issues.apache.org/jira/browse/HBASE-15129) | *Major* | **Set default value for hbase.fs.tmp.dir rather than fully depend on hbase-default.xml** Before HBASE-15129, if hbase-default.xml was somehow not on the classpath, the default values for hbase.fs.tmp.dir and hbase.bulkload.staging.dir were left empty.
After HBASE-15129, default values of both properties are set to "/user/\<user.name\>/hbase-staging". --- * [HBASE-14969](https://issues.apache.org/jira/browse/HBASE-14969) | *Major* | **Add throughput controller for flush** Adds means of throttling flush throughput. By default there is no limit; we use NoLimitThroughputController. An alternative controller, PressureAwareFlushThroughputController, allows specifying throughput bounds. A new simple factor, flush pressure, influences throughput. See PressureAwareFlushThroughputController.java class for detail. --- * [HBASE-11425](https://issues.apache.org/jira/browse/HBASE-11425) | *Major* | **Cell/DBB end-to-end on the read-path** For an end-to-end (E2E) off-heap read path, first there must be an off-heap backed BucketCache (BC). Configure 'hbase.bucketcache.ioengine' to offheap in hbase-site.xml. Also specify the total capacity of the BC using the hbase.bucketcache.size config. Please remember to adjust the value of 'HBASE\_OFFHEAPSIZE' in hbase-env.sh as per this capacity. This setting specifies the maximum possible off-heap memory allocation for the RS Java process, so it should be bigger than the off-heap BC size. Please keep in mind that there is no default for hbase.bucketcache.ioengine, which means the BC is turned OFF by default. The next thing to tune is the ByteBuffer pool on the RPC server side. The buffers from this pool will be used to accumulate the cell bytes and create a result cell block to send back to the client side. 'hbase.ipc.server.reservoir.enabled' can be used to turn this pool ON or OFF. By default this pool is ON and available. HBase will create off-heap ByteBuffers and pool them. Please make sure not to turn this OFF if you want E2E off-heaping in the read path. If this pool is turned off, the server will create temporary on-heap buffers to accumulate the cell bytes and make a result cell block. This can impact the GC on a highly read-loaded server.
The user can tune this pool with respect to how many buffers are in the pool and what the size of each ByteBuffer should be. Use the config 'hbase.ipc.server.reservoir.initial.buffer.size' to tune the size of each buffer. The default is 64 KB. When the read pattern is random row reads and each row is small compared to this 64 KB, try reducing this. When the result size is larger than one ByteBuffer, the server will try to grab more than one buffer and make a result cell block out of these. When the pool is running out of buffers, the server will end up creating temporary on-heap buffers. The maximum number of ByteBuffers in the pool can be tuned using the config 'hbase.ipc.server.reservoir.initial.max'. Its value defaults to 64 \* the number of region server handlers configured (see the config 'hbase.regionserver.handler.count'). The math is such that by default we consider 2 MB as the result cell block size per read result and each handler will be handling a read. For a 2 MB result, we need 32 buffers each of size 64 KB (see the default buffer size in the pool), so 32 ByteBuffers (BBs) per handler. We allocate twice this count as the max BB count so that one handler can be creating the response and handing it to the RPC Responder thread and then handling a new request, creating a new response cell block (using pooled buffers). Even if the responder could not send back the first TCP reply immediately, this count should leave enough buffers in the pool without having to make temporary buffers on the heap. Again, for smaller sized random row reads, tune this max count. These buffers are created lazily, and the count is the maximum number to be pooled. The setting for HBASE\_OFFHEAPSIZE in hbase-env.sh should also account for this off-heap buffer pool on the RPC side. The max off-heap size for the RS needs to be configured a bit higher than the sum of this max pool size and the off-heap cache size.
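The default pool sizing described above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not HBase code: the class and method names are hypothetical, and the handler count of 30 is an assumed example value, not something stated in this note.

```java
/** Illustrative arithmetic for the RPC ByteBuffer pool defaults described above (not HBase code). */
final class ReservoirSizing {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    /** Number of pooled buffers needed to hold one result cell block of the given size. */
    static long buffersPerResponse(long responseBytes, long bufferBytes) {
        return (responseBytes + bufferBytes - 1) / bufferBytes; // ceiling division
    }

    /** Max pooled buffers: room for two in-flight responses per handler. */
    static long maxPooledBuffers(int handlers, long responseBytes, long bufferBytes) {
        return 2 * buffersPerResponse(responseBytes, bufferBytes) * handlers;
    }

    /** Upper bound on the off-heap memory held by the pool once every buffer has been created. */
    static long maxPoolBytes(int handlers, long responseBytes, long bufferBytes) {
        return maxPooledBuffers(handlers, responseBytes, bufferBytes) * bufferBytes;
    }

    public static void main(String[] args) {
        int handlers = 30; // assumed example value for hbase.regionserver.handler.count
        long perHandler = 2 * buffersPerResponse(2 * MB, 64 * KB);
        System.out.println("buffers per handler = " + perHandler);
        System.out.println("max pooled buffers  = " + maxPooledBuffers(handlers, 2 * MB, 64 * KB));
        System.out.println("max pool size (MB)  = " + maxPoolBytes(handlers, 2 * MB, 64 * KB) / MB);
    }
}
```

With these assumed inputs the pool holds at most 1920 buffers, or 120 MB once fully populated. That figure is the pool's share of the off-heap budget, to be added to the off-heap cache size when choosing the limit.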
The TCP layer will also need to create direct ByteBuffers for TCP communication. The DFS client will also need some off-heap memory to do its work, especially if short-circuit reads are configured. Allocating an extra 1 - 2 GB for the max direct memory size has worked in tests. If you still see GC issues even after making the E2E read path off-heap, look for issues in the appropriate buffer pool. Check for the below RS log line at INFO level: "Pool already reached its max capacity : XXX and no free buffers now. Consider increasing the value for 'hbase.ipc.server.reservoir.initial.max' ?" If you are using coprocessors and refer to the Cells in the read results, DO NOT store references to these Cells outside the scope of the CP hook methods. Sometimes the CPs need to store info about the cell (like its row key) for consideration in the next CP hook call, etc. For such cases, please clone the required fields of the Cell as per the use case. [ See CellUtil#cloneXXX(Cell) APIs ] --- * [HBASE-15145](https://issues.apache.org/jira/browse/HBASE-15145) | *Major* | **HBCK and Replication should authenticate to zookepeer using server principal** Added a new command line argument: --auth-as-server to enable authenticating to ZooKeeper as the HBase Server principal. This is required on secure clusters for doing replication operations like add\_peer, list\_peers, etc. until HBASE-11392 is fixed. This advanced option can also be used for manually fixing secure znodes. Commands can now be invoked like: hbase --auth-as-server shell hbase --auth-as-server zkcli HBCK in a secure setup also needs to authenticate to ZK using server principals. This is turned on by default (no need to pass an additional argument). When authenticating as server, HBASE\_SERVER\_JAAS\_OPTS is concatenated to HBASE\_OPTS if defined in hbase-env.sh. Otherwise, HBASE\_REGIONSERVER\_OPTS is concatenated.
--- * [HBASE-15125](https://issues.apache.org/jira/browse/HBASE-15125) | *Major* | **HBaseFsck's adoptHdfsOrphan function creates region with wrong end key boundary** **WARNING: No release note provided for this change.** --- * [HBASE-13082](https://issues.apache.org/jira/browse/HBASE-13082) | *Major* | **Coarsen StoreScanner locks to RegionScanner** After this JIRA we will not do any scanner reset after compaction during the course of a scan. The files that were compacted will continue to be used in the scan process. The compacted files will be archived by a background thread that runs every 2 mins by default, and only when there are no active scanners on those compacted files. The above duration can be controlled using the knob 'hbase.hfile.compactions.cleaner.interval'. --- * [HBASE-14865](https://issues.apache.org/jira/browse/HBASE-14865) | *Major* | **Support passing multiple QOPs to SaslClient/Server via hbase.rpc.protection** With this patch, hbase.rpc.protection can now take multiple comma-separated QOP values. Accepted QOP values remain unchanged and are 'authentication', 'integrity', and 'privacy'. Server or client can use this configuration to specify their preference (in decreasing order) while negotiating QOP. This feature can be used to upgrade or downgrade QOP in an online cluster without compromising availability (i.e. taking the cluster offline). For example, to change QOP from A to B, typical steps would be: "A" --\> "B,A" --\> rolling restart --\> "B" --\> rolling restart Sidenote: Based on experimentation, the server's choice is given higher preference than the client's choice, i.e. if the server's choices are "A,B,C" and the client's choices are "B,C,A", both A and B are acceptable, but A is chosen. --- * [HBASE-15098](https://issues.apache.org/jira/browse/HBASE-15098) | *Blocker* | **Normalizer switch in configuration is not used** The config parameter, hbase.normalizer.enabled, has been dropped since it is not used in the code base.
--- * [HBASE-15111](https://issues.apache.org/jira/browse/HBASE-15111) | *Trivial* | **"hbase version" should write to stdout** The \`hbase version\` command now outputs directly to stdout rather than to a logger. This change allows the version information to be output consistently regardless of logger configuration. Naturally, this also means the command output ignores all logger configuration. Furthermore, the move from loggers to direct output changes the output of the command to omit metadata commonly included in logger output such as a timestamp, log level, and logger name. --- * [HBASE-15027](https://issues.apache.org/jira/browse/HBASE-15027) | *Major* | **Refactor the way the CompactedHFileDischarger threads are created** The property 'hbase.hfile.compactions.discharger.interval' has been renamed to 'hbase.hfile.compaction.discharger.interval', which describes the interval after which the compaction discharger chore service should run. The property 'hbase.hfile.compaction.discharger.thread.count' describes the number of threads that do the compaction discharge work. The CompactedHFilesDischarger is a chore service now started as part of the RegionServer; this chore service iterates over all the onlineRegions in that RS and uses the RegionServer's executor service to launch a set of threads that do the job of cleaning up compacted files. --- * [HBASE-14468](https://issues.apache.org/jira/browse/HBASE-14468) | *Major* | **Compaction improvements: FIFO compaction policy** FIFO compaction policy selects only files which have all cells expired. The column family MUST have a non-default TTL. Essentially, the FIFO compactor does only one job: it collects expired store files. Because we do not do any real compaction, we do not use CPU and IO (disk and network), and we do not evict hot data from the block cache. The result: improved throughput and latency for both writes and reads.
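The "select only files whose cells have all expired" rule can be pictured with a small stand-alone sketch. This is not the HBase implementation: the FileSummary type, the method names, and the idea of summarizing each store file by its newest cell timestamp are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of "select only fully expired files"; not the HBase implementation. */
final class FifoSelectionSketch {

    /** Hypothetical summary of a store file: a name and the newest cell timestamp it holds. */
    static final class FileSummary {
        final String name;
        final long maxTimestampMs;

        FileSummary(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /** A file is fully expired when even its newest cell is older than the TTL. */
    static boolean isFullyExpired(FileSummary file, long ttlMs, long nowMs) {
        // Written as a subtraction so that a very large TTL cannot overflow.
        return nowMs - file.maxTimestampMs > ttlMs;
    }

    /** Returns the names of files whose cells have all expired; all other files are left alone. */
    static List<String> selectExpired(List<FileSummary> files, long ttlMs, long nowMs) {
        List<String> expired = new ArrayList<>();
        for (FileSummary file : files) {
            if (isFullyExpired(file, ttlMs, nowMs)) {
                expired.add(file.name);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long now = 100 * hour;
        List<FileSummary> files = new ArrayList<>();
        files.add(new FileSummary("old", 10 * hour));   // newest cell is 90 hours old
        files.add(new FileSummary("mixed", 80 * hour)); // newest cell is 20 hours old
        // With a 24 hour TTL only "old" qualifies; "mixed" still holds live cells.
        System.out.println(selectExpired(files, 24 * hour, now));
    }
}
```

No file is rewritten in this picture: a file is either dropped whole or kept whole, which is why the note can say that no real compaction work is done.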
See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style --- * [HBASE-14888](https://issues.apache.org/jira/browse/HBASE-14888) | *Major* | **ClusterSchema: Add Namespace Operations** This patch changes the semantics around namespace create/delete/modify when a coprocessor asks that the invocation be bypassed. Previously the bypass was done silently -- the method would just return with no indication as to whether the bypass route had been taken or not. This patch adds a BypassCoprocessorException, which is thrown if we have been asked to bypass a call. The bypass facility has been in place since hbase 1.0.0, when namespace creation/deletion, etc., was originally added in HBASE-8408 (HBASE-15071 is about addressing bypass handling in a general way). --- * [HBASE-15018](https://issues.apache.org/jira/browse/HBASE-15018) | *Major* | **Inconsistent way of handling TimeoutException in the rpc client implementations** When using the new AsyncRpcClient introduced in HBase 1.1.0 (HBASE-12684), timeouts now result in an IOException wrapped around a CallTimeoutException instead of a bare CallTimeoutException. This change makes the AsyncRpcClient behave the same as the default HBase 1.y RPC client implementation. --- * [HBASE-14796](https://issues.apache.org/jira/browse/HBASE-14796) | *Minor* | **Enhance the Gets in the connector** spark.hbase.bulkGetSize in HBaseSparkConf is for grouping bulkGet, and its default value is 1000. --- * [HBASE-14976](https://issues.apache.org/jira/browse/HBASE-14976) | *Minor* | **Add RPC call queues to the web ui** Adds a column displaying the current aggregated call queue size in the region server queues tab UI. --- * [HBASE-14822](https://issues.apache.org/jira/browse/HBASE-14822) | *Major* | **Renewing leases of scanners doesn't work** And 1.1, 1.0, and 0.98.
--- * [HBASE-14205](https://issues.apache.org/jira/browse/HBASE-14205) | *Critical* | **RegionCoprocessorHost System.nanoTime() performance bottleneck** **WARNING: No release note provided for this change.** --- * [HBASE-14978](https://issues.apache.org/jira/browse/HBASE-14978) | *Blocker* | **Don't allow Multi to retain too many blocks** Limiting the amount of memory resident for any one request allows the server to handle concurrent requests smoothly. To this end we added the ability to limit the size of responses to a multi request. That worked well; however, it did not correctly represent the amount of memory resident. So this issue adds an approximation of the number of blocks held for a request. All clients before 1.2.0 will not get this multi request chunking based upon blocks kept. All clients 1.2.0 and after will. --- * [HBASE-14951](https://issues.apache.org/jira/browse/HBASE-14951) | *Minor* | **Make hbase.regionserver.maxlogs obsolete** WAL rolling events across a cluster can be highly correlated, and so can the memstore flushes they cause and the minor compactions those trigger, which can be promoted to major ones. These events are highly correlated in time if there is a balanced write load on the regions in a table. The default value for the maximum number of WAL files (\*hbase.regionserver.maxlogs\*), which controls WAL rolling events, is 32, which is too small for many modern deployments. Now we calculate this value dynamically (if not defined by the user), using the following formula: maxLogs = Math.max( 32, HBASE\_HEAP\_SIZE \* memstoreRatio \* 2/ LogRollSize), where memstoreRatio is \*hbase.regionserver.global.memstore.size\* and LogRollSize is the maximum WAL file size (default 0.95 \* HDFS block size). We need to avoid entirely, or at least minimize, events where the RS has to flush memstores prematurely only because it reached the artificial limit of hbase.regionserver.maxlogs; this is why we put the 2 x multiplier in the equation, which gives us a maximum WAL capacity of 2 x RS memstore size. Runaway WAL files.
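The maxLogs formula above can be worked through as plain arithmetic. The sketch below is illustrative and is not HBase code: the class and method names are hypothetical, sizes are in MB with 1G treated as 1000 MB, and the 100 MB roll size is an assumed input chosen because it reproduces the values tabulated in this note (the documented default of 0.95 x HDFS block size would give different numbers).

```java
/** Illustrative arithmetic for the maxLogs formula quoted above; not HBase code. */
final class MaxLogsSketch {

    /**
     * maxLogs = max(32, heap * memstoreRatio * 2 / logRollSize), with sizes in MB.
     * The ratio is passed as a whole percentage so the arithmetic stays in integers.
     */
    static long maxLogs(long heapMb, int memstorePercent, long logRollMb) {
        long walCapacityMb = heapMb * memstorePercent * 2 / 100; // 2 x memstore size
        return Math.max(32L, walCapacityMb / logRollMb);
    }

    public static void main(String[] args) {
        long logRollMb = 100; // assumed roll size, see the lead-in
        for (long heapGb : new long[] {1, 2, 10, 20, 32}) {
            System.out.println(heapGb + "G -> " + maxLogs(heapGb * 1000, 40, logRollMb));
        }
    }
}
```

For small heaps the floor of 32 dominates; above roughly 4G (with these assumed inputs) the result grows linearly with heap size.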
The default log rolling period (1h) allows up to 2 x memstore size of data to accumulate in the WAL. For a heap size of 32G and all other default settings, this gives ~26GB of data. Under heavy write load, the number of WAL files can increase dramatically. The RegionServer LogRoller will archive old WALs periodically. The user has three options to control runaway WALs: override the default hbase.regionserver.maxlogs, override the default hbase.regionserver.logroll.period (decrease it), or both. For systems with a bursty write load, hbase.regionserver.logroll.period can be decreased to a lower value. In this case the maximum number of WAL files will be defined by the total size of memstore (unflushed data), not by hbase.regionserver.maxlogs. But for the majority of applications there will be no issues with the defaults. Data will be flushed periodically from memstore, the LogRoller will archive old WAL files, and the system will never reach the new defaults for hbase.regionserver.maxlogs unless it is under extreme load for a prolonged period of time; in that case, decreasing hbase.regionserver.logroll.period allows us to control runaway WAL files. The following table gives the new default maximum log files values for several different Region Server heap sizes:

| heap | memstore perc | maxLogs |
| --- | --- | --- |
| 1G | 40% | 32 |
| 2G | 40% | 32 |
| 10G | 40% | 80 |
| 20G | 40% | 160 |
| 32G | 40% | 256 |

--- * [HBASE-14984](https://issues.apache.org/jira/browse/HBASE-14984) | *Major* | **Allow memcached block cache to set optimze to false** Setting hbase.cache.memcached.spy.optimze to true will allow the spy memcached client to try to optimize for the number of requests outstanding. This can increase throughput but can also increase variance in request times. Setting it to true will help when round trip times are longer. Setting it to false (the default) will help ensure a more even distribution of response times.
--- * [HBASE-14534](https://issues.apache.org/jira/browse/HBASE-14534) | *Minor* | **Bump yammer/coda/dropwizard metrics dependency version** Updated yammer metrics to version 3.1.2 (it has since been renamed to dropwizard). The API has changed quite a bit; consult https://dropwizard.github.io/metrics/3.1.0/manual/core/ for additional information. Note that among other things, in yammer 2.2.0 histograms were by default created in non-biased mode (uniform sampling), while in 3.1.0 histograms created via MetricsRegistry.histogram(...) are by default exponentially decayed. This shouldn't affect end users, though. --- * [HBASE-14960](https://issues.apache.org/jira/browse/HBASE-14960) | *Major* | **Fallback to using default RPCControllerFactory if class cannot be loaded** If the configured RPC controller factory (via hbase.rpc.controllerfactory.class) cannot be found in the classpath or loaded, we fall back to using the default RPC controller factory in HBase. --- * [HBASE-14946](https://issues.apache.org/jira/browse/HBASE-14946) | *Critical* | **Don't allow multi's to over run the max result size.** The HBase region server will now send a chunk of get responses to a client if the total response size is too large. This will only be done for clients 1.2.0 and beyond. Older clients will by default have the old behavior. This patch is for the case where the basic flow is like this: I want to get a single column from lots of rows. So I create a list of gets. Then I send them to table.get(List\<Get\>). If the regions for that table are spread out then those requests get chunked out to all the region servers. No one regionserver gets too many. However, if one region server contains lots of regions for that table then a multi action can contain lots of gets. No single get is too onerous. However, the regionserver won't return until every get is complete. So if there are thousands of gets that are sent in one multi then the regionserver can retain lots of data in one thread.
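The chunking idea in HBASE-14946 can be sketched in a few lines of Java. This is a simplified illustration and not the HBase implementation: the class, the method names, and the rule that a chunk always carries at least one result are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of size-based chunking for a multi-get; not the HBase implementation.
final class MultiChunkSketch {
    // Counts how many pending results fit into one response before the size limit is hit.
    // At least one result is always included so that the client makes progress (an assumption).
    static int resultsInNextChunk(List<Long> resultSizes, int from, long maxResponseBytes) {
        long total = 0;
        int count = 0;
        for (int i = from; i < resultSizes.size(); i++) {
            if (count > 0 && total + resultSizes.get(i) > maxResponseBytes) {
                break;
            }
            total += resultSizes.get(i);
            count++;
        }
        return count;
    }

    // Splits the whole request into the chunk sizes a client would see across round trips.
    static List<Integer> chunk(List<Long> resultSizes, long maxResponseBytes) {
        List<Integer> chunks = new ArrayList<>();
        int from = 0;
        while (from < resultSizes.size()) {
            int n = resultsInNextChunk(resultSizes, from, maxResponseBytes);
            chunks.add(n);
            from += n;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Five gets of 40 bytes each against a 100-byte response limit.
        System.out.println(chunk(Arrays.asList(40L, 40L, 40L, 40L, 40L), 100L));
    }
}
```

The point of the sketch is only that the server stops accumulating results once the limit is reached, so no single thread holds the data for thousands of gets at once.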
--- * [HBASE-14906](https://issues.apache.org/jira/browse/HBASE-14906) | *Major* | **Improvements on FlushLargeStoresPolicy** In HBASE-14906 we use "hbase.hregion.memstore.flush.size/column\_family\_number" as the default threshold for memstore flush instead of the fixed value through "hbase.hregion.percolumnfamilyflush.size.lower.bound" property, which makes the default threshold more flexible to various use case. We also introduce a new property in name of "hbase.hregion.percolumnfamilyflush.size.lower.bound.min" with 16M as the default value to avoid small flush in cases like hundreds of column families. After this change setting "hbase.hregion.percolumnfamilyflush.size.lower.bound" in hbase-site.xml won't take effect anymore, but expert users could still set this property in table descriptor to override the default value just as before --- * [HBASE-14769](https://issues.apache.org/jira/browse/HBASE-14769) | *Major* | **Remove unused functions and duplicate javadocs from HBaseAdmin** - Removes functions from HBaseAdmin which require table name parameter as either byte[] or String. Use their counterparts which take TableName instead. - Removes redundant javadocs from HBaseAdmin as they will be automatically inherited from Admin interface. - HBaseAdmin is marked Audience.private so it should have been straight forward okay to remove the functions. But HBaseTestingUtility, which is marked Audience.public had a public function returning its instance, which moved this decision into gray area. Discussing in the community, it was decided that it would be okay to do so in this particular case. --- * [HBASE-13153](https://issues.apache.org/jira/browse/HBASE-13153) | *Major* | **Bulk Loaded HFile Replication** This enhances the HBase replication to support replication of bulk loaded data. This is configurable, by default it is set to false which means it will not replicate the bulk loaded data to its peer(s). To enable it set "hbase.replication.bulkload.enabled" to true. 
The following additional configurations were added for this enhancement: a. hbase.replication.cluster.id - This is mandatory in a cluster where replication for bulk loaded data is enabled. The sink cluster uses this id to uniquely identify a source cluster. This should be configured in the source cluster configuration file for all the RS. b. hbase.replication.conf.dir - This represents the directory where all the active cluster's file system client configurations are defined in subfolders corresponding to their respective replication cluster id in peer cluster. This should be configured in the peer cluster configuration file for all the RS. Default is HBASE\_CONF\_DIR. c. hbase.replication.source.fs.conf.provider - This represents the class which provides the source cluster file system client configuration to peer cluster. This should be configured in the peer cluster configuration file for all the RS. Default is org.apache.hadoop.hbase.replication.regionserver.DefaultSourceFSConfigurationProvider For example: If source cluster FS client configurations are copied in peer cluster under directory /home/user/dc1/ then hbase.replication.cluster.id should be configured as dc1 and hbase.replication.conf.dir as /home/user Note: a. After any modification to the source cluster FS client configuration files in the peer cluster's replication configuration directory, all RS of the peer cluster(s) must be restarted when the default hbase.replication.source.fs.conf.provider is in use. b. Only 'xml' type files will be loaded by the default hbase.replication.source.fs.conf.provider. As part of this we have made the following changes to the LoadIncrementalHFiles class, which is marked as a Public and Stable class: a. Raised the visibility scope of LoadQueueItem class from package private to public. b. Added a new method loadHFileQueue, which loads the queue of LoadQueueItem into the table as per the region keys provided.
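To make the directory layout in the example above concrete, here is a small Java sketch of how hbase.replication.conf.dir and hbase.replication.cluster.id combine into the per-cluster subfolder, and of the 'xml'-only rule from the note. The class and method names are hypothetical; this is not how DefaultSourceFSConfigurationProvider is necessarily written.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the conf-dir layout described in the HBASE-13153 note.
final class ReplicationConfDirSketch {
    // <hbase.replication.conf.dir>/<hbase.replication.cluster.id> holds the source cluster's FS client configs.
    static Path sourceClusterConfDir(String replicationConfDir, String replicationClusterId) {
        return Paths.get(replicationConfDir).resolve(replicationClusterId);
    }

    // The note says only 'xml' type files are loaded by the default provider.
    static boolean isLoadedByDefaultProvider(String fileName) {
        return fileName.endsWith(".xml");
    }

    public static void main(String[] args) {
        // Values from the example in the note: conf dir /home/user, cluster id dc1.
        System.out.println(sourceClusterConfDir("/home/user", "dc1"));
        System.out.println(isLoadedByDefaultProvider("hdfs-site.xml"));
    }
}
```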
--- * [HBASE-7171](https://issues.apache.org/jira/browse/HBASE-7171) | *Major* | **Initial web UI for region/memstore/storefiles details** HBASE-7171 adds 2 new pages to the region server Web UI to ease debugging and provide greater insight into the physical data layout. Region names in UI table listing all regions (on the RS status page) are now hyperlinks leading to region detail page which shows some aggregate memstore information (currently just memory used) along with the list of all Store Files (HFiles) in the region. Names of Store Files are also hyperlinks leading to Store File detail page, which currently runs 'hbase hfile' command behind the scene and displays statistics about store file. --- * [HBASE-14655](https://issues.apache.org/jira/browse/HBASE-14655) | *Blocker* | **Narrow the scope of doAs() calls to region observer notifications for compaction** Region observer notifications w.r.t. compaction request are now audited with request user through proper scope of doAs() calls. --- * [HBASE-14631](https://issues.apache.org/jira/browse/HBASE-14631) | *Blocker* | **Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications** Region observer notifications w.r.t. merge request are now audited with request user through proper scope of doAs() calls. --- * [HBASE-14605](https://issues.apache.org/jira/browse/HBASE-14605) | *Blocker* | **Split fails due to 'No valid credentials' error when SecureBulkLoadEndpoint#start tries to access hdfs** When split is requested by non-super user, split related notifications for Coprocessor are executed using the login of the request user. Previously the notifications were carried out as super user. --- * [HBASE-14926](https://issues.apache.org/jira/browse/HBASE-14926) | *Major* | **Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading** Adds a timeout to server read from clients. 
Adds new configs hbase.thrift.server.socket.read.timeout for setting read timeout on server socket in milliseconds. Default is 60000; --- * [HBASE-14825](https://issues.apache.org/jira/browse/HBASE-14825) | *Minor* | **HBase Ref Guide corrections of typos/misspellings** Corrections to content of "book.html", which is pulled from various \*.adoc files and \*.xml files. -- corrects typos/misspellings -- corrects incorrectly formatted links --- * [HBASE-14821](https://issues.apache.org/jira/browse/HBASE-14821) | *Major* | **CopyTable should allow overriding more config properties for peer cluster** Configuration properties for org.apache.hadoop.hbase.mapreduce.TableOutputFormat can now be overridden by prefixing the property keys with "hbase.mapred.output.". When the configuration is applied to TableOutputFormat, these entries will be rewritten with the prefix removed -- ie. "hbase.mapred.output.hbase.security.authentication" becomes "hbase.security.authentication". This can be useful when directing output to a peer cluster with different security configuration, for example. --- * [HBASE-14799](https://issues.apache.org/jira/browse/HBASE-14799) | *Critical* | **Commons-collections object deserialization remote command execution vulnerability** This issue resolves a potential security vulnerability. For all versions we update our commons-collections dependency to the release that fixes the reported vulnerability in that library. In 0.98 we additionally disable by default a feature of code carried from 0.94 for backwards compatibility that is not needed. --- * [HBASE-12751](https://issues.apache.org/jira/browse/HBASE-12751) | *Major* | **Allow RowLock to be reader writer** Locks on row are now reader/writer rather than exclusive. Moves sequenceid out of HRegion and into MVCC class; MVCC is now in charge. A WAL append is still stamped in same way (we pass MVCC context in a few places where we previously we did not). MVCC methods cleaned up. Make a bit more sense now. 
Less of them. Simplifies our update of MemStore/WAL. Now we update memstore AFTER we add to WAL (but before we sync). This fixes possible dataloss when two edits came in with same coordinates; we could order the edits in memstore differently to how they arrived in the WAL. Marked as an incompatible change because it breaks Distributed Log Replay, a feature we'd determined already was unreliable and to be removed. --- * [HBASE-14793](https://issues.apache.org/jira/browse/HBASE-14793) | *Major* | **Allow limiting size of block into L1 block cache.** Very large blocks can fragment the heap and cause bad issues for the garbage collector, especially the G1GC. Now there is a maximum size that a block can be and still stick in the LruBlockCache. That size defaults to 16mb but can be controlled by changing "hbase.lru.max.block.size" --- * [HBASE-14387](https://issues.apache.org/jira/browse/HBASE-14387) | *Major* | **Compaction improvements: Maximum off-peak compaction size** New configuration option: hbase.hstore.compaction.max.size.offpeak - maximum selection size eligible for minor compaction during off peak hours. hbase.hstore.compaction.max.size - this is default maximum if no off-peak hours are defined or if no maximum off-peak maximum size is defined. --- * [HBASE-12822](https://issues.apache.org/jira/browse/HBASE-12822) | *Minor* | **Option for Unloading regions through region\_mover.rb without Acknowledging** Incorporated in HBASE-13014. --- * [HBASE-14700](https://issues.apache.org/jira/browse/HBASE-14700) | *Major* | **Support a "permissive" mode for secure clusters to allow "simple" auth clients** Secure HBase now supports a permissive mode to allow mixed secure and insecure clients. This allows clients to be incrementally migrated over to a secure configuration. 
To enable clients to continue to connect using SIMPLE authentication when the cluster is configured for security, set "hbase.ipc.server.fallback-to-simple-auth-allowed" equal to "true" in hbase-site.xml. NOTE: This setting should ONLY be used as a temporary measure while converting clients over to secure authentication. It MUST BE DISABLED for secure operation. --- * [HBASE-14257](https://issues.apache.org/jira/browse/HBASE-14257) | *Major* | **Periodic flusher only handles hbase:meta, not other system tables** The memstore periodic flusher used to flush the META table every 5 minutes but not any other system tables. This jira extends it to flush all system tables within this time period. --- * [HBASE-14658](https://issues.apache.org/jira/browse/HBASE-14658) | *Major* | **Allow loading a MonkeyFactory by class name** You can specify one of the predefined set of Monkeys when you run Integration Tests by passing the -m\|--monkey arguments on the command line; e.g. -m CALM or -m SLOW\_DETERMINISTIC. This patch makes it so you can pass the name of a class as the monkey to run: e.g. -m org.example.KingKong --- * [HBASE-14521](https://issues.apache.org/jira/browse/HBASE-14521) | *Major* | **Unify the semantic of hbase.client.retries.number** After this change, hbase.client.retries.number universally means the number of retries, which is one less than the total number of tries. This holds both for non-batch operations like get/scan/increment, which use RpcRetryingCallerImpl#callWithRetries to submit the call, and for batch operations like put, which go through AsyncProcess#submit. Note that previously this property meant the total number of tries for puts, so please adjust its value if necessary. Please also be cautious when setting it to zero, since a retry is necessary for the client cache update when a region move happens.
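The "retries is one less than total tries" rule from HBASE-14521 can be shown with a generic retry loop. The Java sketch below is only an illustration: the class is hypothetical and its callWithRetries method is a stand-in that shares a name with, but is not, the HBase method mentioned in the note.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the retry semantic: total tries = retries + 1.
final class RetrySemanticsSketch {
    // Runs the call once, then retries up to `retries` more times before giving up.
    static <T> T callWithRetries(Callable<T> call, int retries) throws Exception {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last;
    }

    // Counts how many times an always-failing call is attempted for a given retries value.
    static int attemptsMade(int retries) {
        AtomicInteger attempts = new AtomicInteger();
        try {
            callWithRetries(() -> {
                attempts.incrementAndGet();
                throw new IllegalStateException("simulated failure");
            }, retries);
        } catch (Exception expected) {
            // every attempt failed, which is the case being counted
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        System.out.println("retries=0 -> attempts=" + attemptsMade(0));
        System.out.println("retries=3 -> attempts=" + attemptsMade(3));
    }
}
```

With retries set to zero the call is attempted exactly once, which is why the note warns that a region move can then fail without the client cache ever being refreshed.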
--- * [HBASE-13819](https://issues.apache.org/jira/browse/HBASE-13819) | *Major* | **Make RPC layer CellBlock buffer a DirectByteBuffer** For the master branch (2.0 version), the BoundedByteBufferPool always creates and returns Direct (off heap) ByteBuffers. For branch-1 (1.3 version), by default the buffers returned will be off heap. This can be changed to return on heap ByteBuffers by configuring 'hbase.ipc.server.reservoir.direct.buffer' to false. --- * [HBASE-14517](https://issues.apache.org/jira/browse/HBASE-14517) | *Minor* | **Show regionserver's version in master status page** Adds the server version to the listing of regionservers on the master home page. In a cluster where the versions deviate, at the bottom of the 'Version' column on the master home page listing of 'Region Servers', you will see a note in red that says something like: 'Total:10 9 nodes with inconsistent version' --- * [HBASE-12911](https://issues.apache.org/jira/browse/HBASE-12911) | *Major* | **Client-side metrics** Introduces collection and reporting of various client-perceived metrics. Metrics are exposed via JMX under "org.apache.hadoop.hbase.client.MetricsConnection". Metrics are scoped according to connection instance, so multiple connection objects (i.e., to different clusters) will report their metrics separately. Metrics are disabled by default and must be enabled by configuring "hbase.client.metrics.enable=true". --- * [HBASE-14529](https://issues.apache.org/jira/browse/HBASE-14529) | *Major* | **Respond to SIGHUP to reload config** HBase daemons can now be signaled to reload their config by sending SIGHUP to the java process. Not all config parameters can be reloaded. In order for this new feature to work, the hbase-daemon.sh script was changed to use disown rather than nohup. Functionally this shouldn't change anything, but the processes will have a different parent when being run from a connected login shell.
--- * [HBASE-14502](https://issues.apache.org/jira/browse/HBASE-14502) | *Major* | **Purge use of jmock and remove as dependency** HBASE-14502 Purge use of jmock and remove as dependency --- * [HBASE-14544](https://issues.apache.org/jira/browse/HBASE-14544) | *Major* | **Allow HConnectionImpl to not refresh the dns on errors** By setting hbase.resolve.hostnames.on.failure to false you can reduce the number of dns name resolutions that a client will do. However if machines leave and come back with different ip's the changes will not be noticed by the clients. So only set hbase.resolve.hostnames.on.failure to false if your cluster dns is not changing while clients are connected. --- * [HBASE-14367](https://issues.apache.org/jira/browse/HBASE-14367) | *Major* | **Add normalization support to shell** This patch adds shell support for region normalizer (see HBASE-13103). 3 commands have been added to hbase shell 'tools' command group (modeled on how the balancer works): - 'normalizer\_enabled' checks whether region normalizer is turned on - 'normalizer\_switch' allows user to turn normalizer on and off - 'normalize' runs region normalizer if it's turned on. Also 'alter' command has been extended to allow user to enable/disable region normalization per table (disabled by default). Use it as alter 'testtable', {NORMALIZATION\_MODE =\> 'true'} Here is the help for the normalize command: {code} hbase(main):008:0\> help 'normalize' Trigger region normalizer for all tables which have NORMALIZATION\_MODE flag set. Returns true if normalizer ran successfully, false otherwise. Note that this command has no effect if region normalizer is disabled (make sure it's turned on using 'normalizer\_switch' command). Examples: hbase\> normalize {code} --- * [HBASE-14475](https://issues.apache.org/jira/browse/HBASE-14475) | *Major* | **Region split requests are always audited with "hbase" user rather than request user** Region observer notifications w.r.t. 
split request are now audited with request user through proper scope of doAs() calls. --- * [HBASE-14230](https://issues.apache.org/jira/browse/HBASE-14230) | *Minor* | **replace reflection in FSHlog with HdfsDataOutputStream#getCurrentBlockReplication()** Remove calling getNumCurrentReplicas on HdfsDataOutputStream via reflection. getNumCurrentReplicas showed up in hadoop 1+ and hadoop 0.2x. In hadoop-2 it was deprecated. --- * [HBASE-14495](https://issues.apache.org/jira/browse/HBASE-14495) | *Major* | **TestHRegion#testFlushCacheWhileScanning goes zombie** The WAL append was changed by HBASE-12751. Every append now sets a latch on an edit. The latch needs to be cleared or else the WAL will hang. The original failures in TestHRegion turned up 'holes' where we were failing to throw the latch if we skipped out early because we were interrupted. Other 'holes' were found where we had mocked up a WAL so the latch would just stay in place. Further holes were found appending WAL markers... here we were skipping the mvcc completely for a few edits. A clean up of WALUtils made all markers take the same code paths. --- * [HBASE-14280](https://issues.apache.org/jira/browse/HBASE-14280) | *Minor* | **Bulk Upload from HA cluster to remote HA hbase cluster fails** Patch will effectively work with Hadoop version 2.6 or greater with a launch of "internal.nameservices". There will be no change in versions older than 2.6. --- * [HBASE-14334](https://issues.apache.org/jira/browse/HBASE-14334) | *Major* | **Move Memcached block cache in to it's own optional module.** Move external block cache to its own module. This will reduce dependencies for people who use hbase-server. Currently Memcached is the reference implementation for external block cache. External block caches allow HBase to take advantage of other more complex caches that can live longer than the HBase regionserver process and are not necessarily tied to the lifetime of a single computer.
However external block caches add in extra operational overhead. --- * [HBASE-14433](https://issues.apache.org/jira/browse/HBASE-14433) | *Major* | **Set down the client executor core thread count from 256 in tests** Tests run with client executors that have core thread count of 4 and a keepalive of 3 seconds. They used to default to 256 core threads and 60 seconds for keepalive. --- * [HBASE-14400](https://issues.apache.org/jira/browse/HBASE-14400) | *Critical* | **Fix HBase RPC protection documentation** To use rpc protection in HBase, set the value of 'hbase.rpc.protection' to: 'authentication' : simple authentication using kerberos 'integrity' : authentication and integrity 'privacy' : authentication and confidentiality Earlier, HBase reference guide erroneously mentioned in some places to set the value to 'auth-conf'. This patch fixes the guide and adds temporary support for erroneously recommended values. --- * [HBASE-14306](https://issues.apache.org/jira/browse/HBASE-14306) | *Major* | **Refine RegionGroupingProvider: fix issues and make it more scalable** In HBASE-14306 we've changed default strategy of RegionGroupingProvider from "identify" to "bounded", so it's required to explicitly set "hbase.wal.regiongrouping.strategy" to "identify" if user still wants to use one WAL per region Please also notice that in the new framework there will be one WAL per group, and the region-group mapping is decided by RegionGroupingStrategy. Accordingly, we've removed BoundedRegionGroupingProvider and added BoundedRegionGroupingStrategy as a replacement. If you already have a customized class for hbase.wal.regiongrouping.strategy, please check the new logic and make updates if necessary. --- * [HBASE-6617](https://issues.apache.org/jira/browse/HBASE-6617) | *Major* | **ReplicationSourceManager should be able to track multiple WAL paths** ReplicationSourceManager now could track multiple wal paths. 
Notice that although most changes are internal and all metrics names remain the same, signature of below methods in MetricsSource are changed: 1. refreshAgeOfLastShippedOp now requires a String parameter which indicates the wal group id of the reporter 2. setAgeOfLastShippedOp also adds a String parameter for wal group id --- * [HBASE-14314](https://issues.apache.org/jira/browse/HBASE-14314) | *Major* | **Metrics for block cache should take region replicas into account** The following metrics for primary region replica are added: blockCacheHitCountPrimary blockCacheMissCountPrimary blockCacheEvictionCountPrimary --- * [HBASE-14317](https://issues.apache.org/jira/browse/HBASE-14317) | *Blocker* | **Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL** Tighten up WAL-use semantic. 1. If an append or a sync throws an exception, all subsequent attempts at using the log will also throw this same exception. The WAL is now a lame-duck until you roll it. 2. If a successful append, and then we fail to sync the append, this is a fatal exception. The container must abort to replay the WAL logs even though we have told the client that the appends failed. The above rules have been applied laxly up to this; it used to be possible to get a good sync to go in over the top of a failed append. This has been fixed in this patch. Also fixed a hang in the WAL subsystem if a request to pause the write pipeline took on a failed sync. before the roll requests sync got scheduled. TODO: Revisit our WAL system. HBASE-12751 helps rationalize our write pipeline. In particular, it manages sequenceid inside mvcc which should make it so we can purge mechanism that writes empty, unflushed appends just to get the next sequenceid... problematic when WAL goes lame-duck. Lets get it in. TODO: A successful append followed by a failed sync probably only needs us replace the WAL (if we have signalled the client that the appends failed). 
Bummer is that replicating, these last appends might make it to the sink cluster or get replayed during recovery. HBase should keep its own WAL length? Or sequenceid of last successful sync should be passed when doing recovery and replication? --- * [HBASE-14261](https://issues.apache.org/jira/browse/HBASE-14261) | *Major* | **Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.** This change augments the existing chaos monkey framework with actions for restarting the underlying zookeeper quorum and hdfs nodes of a distributed hbase cluster. One assumption made while creating the zk actions is that the zookeeper ensemble is an independent external service and won't be managed by the hbase cluster. For these actions to work as expected, the following parameters need to be configured appropriately. {code} \<property\> \<name\>hbase.it.clustermanager.hadoop.home\</name\> \<value\>$HADOOP\_HOME\</value\> \</property\> \<property\> \<name\>hbase.it.clustermanager.zookeeper.home\</name\> \<value\>$ZOOKEEPER\_HOME\</value\> \</property\> \<property\> \<name\>hbase.it.clustermanager.hbase.user\</name\> \<value\>hbase\</value\> \</property\> \<property\> \<name\>hbase.it.clustermanager.hadoop.hdfs.user\</name\> \<value\>hdfs\</value\> \</property\> \<property\> \<name\>hbase.it.clustermanager.zookeeper.user\</name\> \<value\>zookeeper\</value\> \</property\> {code} The service user related configurations are newly introduced since in prod/test environments each service is managed by a different user. Once the above parameters are configured properly, you can start using them as needed. An example usage for invoking these new actions is: {{./hbase org.apache.hadoop.hbase.IntegrationTestAcidGuarantees -m serverAndDependenciesKilling}} --- * [HBASE-14309](https://issues.apache.org/jira/browse/HBASE-14309) | *Major* | **Allow load balancer to operate when there is region in transition by adding force flag** This issue adds a boolean parameter, force, to the 'balancer' command so that an admin can force region balancing even when there is a region (other than hbase:meta) in transition, assuming the RIT is transient. If hbase:meta is in transition, the balancer command returns false. WARNING: For experts only.
Forcing a balance may do more damage than repair when assignment is confused Note: enclose the force parameter in double quotes --- * [HBASE-14313](https://issues.apache.org/jira/browse/HBASE-14313) | *Critical* | **After a Connection sees ConnectionClosingException it never recovers** HConnection could get stuck when talking to a host that went down and then returned. This has been fixed by closing the connection in all paths. --- * [HBASE-13339](https://issues.apache.org/jira/browse/HBASE-13339) | *Blocker* | **Update default Hadoop version to latest for master** Master/2.0.0 now builds on the latest stable hadoop by default. --- * [HBASE-14224](https://issues.apache.org/jira/browse/HBASE-14224) | *Critical* | **Fix coprocessor handling of duplicate classes** Prevent Coprocessors being doubly-loaded; a particular coprocessor can only be loaded once. --- * [HBASE-13127](https://issues.apache.org/jira/browse/HBASE-13127) | *Major* | **Add timeouts on all tests so less zombie sightings** Use junit facility to impose timeout on test. Use test category to chose which timeout to apply: small tests timeout after 30 seconds, medium tests after 180 seconds, and large tests after ten minutes. Updated junit version from 4.11 to 4.12. 4.12 has support for feature used here. Add this at the head of your junit4 class to add a category-based timeout: {code} @Rule public final TestRule timeout = CategoryBasedTimeout.builder().withTimeout(this.getClass()). withLookingForStuckThread(true).build(); {code} For example: --- * [HBASE-14148](https://issues.apache.org/jira/browse/HBASE-14148) | *Major* | **Web UI Framable Page** Security fix: Adds protection from clickjacking using X-Frame-Options header. This will prevent use of HBase UI in frames. To disable this feature, set the configuration 'hbase.http.filter.xframeoptions.mode' to 'ALLOW' (default is 'DENY'). 
--- * [HBASE-10844](https://issues.apache.org/jira/browse/HBASE-10844) | *Major* | **Coprocessor failure during batchmutation leaves the memstore datastructs in an inconsistent state** Promotes an -ea assert to logged FATAL and RS abort when memstore is found to be in an inconsistent state. --- * [HBASE-13966](https://issues.apache.org/jira/browse/HBASE-13966) | *Minor* | **Limit column width in table.jsp** Wraps region, start key, end key columns if too long. --- * [HBASE-13706](https://issues.apache.org/jira/browse/HBASE-13706) | *Minor* | **CoprocessorClassLoader should not exempt Hive classes** Starting from HBase 2.0, CoprocessorClassLoader will not exempt hadoop classes or zookeeper classes. This means that if the custom coprocessor jar contains hadoop or zookeeper packages and classes, they will be loaded by the CoprocessorClassLoader. Only hbase packages and classes are exempted from the CoprocessorClassLoader. They (and their dependencies) are loaded by the parent server class loader. --- * [HBASE-14054](https://issues.apache.org/jira/browse/HBASE-14054) | *Major* | **Acknowledged writes may get lost if regionserver clock is set backwards** In the {{checkAndPut}} write path, use max(max timestamp for the row, System.currentTimeMillis()) as the timestamp, instead of blindly taking System.currentTimeMillis(), to ensure that checkAndPut() cannot perform writes that are already eclipsed. This is similar to what has been done in HBASE-12449 for increment and append. --- * [HBASE-13985](https://issues.apache.org/jira/browse/HBASE-13985) | *Minor* | **Add configuration to skip validating HFile format when bulk loading** A new config, hbase.loadincremental.validate.hfile, is introduced; it defaults to true. When set to false, checking the hfile format is skipped during bulk loading.
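The timestamp rule described for HBASE-14054 above can be sketched as a one-line function. This Java example is an illustration only; the class and method names are hypothetical and the real write path involves far more than this comparison.

```java
// Hypothetical sketch of the timestamp choice described in the HBASE-14054 note.
final class MonotonicTimestampSketch {
    // Never pick a timestamp older than the newest one already stored for the row,
    // so a clock that was set backwards cannot produce a write that is already eclipsed.
    static long chooseWriteTimestamp(long maxRowTimestamp, long nowMillis) {
        return Math.max(maxRowTimestamp, nowMillis);
    }

    public static void main(String[] args) {
        long maxRowTimestamp = 1_000L;
        // Clock set backwards: "now" is older than the newest cell in the row.
        System.out.println(chooseWriteTimestamp(maxRowTimestamp, 900L));
        // Normal case: "now" is newer than anything in the row.
        System.out.println(chooseWriteTimestamp(maxRowTimestamp, 1_200L));
    }
}
```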
--- * [HBASE-14201](https://issues.apache.org/jira/browse/HBASE-14201) | *Major* | **hbck should not take a lock unless fixing errors** HBCK no longer takes a lock until there are changes to the cluster being made. The old behavior can be achieved by passing the -exclusive flag. --- * [HBASE-14081](https://issues.apache.org/jira/browse/HBASE-14081) | *Minor* | **(outdated) references to SVN/trunk in documentation** HBASE-14081 Remove (outdated) references to SVN/trunk from documentation --- * [HBASE-13865](https://issues.apache.org/jira/browse/HBASE-13865) | *Trivial* | **Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2)** Increase default hbase.hregion.memstore.block.multiplier from 2 to 4 in the code to match the default value in the config files. --- * [HBASE-12295](https://issues.apache.org/jira/browse/HBASE-12295) | *Major* | **Prevent block eviction under us if reads are in progress from the BBs** We try to delay the eviction of the block till the cellblocks are formed at the Rpc layer. A simple reference counting mechanism is introduced when ever a block is accessed from the Bucket cache. Once a scanner completes using a block the reference count is decremented. The eviction of the block happens only when the reference count of that block is 0. We also introduce a concept of ShareableMemory based on the type of blocks we create from the Block cache. The blocks from the ByteBufferIOEngine directly refer to the buckets in offheap and such blocks are marked SHARED memory type. The blocks from LRU, HDFS and file mode of Bucket cache are all marked EXCLUSIVE because these blocks have their own exclusive memory. For the CP case, any cell coming out of SHARED memory block is copied before returning the results, because CPs can use the results as its state so that eviction cannot corrupt the results. 
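The reference counting described in HBASE-12295 can be pictured with a small Java sketch. It assumes a single hypothetical block class with retain, release, and tryEvict methods; none of these names come from HBase, and the real bucket cache handles concurrency and memory types in much more detail.

```java
// Hypothetical sketch of reference-counted eviction; not the HBase bucket cache code.
final class RefCountedBlockSketch {
    private int refCount;
    private boolean evicted;

    // A reader takes a reference before using the block; fails if the block is already gone.
    synchronized boolean retain() {
        if (evicted) {
            return false;
        }
        refCount++;
        return true;
    }

    // A reader drops its reference once it has finished with the block.
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release without matching retain");
        }
        refCount--;
    }

    // Eviction only succeeds when no reader still holds a reference.
    synchronized boolean tryEvict() {
        if (refCount == 0) {
            evicted = true;
        }
        return evicted;
    }

    public static void main(String[] args) {
        RefCountedBlockSketch block = new RefCountedBlockSketch();
        block.retain();
        System.out.println("evict while in use: " + block.tryEvict());
        block.release();
        System.out.println("evict after release: " + block.tryEvict());
    }
}
```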
--- * [HBASE-11339](https://issues.apache.org/jira/browse/HBASE-11339) | *Major* | **HBase MOB** The Moderate Object Storage (MOB) feature (HBASE-11339) is a modified I/O and compaction path that allows individual moderately sized values (100KB-10MB) to be stored in a way that reduces write amplification compared to the normal I/O path. MOB is defined per column family and is almost entirely isolated from other components, so the features and performance of normal columns are not affected. For more details on how to use the feature please consult the HBase Reference Guide --- * [HBASE-13954](https://issues.apache.org/jira/browse/HBASE-13954) | *Major* | **Remove HTableInterface#getRowOrBefore related server side code** Removed Table#getRowOrBefore, Region#getClosestRowBefore, Store#getRowKeyAtOrBefore, RemoteHTable#getRowOrBefore apis and Thrift support for getRowOrBefore. Also removed two coprocessor hooks preGetClosestRowBefore and postGetClosestRowBefore. Users of this api can instead use a reverse scan, something like below, {code} Scan scan = new Scan(row); scan.setSmall(true); scan.setCaching(1); scan.setReversed(true); scan.addFamily(family); {code} Pass this scan object to the scanner and retrieve the first Result from the scanner output. --- * [HBASE-12296](https://issues.apache.org/jira/browse/HBASE-12296) | *Major* | **Filters should work with ByteBufferedCell** Change to support offheaping. Incompatible change for filters ColumnPrefixFilter and MultipleColumnPrefixFilter. Changes the parameters of filterColumn so that it takes a Cell rather than a byte []. hbase-client-1.2.7-SNAPSHOT.jar, ColumnPrefixFilter.class package org.apache.hadoop.hbase.filter ColumnPrefixFilter.filterColumn ( byte[ ] buffer, int qualifierOffset, int qualifierLength ) : Filter.ReturnCode org/apache/hadoop/hbase/filter/ColumnPrefixFilter.filterColumn:([BII)Lorg/apache/hadoop/hbase/filter/Filter$ReturnCode; Ditto for filterColumnValue in SingleColumnValueFilter.
Takes a Cell instead of a byte array. --- * [HBASE-14045](https://issues.apache.org/jira/browse/HBASE-14045) | *Major* | **Bumping thrift version to 0.9.2.** This change upgrades the thrift dependency of HBase to 0.9.2. Though this doesn't break any HBase compatibility promises, it might impact downstream projects that share a thrift dependency with HBase. --- * [HBASE-14027](https://issues.apache.org/jira/browse/HBASE-14027) | *Major* | **Clean up netty dependencies** HBase's convenience binary artifact no longer contains the netty 3.2.4 jar. This jar was not directly used by HBase, but may have been relied on by downstream applications. --- * [HBASE-7782](https://issues.apache.org/jira/browse/HBASE-7782) | *Minor* | **HBaseTestingUtility.truncateTable() not acting like CLI** HBaseTestingUtility now uses the truncate API added in HBASE-8332 so that calls to HBTU.truncateTable will behave like the shell command: effectively dropping the table and recreating a new one with the same split points. Previously, HBTU.truncateTable instead issued deletes for all the data already in the table. If you wish to maintain the same behavior, you should use the newly added HBTU.deleteTableData method. --- * [HBASE-14047](https://issues.apache.org/jira/browse/HBASE-14047) | *Major* | **Cleanup deprecated APIs from Cell class** The following APIs in Cell, which had been deprecated for the past few major versions, have been removed: getRow, getFamily, getQualifier, getValue, getMvccVersion. Depending on the use case, these APIs can be replaced with their respective CellUtil#cloneXXX (allocates a copy) or Cell#getXXXArray (essentially just returns a pointer).
--- * [HBASE-14029](https://issues.apache.org/jira/browse/HBASE-14029) | *Major* | **getting started for standalone still references hadoop-version-specific binary artifacts** HBASE-14029 Correct documentation for Hadoop version specific artifacts --- * [HBASE-13849](https://issues.apache.org/jira/browse/HBASE-13849) | *Major* | **Remove restore and clone snapshot from the WebUI** The HBase master status web page no longer allows operators to clone or restore snapshots. --- * [HBASE-13646](https://issues.apache.org/jira/browse/HBASE-13646) | *Major* | **HRegion#execService should not try to build incomplete messages** When RegionServerCoprocessors throw an exception we will no longer attempt to build an incomplete RPC response message. Instead, the response message will be null. --- * [HBASE-13639](https://issues.apache.org/jira/browse/HBASE-13639) | *Major* | **SyncTable - rsync for HBase tables** Tool to sync two tables that, like rsync, tries to send only the differences. Adds two new MapReduce jobs, SyncTable and HashTable. See the usage output of these jobs for how to run them. See the design doc for a general overview: https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q\_wBcoIXfdchN7Pxvxv1IO6PW0-U/edit From the comments on the issue: "It can be challenging to run against a table getting live writes, if those writes are updates/overwrites. In general, you can run it against a time range to ignore new writes, but if those writes update existing cells, then the time range scan may or may not see older versions of those cells depending on whether major compaction has happened, which may be different in remote clusters." --- * [HBASE-13895](https://issues.apache.org/jira/browse/HBASE-13895) | *Critical* | **DATALOSS: Region assigned before WAL replay when abort** If the master went to assign a region concurrently with a RegionServer abort, the returned RegionServerAbortedException was handled as though the region had been cleanly offlined, so the assign was allowed to proceed.
If the region was opened in its new location before WAL replay completion, the replayed edits were ignored, worst case, or were later played over the top of edits that had come in since open and so susceptible to overwrite. In either case, DATALOSS. --- * [HBASE-13983](https://issues.apache.org/jira/browse/HBASE-13983) | *Minor* | **Doc how the oddball HTable methods getStartKey, getEndKey, etc. will be removed in 2.0.0** Adds extra doc on getStartKeys, getEndKeys, and getStartEndKeys in HTable explaining that they will be removed in 2.0.0 (these methods did not get the proper full major version deprecation cycle). In this issue, we actually also remove these methods in master/2.0.0 branch. --- * [HBASE-13747](https://issues.apache.org/jira/browse/HBASE-13747) | *Critical* | **Promote Java 8 to "yes" in support matrix** Java 8 is considered supported and tested as of HBase 1.2+ --- * [HBASE-13959](https://issues.apache.org/jira/browse/HBASE-13959) | *Critical* | **Region splitting uses a single thread in most common cases** The performance of region splitting has been improved by using a thread pool to split the store files concurrently. Prior to this change, the store files were always split sequentially in a single thread, so a region with multiple store files ended up taking several seconds. The thread pool is sized dynamically with the aim of getting maximum concurrency, without exceeding the number of cores available for HBase Java process. A lower limit for the thread pool can be explicitly set using the property hbase.regionserver.region.split.threads.max. 
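The sizing rule in the HBASE-13959 note above can be sketched as below. This is an illustration under stated assumptions, not the HBase split code: the class `ParallelSplitSketch` and both methods are hypothetical, the per-file work is a stand-in, and the `configuredMax` argument assumes that hbase.regionserver.region.split.threads.max acts as an upper cap on the pool, which is what the property name suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: split store files concurrently with a bounded pool.
final class ParallelSplitSketch {

    // One thread per store file, never more than the available cores,
    // and never more than the configured cap (a cap <= 0 means "not set").
    static int poolSize(int storeFileCount, int cores, int configuredMax) {
        int size = Math.max(1, Math.min(storeFileCount, cores));
        return configuredMax > 0 ? Math.min(size, configuredMax) : size;
    }

    // Submits one task per store file and collects results in submission order.
    static List<String> splitAll(List<String> storeFiles, int configuredMax) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool =
            Executors.newFixedThreadPool(poolSize(storeFiles.size(), cores, configuredMax));
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String file : storeFiles) {
                Callable<String> task = () -> file + ".split"; // stand-in for real split work
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // blocks until that file's task is done
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With sequential splitting the total time is the sum of the per-file times; with a pool like this it approaches the time of the slowest file once there are enough threads.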
--- * [HBASE-13930](https://issues.apache.org/jira/browse/HBASE-13930) | *Major* | **Exclude Findbugs packages from shaded jars** Exclude Findbugs packages from shaded jars --- * [HBASE-13214](https://issues.apache.org/jira/browse/HBASE-13214) | *Major* | **Remove deprecated and unused methods from HTable class** **WARNING: No release note provided for this change.** --- * [HBASE-13869](https://issues.apache.org/jira/browse/HBASE-13869) | *Trivial* | **Fix typo in HBase book** Fix typo in HBase book --- * [HBASE-13938](https://issues.apache.org/jira/browse/HBASE-13938) | *Major* | **Deletes done during the region merge transaction may get eclipsed** Use the master's timestamp when sending hbase:meta edits on region merge to ensure proper ordering of new region addition and old region deletes. --- * [HBASE-13898](https://issues.apache.org/jira/browse/HBASE-13898) | *Minor* | **correct additional javadoc failures under java 8** Correct Javadoc generation errors --- * [HBASE-13103](https://issues.apache.org/jira/browse/HBASE-13103) | *Major* | **[ergonomics] add region size balancing as a feature of master** This patch adds optional ability for HMaster to normalize regions in size (disabled by default, change hbase.normalizer.enabled property to true to turn it on). If enabled, HMaster periodically (every 30 minutes by default) monitors tables for which normalization is enabled in table configuration and performs splits/merges as seems appropriate. Users may implement their own normalization strategies by implementing RegionNormalizer interface and configuring it in hbase-site.xml. 
--- * [HBASE-13900](https://issues.apache.org/jira/browse/HBASE-13900) | *Minor* | **duplicate methods between ProtobufMagic and ProtobufUtil** Use ProtobufMagic methods in ProtobufUtil --- * [HBASE-13843](https://issues.apache.org/jira/browse/HBASE-13843) | *Trivial* | **Fix internal constant text in ReplicationManager.java** In previous versions of HBase, the ReplicationAdmin utility erroneously used the string key "columnFamlyName" when listing replicated column families. It now uses the corrected spelling of "columnFamilyName" (note the added "i"). Downstream code that parsed the replication entries returned from listReplicated will need to be updated to use the new key. Previously compiled code that relied on the static CFNAME member of ReplicationAdmin will need to be recompiled in order to see the updated value. --- * [HBASE-13886](https://issues.apache.org/jira/browse/HBASE-13886) | *Major* | **Return empty value when the mob file is corrupt instead of throwing exceptions** By default the Get/Scan will throw Exception when it is not able to find a mob cell because the mob file is missing/corrupted. This jira adds a facility to continue scan/get and get other cells with mob cell value as empty. Set an attribute MobConstants.EMPTY\_VALUE\_ON\_MOBCELL\_MISS = true in Scan/Get for getting this behaviour --- * [HBASE-13686](https://issues.apache.org/jira/browse/HBASE-13686) | *Major* | **Fail to limit rate in RateLimiter** As per this jira contribution. We now support two kinds of RateLimiter. 1) org.apache.hadoop.hbase.quotas.AverageIntervalRateLimiter : This limiter will refill resources at every TimeUnit/resources interval. Example: For a limiter configured with 10resources/second, then 1resource will be refilled after every 100ms. 2) org.apache.hadoop.hbase.quotas.FixedIntervalRateLimiter: This limiter will refill resources only after a given fixed interval of time. 
Clients can select either of these rate limiters for the cluster by setting the value of the property "hbase.quota.rate.limiter" in hbase-site.xml. org.apache.hadoop.hbase.quotas.AverageIntervalRateLimiter is the default value. Note: the cluster needs to be restarted for the configuration to take effect. --- * [HBASE-13816](https://issues.apache.org/jira/browse/HBASE-13816) | *Major* | **Build shaded modules only in release profile** hbase-shaded-client and hbase-shaded-server modules will not build the actual jars unless -Prelease is supplied in mvn. --- * [HBASE-13754](https://issues.apache.org/jira/browse/HBASE-13754) | *Major* | **Allow non KeyValue Cell types also to oswrite** This jira has removed the already deprecated method KeyValue#oswrite(final KeyValue kv, final OutputStream out) --- * [HBASE-13375](https://issues.apache.org/jira/browse/HBASE-13375) | *Major* | **Provide HBase superuser higher priority over other users in the RPC handling** This JIRA modifies the signature of the PriorityFunction#getPriority() method to also take the request user as a parameter; all RPC requests sent by super users (as determined by cluster configuration) are executed with Admin QoS. --- * [HBASE-5980](https://issues.apache.org/jira/browse/HBASE-5980) | *Minor* | **Scanner responses from RS should include metrics on rows/KVs filtered** Adds scan metrics to the result. In the shell, set the ALL\_METRICS attribute to true on your scan to see a dump of metrics after the results (see the scan help for examples). If you would prefer to see only a subset of the metrics, the METRICS array can be defined to include the names of only the metrics you care about. --- * [HBASE-13698](https://issues.apache.org/jira/browse/HBASE-13698) | *Major* | **Add RegionLocator methods to Thrift2 proxy.** Added getRegionLocation and getAllRegionLocations to the thrift2 interface.
--- * [HBASE-13636](https://issues.apache.org/jira/browse/HBASE-13636) | *Major* | **Remove deprecation for HBASE-4072 (Reading of zoo.cfg)** Purge support for parsing ZooKeeper's zoo.cfg, deprecated since hbase-0.96.0 --- * [HBASE-13071](https://issues.apache.org/jira/browse/HBASE-13071) | *Major* | **Hbase Streaming Scan Feature** MOTIVATION A pipelined scan API is introduced for speeding up applications that combine massive data traversal with compute-intensive processing. Traditional HBase scans save network trips by prefetching the data to the client side cache. However, they prefetch synchronously: the fetch request to the regionserver is invoked only when the entire cache is consumed. This leads to a stop-and-wait access pattern, in which the client stalls until the next chunk of data is fetched. Applications that do significant processing can benefit from background data prefetching, which eliminates this bottleneck. The pipelined scan implementation overlaps the cache population at the client side with application processing. Namely, it issues a new scan RPC when the iteration retrieves 50% of the cache. If the application processing (that is, the time between invocations of next()) is substantial, the new chunk of data will be available before the previous one is exhausted, and the client will not experience any delay. Ideally, the prefetch and the processing times should be balanced. API AND CONFIGURATION Asynchronous scanning can be configured either globally for all tables and scans, or on a per-scan basis via a new Scan class API. Configuration in hbase-site.xml: hbase.client.scanner.async.prefetch, default false: \<property\> \<name\>hbase.client.scanner.async.prefetch\</name\> \<value\>true\</value\> \</property\> API - Scan#setAsyncPrefetch(boolean) Scan scan = new Scan(); scan.setCaching(1000); scan.setMaxResultSize(BIG\_SIZE); scan.setAsyncPrefetch(true); ...
ResultScanner scanner = table.getScanner(scan); IMPLEMENTATION NOTES Pipelined scan is implemented by a new ClientAsyncPrefetchScanner class, which is fully API-compatible with the synchronous ClientSimpleScanner. ClientAsyncPrefetchScanner is not instantiated for small (Scan#setSmall) and reversed (Scan#setReversed) scanners. The application is responsible for setting the prefetch size so that the prefetch time and the processing time are balanced. Note that due to double buffering, the client side cache can use twice as much memory as the synchronous scanner. Generally, this feature will put more load on the server (higher fetch rate -- which is the whole point). Also, YMMV. --- * [HBASE-13533](https://issues.apache.org/jira/browse/HBASE-13533) | *Trivial* | **section on configuring ~/.m2/settings.xml has no anchor** Correct settings.xml anchor in book --- * [HBASE-13625](https://issues.apache.org/jira/browse/HBASE-13625) | *Major* | **Use HDFS for HFileOutputFormat2 partitioner's path** Introduces a new config hbase.fs.tmp.dir, which is a directory in HDFS (or the default file system) to use as a staging directory for HFileOutputFormat2. This is also used as the default for hbase.bulkload.staging.dir --- * [HBASE-10800](https://issues.apache.org/jira/browse/HBASE-10800) | *Major* | **Use CellComparator instead of KVComparator** From the 2.0 branch onwards KVComparator and its subclasses MetaComparator and RawBytesComparator are all deprecated. All the comparators are moved to CellComparator. MetaCellComparator, a subclass of CellComparator, will be used to compare hbase:meta cells. The previously exposed static instances KeyValue.COMPARATOR, KeyValue.META\_COMPARATOR and KeyValue.RAW\_COMPARATOR are deprecated; use CellComparator.COMPARATOR and CellComparator.META\_COMPARATOR instead. Also note that there will be no RawBytesComparator. Wherever raw bytes need to be compared, use Bytes.BYTES\_RAWCOMPARATOR.
CellComparator will always operate on cells and their components, abstracting away the fact that a cell can be backed by a single byte[], as opposed to how KVComparators were working. --- * [HBASE-13333](https://issues.apache.org/jira/browse/HBASE-13333) | *Major* | **Renew Scanner Lease without advancing the RegionScanner** Adds a renewLease call to ClientScanner --- * [HBASE-13564](https://issues.apache.org/jira/browse/HBASE-13564) | *Major* | **Master MBeans are not published** To use the coprocessor-based JMX implementation provided by HBase for the Master, add the following property to hbase-site.xml: \<property\> \<name\>hbase.coprocessor.master.classes\</name\> \<value\>org.apache.hadoop.hbase.JMXListener\</value\> \</property\> NOTE: DO NOT set \`com.sun.management.jmxremote.port\` for the Java VM at the same time. By default, JMX listens on TCP port 10101 for the Master; the port can be further configured using the following properties: \<property\> \<name\>master.rmi.registry.port\</name\> \<value\>61110\</value\> \</property\> \<property\> \<name\>master.rmi.connector.port\</name\> \<value\>61120\</value\> \</property\> The registry port can be shared with the connector port in most cases, so you only need to configure master.rmi.registry.port. However, if you want to use SSL communication, the 2 ports must be configured to different values. --- * [HBASE-13537](https://issues.apache.org/jira/browse/HBASE-13537) | *Major* | **Procedure V2 - Change the admin interface for async operations to return Future (incompatible with branch-1.x)** Because the return types of asynchronous methods in the Admin API have changed, this change breaks binary compatibility. Source compatibility is kept intact though. Applications running against this change need to be recompiled to keep working. --- * [HBASE-13517](https://issues.apache.org/jira/browse/HBASE-13517) | *Major* | **Publish a client artifact with shaded dependencies** HBase now provides added convenience artifacts that shade most dependencies. These jars hbase-shaded-client and hbase-shaded-server are meant to be used when dependency conflicts can not be solved any other way.
The normal jars hbase-client and hbase-server should still be preferred when possible. Do not use hbase-shaded-server or hbase-shaded-client inside of a co-processor as bad things will happen. --- * [HBASE-13149](https://issues.apache.org/jira/browse/HBASE-13149) | *Blocker* | **HBase MR is broken on Hadoop 2.5+ Yarn** In HBase 1.1.0 and above we have upgraded the version of Jackson dependencies (jackson-core-asl, jackson-mapper-asl, jackson-jaxrs and jackson-xc) from 1.8.8 to 1.9.13. This is to follow the upgrade to Jackson 1.9.13 in Hadoop 2.5 and above which causes Jackson class incompatibility for HBase as reported in HBASE-13149. Refer to HADOOP-10104 and YARN-2092 for additional information. Jackson1.9.13 is not completely backward compatible with the prior version 1.8.8 used in HBase. See the Compatibility reports attached in HBASE-13149 and http://svn.codehaus.org/jackson/trunk/release-notes/VERSION for more information. This upgrade does not have direct impact on HBase users and HBase applications in most cases. In the rare case where your HBase application uses Jackson directly AND your application has compatibility issue with Jackson 1.9.13, you can do the following to mitigate the problem. 1. If you are on Hadoop 2.5 or above, and your HBase application involves running Yarn jobs, we recommend you update your application to use Jackson 1.9.13. You may be able to explore classpath isolation options (e.g. HADOOP-10893) or have your own classpath isolation strategy that works for you, but the general recommendation is that you upgrade to Jackson 1.9.13. 2. You may choose to continue using Jackson 1.8.8 and not to use Jackson 1.9.13 in your classpath. You can also choose to replace the Jackson 1.9.13 jars in $HBASE\_HOME/lib with 1.8.8 jars. It can work for you in the following cases: a) You are on a Hadoop version earlier than Hadoop 2.5, or b) You are on Hadoop 2.5 or above, but your HBase application does not involve running Yarn jobs. 3. 
You may experiment with further isolation using the shaded jars introduced with 1.1.0 via HBASE-13517. Note that using Jackson 1.8.8 in $HBASE\_HOME/lib is not tested or guaranteed to work in future HBase releases. It is recommended that your HBase application matches the Jackson version provided in HBase. In HBase 0.98.x and HBase 1.0.x, we have NOT upgraded the version of the Jackson dependencies. If you are on Hadoop 2.5 or above, and your HBase application involves running Yarn jobs, you may encounter Jackson class incompatibility issues, as reported in HBASE-13149. You can do the following to mitigate the problem: 1. Use the 'hadoop jar' command to run your HBase jobs. 2. Explore classpath isolation options (e.g. HADOOP-10893) or have your own classpath isolation strategy that works for you. 3. You can also choose to replace the Jackson 1.8.8 jars in $HBASE\_HOME/lib with 1.9.13 jars from your Hadoop lib directory. We have tested HBase 0.98 with Jackson 1.9.13. --- * [HBASE-13481](https://issues.apache.org/jira/browse/HBASE-13481) | *Major* | **Master should respect master (old) DNS/bind related configurations** Master now honors the following configuration options, as it did before the 1.0.0 releases: hbase.master.ipc.address hbase.master.dns.interface hbase.master.dns.nameserver hbase.master.info.bindAddress This jira also adds the hbase.master.hostname parameter as an extension to HBASE-12954. --- * [HBASE-13090](https://issues.apache.org/jira/browse/HBASE-13090) | *Major* | **Progress heartbeats for long running scanners** Previously, there was no way to enforce a time limit on scan RPC requests. The server would receive a scan RPC request and take as much time as it needed to accumulate enough results to reach a limit or exhaust the region. The problem with this approach was that, in the case of a very selective scan, the processing of the scan could take too long and cause timeouts client side.
With this fix, the server will now enforce a time limit on the execution of scan RPC requests. When a scan RPC request arrives at the server, a time limit is calculated as half of whichever timeout value is more restrictive between the configurations "hbase.client.scanner.timeout.period" and "hbase.rpc.timeout". When the time limit is reached, the server will return whatever results it has accumulated up to that point. The results may be empty. To ensure that timeout checks do not occur too often (which would hurt the performance of scans), the configuration "hbase.cells.scanned.per.heartbeat.check" has been introduced. This configuration controls how often System.currentTimeMillis() is called to update the progress towards the time limit. Currently, the default value of this configuration is 10000. Specifying a smaller value will provide a tighter bound on the time limit, but may hurt scan performance due to the higher frequency of calls to System.currentTimeMillis(). Protobuf models for ScanRequest and ScanResponse have been updated so that heartbeat support can be communicated. Support for heartbeat messages is specified in the request sent to the server via ScanRequest.Builder#setClientHandlesHeartbeats. Only when the server sees that ScanRequest#getClientHandlesHeartbeats() is true will it send heartbeat messages back to the client. A response is marked as a heartbeat message via the boolean flag ScanResponse#getHeartbeatMessage --- * [HBASE-13307](https://issues.apache.org/jira/browse/HBASE-13307) | *Major* | **Making methods under ScannerV2#next inlineable, faster** Made the methods under Scanner#next smaller so that they are inlinable and compilable (was getting 'too big to compile' from hotspot). Use of unsafe to parse shorts rather than use BB#getShort... faster, etc.
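The time limit rule from the HBASE-13090 note above can be sketched as follows. This is a simplified illustration, not the RegionServer code: the class `ScanTimeLimitSketch` and its methods are hypothetical, and the clock is injected as a `LongSupplier` so the example stays deterministic (the note says the server calls System.currentTimeMillis()).

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the heartbeat time limit and the periodic clock check.
final class ScanTimeLimitSketch {

    // Half of the more restrictive (smaller) of the two timeouts.
    static long timeLimitMillis(long scannerTimeoutMs, long rpcTimeoutMs) {
        return Math.min(scannerTimeoutMs, rpcTimeoutMs) / 2;
    }

    // Scans up to totalCells cells but consults the clock only once every
    // cellsPerCheck cells, mirroring the role the note gives to
    // hbase.cells.scanned.per.heartbeat.check. Returns the cells scanned
    // before the deadline was noticed.
    static long scan(long totalCells, long cellsPerCheck, long deadlineMs, LongSupplier clock) {
        long scanned = 0;
        while (scanned < totalCells) {
            scanned++; // stand-in for processing one cell
            if (scanned % cellsPerCheck == 0 && clock.getAsLong() >= deadlineMs) {
                break; // time limit reached: return what has been accumulated
            }
        }
        return scanned;
    }
}
```

A smaller `cellsPerCheck` notices the deadline sooner at the cost of more clock calls, which is the trade-off the note describes.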
--- * [HBASE-13453](https://issues.apache.org/jira/browse/HBASE-13453) | *Critical* | **Master should not bind to region server ports** In 1.0.x, the master by default binds to the region server ports (both rpc and info). This change brings back the usage of the old master rpc and info ports in the 1.1+ and master (2.0) branches. The motivation for this change is to make life easier for users, who no longer need to do anything to bring up a RS on the same host, and to make the migration from 0.98 to 1.1 hassle free. However, users going from 1.0 to 1.1 will see the change in the master ports. --- * [HBASE-13419](https://issues.apache.org/jira/browse/HBASE-13419) | *Major* | **Thrift gateway should propagate text from exception causes.** Compose thrift exception text from the text of the entire cause chain of the underlying exception. --- * [HBASE-13275](https://issues.apache.org/jira/browse/HBASE-13275) | *Major* | **Setting hbase.security.authorization to false does not disable authorization** Prior to this change the configuration setting 'hbase.security.authorization' had no effect if the security coprocessors were installed. The act of installing the security coprocessors was assumed to indicate that active authorization was desired and required. Now it is possible to install the security coprocessors yet have them operate in a passive state with active authorization disabled by setting 'hbase.security.authorization' to false. This can be useful but is probably not what you want. For more information, consult the Security section of the HBase online manual. 'hbase.security.authorization' defaults to true for backwards compatible behavior. --- * [HBASE-13118](https://issues.apache.org/jira/browse/HBASE-13118) | *Major* | **[PE] Add being able to write many columns** Adds a --columns option to PE so you can write more than one column (changes default qualifier from 'data' to '0').
--- * [HBASE-13270](https://issues.apache.org/jira/browse/HBASE-13270) | *Major* | **Setter for Result#getStats is #addResults; confusing!** Deprecates Result#addResults in favor of Result#setStatistics --- * [HBASE-13362](https://issues.apache.org/jira/browse/HBASE-13362) | *Major* | **Set max result size from client only (like scanner caching).** This introduces a new config option: hbase.server.scanner.max.result.size This setting enforces a maximum result size (in bytes); when it is reached, the server will return the results it has so far. This is a safety setting and should be kept large. The default is infinite in 0.98 and 1.0.x and 100MB in 1.1 and later. Use hbase.client.scanner.max.result.size instead to enforce practical chunk sizes of a few MB (defaults to 2MB). --- * [HBASE-11544](https://issues.apache.org/jira/browse/HBASE-11544) | *Critical* | **[Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME** Results returned from RPC calls may now be returned as partials. When is a Result marked as a partial? When the server must stop the scan because the max size limit has been reached. This means that the LAST Result returned within the ScanResult's Result array may be marked as a partial if the scan's max size limit caused it to stop in the middle of a row.
Incompatible Change: The return type of InternalScanner#next and RegionScanner#nextRaw has been changed to NextState from boolean. The previous boolean return value can be accessed via NextState#hasMoreValues(). NextState provides more context as to what happened inside the scanner. The scan caching default has been changed to Integer.MAX\_VALUE. This value works together with the new maxResultSize value from HBASE-12976 (defaults to 2MB). Results are returned from the server on the basis of size rather than number of rows, which provides better use of the network since row size varies amongst tables. Protobuf models have changed for Result, ScanRequest, and ScanResponse to support the new partial Results. Partial Results should be invisible to the application layer unless Scan#setAllowPartials is set. Scan#setAllowPartials has been added to allow the application to request to see the partial Results returned by the server rather than have the ClientScanner form the complete Result prior to returning it to the application. To disable the use of partial Results on the server, set ScanRequest.Builder#setClientHandlesPartials() to false in the ScanRequest issued to the server. Partial Results should allow the server to return large rows in parts rather than accumulate all the cells for that particular row and run out of memory. --- * [HBASE-11864](https://issues.apache.org/jira/browse/HBASE-11864) | *Minor* | **Enhance HLogPrettyPrinter to print information from WAL Header** Enhance WALPrettyPrinter to print information (writer classnames and cell codec classname) from WAL Header --- * [HBASE-13289](https://issues.apache.org/jira/browse/HBASE-13289) | *Major* | **typo in splitSuccessCount metric** In hbase 1.0.0, 0.98.10, 0.98.10.1, 0.98.11, and 0.98.12 'splitSuccessCount' was misspelled as 'splitSuccessCounnt' --- * [HBASE-12990](https://issues.apache.org/jira/browse/HBASE-12990) | *Major* | **MetaScanner should be replaced by MetaTableAccessor** Removes MetaScanner. Use MetaTableAccessor instead.
--- * [HBASE-13373](https://issues.apache.org/jira/browse/HBASE-13373) | *Major* | **Squash HFileReaderV3 together with HFileReaderV2 and AbstractHFileReader; ditto for Scanners and BlockReader, etc.** Marking as incompatible change. Requires hfiles be major version \>= 2 and \>= minor version 3. Version 3 files are enabled by default in 1.0. 0.98 writes version 2 minor version 3. You cannot go to 1.0 from anything before 0.98. --- * [HBASE-13252](https://issues.apache.org/jira/browse/HBASE-13252) | *Major* | **Get rid of managed connections and connection caching** For a long time, HBase supported 2 types of connections - managed, which were cached and closed automatically when not needed, and unmanaged, where user is responsible for closing the connections by calling #close() on them. The concept of managed connections in HBase (deprecated before) has now been extinguished completely, and now all callers are responsible for managing the lifecycle of connections they acquire. --- * [HBASE-12954](https://issues.apache.org/jira/browse/HBASE-12954) | *Minor* | **Ability impaired using HBase on multihomed hosts** The following config is added by this JIRA: hbase.regionserver.hostname This config is for experts: don't set its value unless you really know what you are doing. When set to a non-empty value, this represents the (external facing) hostname for the underlying server. See https://issues.apache.org/jira/browse/HBASE-12954 for details. Caution: please make sure rolling upgrade succeeds before turning on this feature. --- * [HBASE-13187](https://issues.apache.org/jira/browse/HBASE-13187) | *Critical* | **Add ITBLL that exercises per CF flush** Pass the -D flag generator.multiple.columnfamilies on the command-line if you want the generator to write three column families rather than the default one. 
When set, we will write the usual 'meta' column family and use it checking linked-list is wholesome but we will also write a 'tiny' column family and a 'big' column family to provoke uneven flushing; good for testing the flush-by-columnfamily feature. --- * [HBASE-13361](https://issues.apache.org/jira/browse/HBASE-13361) | *Minor* | **Remove or undeprecate {get\|set}ScannerCaching in HTable** Removed getScannerCaching and setScannerCaching from Table --- * [HBASE-10728](https://issues.apache.org/jira/browse/HBASE-10728) | *Major* | **get\_counter value is never used.** for 0.98 and 1.0 changes are compatible (due to mitigation by HBASE-13433): \* The "get\_counter" command no longer requires a dummy 4th argument. Downstream users are encouraged to migrate code to not pass this argument because it will result in an error for HBase 1.1+. \* The "incr" command now outputs the current value of the counter to stdout. ex: {code} jruby-1.6.8 :005 \> incr 'counter\_example', 'r1', 'cf1:foo', 10 COUNTER VALUE = 1772 0 row(s) in 0.1180 seconds {code} for 1.1+ changes are incompatible: \* The "get\_counter" command no longer accepts a dummy 4th argument. Downstream users will need to update their code to not pass this argument. ex: {code} jruby-1.6.8 :006 \> get\_counter 'counter\_example', 'r1', 'cf1:foo' COUNTER VALUE = 1772 {code} \* The "incr" command now outputs the current value of the counter to stdout. ex: {code} jruby-1.6.8 :005 \> incr 'counter\_example', 'r1', 'cf1:foo', 10 COUNTER VALUE = 1772 0 row(s) in 0.1180 seconds {code} --- * [HBASE-13170](https://issues.apache.org/jira/browse/HBASE-13170) | *Major* | **Allow block cache to be external** HBase can use memcached as an external block cache. To use this change your config to set hbase.blockcache.use.external to true and hbase.cache.memcached.servers to contain the list of memcached servers to use. 
--- * [HBASE-13316](https://issues.apache.org/jira/browse/HBASE-13316) | *Minor* | **Reduce the downtime on planned moves of regions** When an Admin.move command is issued, the RegionServer that receives the region will try to open the StoreFiles of that region to prime the block cache with index blocks. --- * [HBASE-13298](https://issues.apache.org/jira/browse/HBASE-13298) | *Critical* | **Clarify if Table.{set\|get}WriteBufferSize() is deprecated or not** Deprecate said methods. They were mistakenly included in the Table interface. --- * [HBASE-13248](https://issues.apache.org/jira/browse/HBASE-13248) | *Major* | **Make HConnectionImplementation top-level class.** **WARNING: No release note provided for this change.** --- * [HBASE-13331](https://issues.apache.org/jira/browse/HBASE-13331) | *Blocker* | **Exceptions from DFS client can cause CatalogJanitor to delete referenced files** Fixes an issue where files from a split region that were still referenced were erroneously deleted, leading to data loss. --- * [HBASE-13273](https://issues.apache.org/jira/browse/HBASE-13273) | *Major* | **Make Result.EMPTY\_RESULT read-only; currently it can be modified** The Result.EMPTY\_RESULT object is now immutable. In previous releases, the object could be modified by a caller to no longer be empty. Code that relies on this behavior will now receive an UnsupportedOperationException. --- * [HBASE-12867](https://issues.apache.org/jira/browse/HBASE-12867) | *Major* | **Shell does not support custom replication endpoint specification** Adds support to add\_peer in the hbase shell for adding a custom replication endpoint from HBASE-12254.
--- * [HBASE-13198](https://issues.apache.org/jira/browse/HBASE-13198) | *Major* | **Remove HConnectionManager** **WARNING: No release note provided for this change.** --- * [HBASE-12586](https://issues.apache.org/jira/browse/HBASE-12586) | *Major* | **Task 6 & 7 from HBASE-9117, delete all public HTable constructors and delete ConnectionManager#{delete,get}Connection** HTable class has been marked as private API before, and now it's no longer directly instantiable from client code (all public constructors have been removed). All clients should use Connection#getTable() and Connection#getRegionLocator() when appropriate to obtain Table and RegionLocator implementations to work with. --- * [HBASE-13171](https://issues.apache.org/jira/browse/HBASE-13171) | *Minor* | **Change AccessControlClient methods to accept connection object to reduce setup time.** **WARNING: No release note provided for this change.** --- * [HBASE-12706](https://issues.apache.org/jira/browse/HBASE-12706) | *Critical* | **Support multiple port numbers in ZK quorum string** hbase.zookeeper.quorum configuration now allows servers together with client ports consistent with the way Zookeeper java client accepts the quorum string. In this case, using hbase.zookeeper.clientPort is not needed. eg. hbase.zookeeper.quorum=myserver1:2181,myserver2:20000,myserver3:31111 --- * [HBASE-13142](https://issues.apache.org/jira/browse/HBASE-13142) | *Major* | **[PERF] Reuse the IPCUtil#buildCellBlock buffer** Adds buffer reuse sending Cell results. It is on by default and should not need configuration. Improves GC profile and ups throughput. The benefit gets better the larger the row size returned. The buffer reservoir is bounded at a maximum count after which we will start logging at WARN level that the reservoir is running at capacity (returned buffers will be discarded and not added back to the reservoir pool). Default maximum is twice the handler count: i.e. 2 \* hbase.regionserver.handler.count. 
This should be more than enough. Set the maximum with the new configuration: hbase.ipc.server.reservoir.max The reservoir will not cache buffers in excess of hbase.ipc.server.reservoir.max.buffer.size The default is 10MB. This means that if a row is very large, then we will allocate a buffer of the average size that is currently in the pool and we will then resize it till we can accommodate the return. These resizes are expensive. The resultant buffer will be used and then discarded. To check how the reservoir is doing, enable trace level logging for a few seconds on a regionserver. You can do this from the regionserver UI. See 'Log Level'. Set org.apache.hadoop.hbase.io.BoundedByteBufferPool to TRACE. The BoundedByteBufferPool will spew report to the log. Disable the TRACE level and then check the log. You'll see allocation rate, size of pool, size of buffers in pool, etc. --- * [HBASE-13012](https://issues.apache.org/jira/browse/HBASE-13012) | *Major* | **Add shell commands to trigger the mob file compactor** This adds two new shell commands -- compact\_mob and major\_compact\_mob to the hbase shell. Run compaction on a mob enabled column family or all mob enabled column families within a table Examples: Compact a column family within a table: hbase\> compact\_mob 't1', 'c1' Compact all mob enabled column families hbase\> compact\_mob 't1' Run major compaction on a mob enabled column family or all mob enabled column families within a table Examples: Compact a column family within a table: hbase\> major\_compact\_mob 't1', 'c1' Compact all mob enabled column families within a table hbase\> major\_compact\_mob 't1' --- * [HBASE-12869](https://issues.apache.org/jira/browse/HBASE-12869) | *Major* | **Add a REST API implementation of the ClusterManager interface** Adds an implementation of ClusterManager to control REST API-managed HBase clusters. 
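To illustrate the reservoir behavior described under HBASE-13142 above, here is a minimal sketch of a pool that is bounded both by buffer count and by per-buffer size. It is not HBase's `BoundedByteBufferPool`. The class name `BoundedBufferReservoirSketch`, its methods, and the fixed initial allocation size are hypothetical, and the running-average sizing mentioned in the note is left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified reservoir; not the HBase implementation. */
final class BoundedBufferReservoirSketch {
    private final Deque<ByteBuffer> pool = new ArrayDeque<>();
    private final int maxCount;       // plays the role of hbase.ipc.server.reservoir.max
    private final int maxBufferSize;  // plays the role of ...reservoir.max.buffer.size
    private final int initialSize;    // assumption: fixed size for freshly allocated buffers

    BoundedBufferReservoirSketch(int maxCount, int maxBufferSize, int initialSize) {
        this.maxCount = maxCount;
        this.maxBufferSize = maxBufferSize;
        this.initialSize = initialSize;
    }

    /** Hands out a pooled buffer if one is available, otherwise allocates a new one. */
    synchronized ByteBuffer getBuffer() {
        ByteBuffer buffer = pool.pollFirst();
        if (buffer == null) {
            return ByteBuffer.allocate(initialSize);
        }
        buffer.clear(); // reset position and limit before reuse
        return buffer;
    }

    /** Returns true if the buffer was kept for reuse, false if it was discarded. */
    synchronized boolean putBuffer(ByteBuffer buffer) {
        if (buffer.capacity() > maxBufferSize) {
            return false; // oversized buffers are never cached
        }
        if (pool.size() >= maxCount) {
            return false; // reservoir at capacity: discard the returned buffer
        }
        pool.addLast(buffer);
        return true;
    }

    synchronized int size() {
        return pool.size();
    }
}
```

Under the default described in the note, `maxCount` would be twice the handler count, so a handler count of 30 would give a reservoir of at most 60 buffers.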
--- * [HBASE-13047](https://issues.apache.org/jira/browse/HBASE-13047) | *Trivial* | **Add "HBase Configuration" link missing on the table details pages** Add a '/conf' link to UI --- * [HBASE-13044](https://issues.apache.org/jira/browse/HBASE-13044) | *Minor* | **Configuration option for disabling coprocessor loading** This change adds two new configuration options: - "hbase.coprocessor.enabled" controls globally if any coprocessors will be loaded. Set to "false" to disable. Defaults to "true" for compatibility with previous releases. - "hbase.coprocessor.user.enabled" controls if any user (aka table) coprocessors will be loaded. Set to "false" to disable. Defaults to "true" for compatibility with previous releases. --- * [HBASE-12961](https://issues.apache.org/jira/browse/HBASE-12961) | *Minor* | **Negative values in read and write region server metrics** Change read and write request count in ServerLoad from int to long --- * [HBASE-7332](https://issues.apache.org/jira/browse/HBASE-7332) | *Minor* | **[webui] HMaster webui should display the number of regions a table has.** Adds counts for various regions states to the table listing on main page. See attached screenshot. --- * [HBASE-8329](https://issues.apache.org/jira/browse/HBASE-8329) | *Major* | **Limit compaction speed** Adds compaction throughput limit mechanism(the word "throttle" is already used when choosing compaction thread pool, so use a different word here to avoid ambiguity). Default is org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController, will limit throughput as follow: 1. In off peak hours, use a fixed limitation "hbase.hstore.compaction.throughput.offpeak" (default is Long.MAX\_VALUE which means no limitation). 2. 
In normal hours, the limitation is tuned between "hbase.hstore.compaction.throughput.lower.bound" (default 10MB/sec) and "hbase.hstore.compaction.throughput.higher.bound" (default 20MB/sec), using the formula "lower + (higher - lower) \* param", where param is in the range [0.0, 1.0] and is calculated from the store file count on this regionserver. 3. If some stores have too many store files (storefilesCount \> blockingFileCount), then there is no limitation, whether in peak or off-peak hours. You can set "hbase.regionserver.throughput.controller" to org.apache.hadoop.hbase.regionserver.throttle.NoLimitThroughputController to disable throughput control. ConfigurationObserver is implemented, which means you can change all of the configurations above without restarting the cluster. The throttle is on by default in hbase-2.0.0. There is no limit in hbase-1.x. --- * [HBASE-6778](https://issues.apache.org/jira/browse/HBASE-6778) | *Major* | **Deprecate Chore; its a thread per task when we should have one thread to do all tasks** Corresponding usages for new ScheduledChore vs. Deprecated Chore: Chore.interrupt() -\> ScheduledChore.cancel(mayInterruptWhileRunning = true) Threads.setDaemonThreadRunning(Chore) -\> ChoreService.scheduleChore(ScheduledChore) Chore.isAlive -\> ScheduledChore.isScheduled() Chore.getSleeper().skipSleepCycle() -\> ScheduledChore.triggerNow() --- * [HBASE-11574](https://issues.apache.org/jira/browse/HBASE-11574) | *Major* | **hbase:meta's regions can be replicated** On the server side, set hbase.meta.replica.count to the number of replicas of meta that you want to have in the cluster (defaults to 1). hbase.regionserver.meta.storefile.refresh.period should be set to a non-zero number in milliseconds, something like 30000 (defaults to 0). On the client/user side, set hbase.meta.replicas.use to true.
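As a worked illustration of the throughput rules in the HBASE-8329 note above, the following sketch applies the three cases in order. It is not the PressureAwareCompactionThroughputController code. The class and method names are hypothetical, "MB" is assumed to mean 1024 \* 1024 bytes, and `param` is taken as an input because the note does not say how it is derived from the store file count.

```java
/** Hypothetical sketch of the limit selection described in the note. */
final class CompactionThroughputSketch {
    // Defaults quoted in the note; the byte interpretation of "MB" is an assumption.
    static final double LOWER_BOUND = 10.0 * 1024 * 1024;
    static final double HIGHER_BOUND = 20.0 * 1024 * 1024;
    // The note gives Long.MAX_VALUE (no limitation) as the off-peak default.
    static final double OFF_PEAK_LIMIT = (double) Long.MAX_VALUE;

    /**
     * @param param         pressure value, clamped here to [0.0, 1.0]
     * @param offPeak       true during off-peak hours
     * @param storesBlocked true if some store has more files than blockingFileCount
     * @return the throughput limit in bytes per second
     */
    static double limitBytesPerSec(double param, boolean offPeak, boolean storesBlocked) {
        if (storesBlocked) {
            return Double.MAX_VALUE; // rule 3: no limitation at all
        }
        if (offPeak) {
            return OFF_PEAK_LIMIT;   // rule 1: fixed off-peak limit
        }
        double clamped = Math.max(0.0, Math.min(1.0, param));
        // rule 2: lower + (higher - lower) * param
        return LOWER_BOUND + (HIGHER_BOUND - LOWER_BOUND) * clamped;
    }
}
```

With the defaults, a `param` of 0.5 in normal hours gives a limit halfway between the two bounds, that is 15MB/sec.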
--- * [HBASE-12808](https://issues.apache.org/jira/browse/HBASE-12808) | *Major* | **Use Java API Compliance Checker for binary/source compatibility** Adds a dev-support/check\_compatibility.sh script for comparing versions. Run the script to see usage. --- * [HBASE-12684](https://issues.apache.org/jira/browse/HBASE-12684) | *Major* | **Add new AsyncRpcClient** Retrofit a new, netty-based rpc transport on the client. This client is slightly slower if little contention given the extra tier or so that netty adds and that we block on a Future waiting on the call to finish. This client opens the way for HBase having a native Async API. This client is on by default in master branch (2.0 hbase). It is off in branch-1.0 (hbase-1.1.x). To enable it, set "hbase.rpc.client.impl" to "org.apache.hadoop.hbase.ipc.AsyncRpcClient" --- * [HBASE-8410](https://issues.apache.org/jira/browse/HBASE-8410) | *Major* | **Basic quota support for namespaces** Namespace auditor provides basic quota support for namespaces in terms of number of tables and number of regions. In order to use namespace quotas, quota support must be enabled by setting "hbase.quota.enabled" property to true in hbase-site.xml file. The users can add quota information to namespace, while creating new namespaces or by altering existing ones. Examples: 1. create\_namespace 'ns1', {'hbase.namespace.quota.maxregions'=\>'10'} 2. create\_namespace 'ns2', {'hbase.namespace.quota.maxtables'=\>'2','hbase.namespace.quota.maxregions'=\>'5'} 3. alter\_namespace 'ns3', {METHOD =\> 'set', 'hbase.namespace.quota.maxtables'=\>'5','hbase.namespace.quota.maxregions'=\>'25'} The quotas can be modified/added to namespace at any point of time. 
To remove quotas, the following commands can be used: alter\_namespace 'ns3', {METHOD =\> 'unset', NAME =\> 'hbase.namespace.quota.maxtables'} alter\_namespace 'ns3', {METHOD =\> 'unset', NAME =\> 'hbase.namespace.quota.maxregions'} --- * [HBASE-12902](https://issues.apache.org/jira/browse/HBASE-12902) | *Major* | **Post-asciidoc conversion fix-ups** Pushed to master. Shout if there are any issues. --- * [HBASE-12848](https://issues.apache.org/jira/browse/HBASE-12848) | *Major* | **Utilize Flash storage for WAL** For users on a version of Hadoop that supports tiered storage policies (i.e. Apache Hadoop 2.6.0+), HBase now allows users to opt in to having the write ahead log placed on the SSD tier. Users on earlier versions of Hadoop will be unable to take advantage of this feature. Use of tiered storage is controlled by a new RegionServer config, hbase.wal.storage.policy. It defaults to the value 'NONE', which will rely on HDFS defaults for a policy decision. Users can specify ONE\_SSD or ALL\_SSD as the value: ONE\_SSD: place only one replica of WAL files on SSD and the remaining replicas in default storage ALL\_SSD: all replicas of WAL files are placed on SSD See [the HDFS docs on storage policy](http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html) --- * [HBASE-11144](https://issues.apache.org/jira/browse/HBASE-11144) | *Major* | **Filter to support scanning multiple row key ranges** MultiRowRangeFilter is a filter to support scanning multiple row key ranges. If the number of ranges is small, using multiple scans achieves the same thing and works well. But when the number of ranges is very large (e.g. millions), using the MultiRowRangeFilter is preferable. In this filter, the ranges are sorted and merged, so users do not have to worry about ranges that are not contiguous. And if users access the data through something like REST, Thrift or Pig, the filter might be the practical solution.
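The sort-and-merge step mentioned for MultiRowRangeFilter can be sketched as follows. This is a simplified illustration, not the filter's actual code. The `Range` type is hypothetical, row keys are modeled as strings compared lexicographically instead of byte arrays, every range is treated as half-open `[start, stop)`, and merging ranges that merely touch is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of sorting and merging row key ranges. */
final class RowRangeMergeSketch {
    /** Half-open range [start, stop) over string row keys. */
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + stop + ")";
        }
    }

    /** Sorts ranges by start key, then merges ranges that overlap or touch. */
    static List<Range> sortAndMerge(List<Range> input) {
        List<Range> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Range r) -> r.start));
        List<Range> merged = new ArrayList<>();
        for (Range current : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (current.start.compareTo(last.stop) <= 0) {
                    // Overlapping or adjacent: extend the previous range if needed.
                    String stop = last.stop.compareTo(current.stop) >= 0 ? last.stop : current.stop;
                    merged.set(merged.size() - 1, new Range(last.start, stop));
                    continue;
                }
            }
            merged.add(current);
        }
        return merged;
    }
}
```

After this step a scan only needs to visit each merged range once, in key order, which is what makes a very large number of input ranges manageable.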
--- * [HBASE-12268](https://issues.apache.org/jira/browse/HBASE-12268) | *Major* | **Add support for Scan.setRowPrefixFilter to shell** Added new option, ROWPREFIXFILTER, to the scan command in the HBase shell to easily scan for a specific row prefix. --- * [HBASE-12775](https://issues.apache.org/jira/browse/HBASE-12775) | *Major* | **CompressionTest ate my HFile (sigh!)** CompressionTest will now abort when the target path exists. --- * [HBASE-12695](https://issues.apache.org/jira/browse/HBASE-12695) | *Critical* | **JDK 1.8 compilation broken** Use the -Pjavac maven profile in order to compile HBase using the compiler provided by the JDK instead of the default error-prone compiler plugin. This is useful for now if you are building HBase with JDK 1.8 or a JDK that doesn't support error-prone. --- * [HBASE-10201](https://issues.apache.org/jira/browse/HBASE-10201) | *Major* | **Port 'Make flush decisions per column family' to trunk** Adds new flushing policy mechanism. Default, org.apache.hadoop.hbase.regionserver.FlushLargeStoresPolicy, will try to avoid flushing out the small column families in a region, those whose memstores are \< hbase.hregion.percolumnfamilyflush.size.lower.bound. To restore the old behavior of flushes writing out all column families, set hbase.regionserver.flush.policy to org.apache.hadoop.hbase.regionserver.FlushAllStoresPolicy either in hbase-default.xml or on a per-table basis by setting the policy to use with HTableDescriptor.getFlushPolicyClassName(). --- * [HBASE-12559](https://issues.apache.org/jira/browse/HBASE-12559) | *Major* | **Provide LoadBalancer with online configuration capability** updateConfiguration(ServerName server) method of Admin now updates config for HMaster as well. Specifically, config update would be taken by load balancer. 
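The selection rule described under HBASE-10201 above can be pictured with a small sketch. It is not the FlushLargeStoresPolicy implementation. The class and method names are hypothetical, sizes are plain byte counts, and the fallback of flushing every family when none reaches the lower bound is an assumption of this sketch that the note does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of choosing which column families to flush. */
final class FlushSelectionSketch {
    /**
     * @param memstoreSizes column family name mapped to its memstore size in bytes
     * @param lowerBound    stands in for hbase.hregion.percolumnfamilyflush.size.lower.bound
     * @return the families selected for flushing, in the map's iteration order
     */
    static List<String> selectFamiliesToFlush(Map<String, Long> memstoreSizes, long lowerBound) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> entry : memstoreSizes.entrySet()) {
            // Small families (memstore below the lower bound) are skipped.
            if (entry.getValue() >= lowerBound) {
                selected.add(entry.getKey());
            }
        }
        if (selected.isEmpty()) {
            // Assumption: with no large family, fall back to flushing everything.
            selected.addAll(memstoreSizes.keySet());
        }
        return selected;
    }
}
```

For a region with families sized 50, 1 and 200 bytes and a lower bound of 16, only the first and third would be selected, which leaves the small family in memory.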
--- * [HBASE-10378](https://issues.apache.org/jira/browse/HBASE-10378) | *Major* | **Divide HLog interface into User and Implementor specific interfaces** HBase internals for the write ahead log have been refactored. Advanced users of HBase should be aware of the following changes. Public Audience - The Admin API for asking a region server to roll WAL files has changed from a synchronous command that returns a set of regions the WAL implementation would like flushed into an asynchronous command that returns nothing. Older clients relying on the former behavior will still be able to interact with newer servers, but the response body will always contain an empty list of regions to flush. - The shell command "hlog\_roll" has been deprecated. Operators should use the "wal\_roll" command instead. This command is subject to the changes described above for the Admin API to roll WAL files. - The command for analyzing write ahead logs has been renamed from 'hlog' to 'wal'. The old usage is deprecated and will be removed in a future version. - Some utility methods in the HBaseTestingUtility related to testing write ahead logs were changed in incompatible ways. No functionality has been removed, but method names and arguments have changed. See the HBaseTestingUtility javadoc for details. - The WALPlayer utility has deprecated the configuration keys used for advanced customization. Users should switch to the updated configuration keys. See the usage information on the WALPlayer tool for details. - The HLogInputFormat utility class for processing logs with MapReduce has been deprecated and will be removed in a future version. Users should switch to the WALInputFormat. - The labeling of server metrics on the region server status pages changed. Previously, the number of backing files for the write ahead log was labeled 'Num. HLog Files'. If you wish to see this statistic now, please look for the label 'Num. WAL Files.'
If you rely on JMX for these metrics, their location has not changed. LimitedPrivate(COPROC) Audience, LimitedPrivate(PHOENIX) - The RegionObserver API has been updated. The changes are both binary and source backwards compatible for coprocessors that use the BaseRegionObserver class. For those that implement RegionObserver directly the changes are binary backwards compatible. Depending on the internals of future HBase versions, coprocessors using the deprecated API may not see all WAL related events. Users are strongly encouraged to update their use of the API; see the RegionObserver javadoc for details. - Classes related to reading WAL entries (ReaderBase, ProtobufLogReader, SequenceFileLogReader) have changed in a backwards incompatible way. Users who referenced HLog.Reader directly or HLog.Entry will have to update. These changes do not impact compatibility with extant wal files. - The WALObserver API has been updated. The changes are both binary and source backwards compatible for coprocessors that use the BaseWALObserver class. For those that implement WALObserver directly the changes are binary backwards compatible. Depending on the internals of future HBase versions, coprocessors using the deprecated API may not see all WAL related events. Users are strongly encouraged to update their use of the API; see the WALObserver javadoc for details. - The WALCoprocessorEnvironment has changed in a backwards incompatible way. WALObserver coprocessors that relied on retrieving an object representing the write ahead log instance will have to be updated. LimitedPrivate(REPLICATION) Audience - The WALEntryFilter API has changed in a backwards incompatible way. Implementers will have to be updated. - The ReplicationEndpoint.ReplicateContext API has changed in a backwards incompatible way. Implementers who use this interface will have to be updated. These changes do not impact wire compatibility for replicating between clusters. 
- The HLogKey API is deprecated in favor of the WALKey API. Additionally, the HLogKey API has changed in a backwards incompatible way by changing from implementing WritableComparable to implementing Writable and Comparable. --- * [HBASE-11683](https://issues.apache.org/jira/browse/HBASE-11683) | *Major* | **Metrics for MOB** Adds new mob related metrics: mobCompactedIntoMobCellsCount mobCompactedIntoMobCellsSize mobCompactedFromMobCellsCount mobCompactedFromMobCellsSize mobFlushCount mobFlushedCellsCount mobFlushedCellsSize mobScanCellsCount mobScanCellsSize mobFileCacheAccessCount mobFileCacheMissCount mobFileCacheHitPercent mobFileCacheEvictedCount mobFileCacheCount --- * [HBASE-11912](https://issues.apache.org/jira/browse/HBASE-11912) | *Major* | **Catch some bad practices at compile time with error-prone** Errors from error-prone will fail the build in the compile phase. Warnings look like javac warnings and are counted as such by test-patch etc. --- * [HBASE-12220](https://issues.apache.org/jira/browse/HBASE-12220) | *Major* | **Add hedgedReads and hedgedReadWins metrics** Adds the hedgedReads and hedgedReadWins count metrics. --- * [HBASE-6290](https://issues.apache.org/jira/browse/HBASE-6290) | *Minor* | **Add a function a mark a server as dead and start the recovery the process** Adds a script to mark a server as dead.
Usage: considerAsDead.sh --hostname serverName --- * [HBASE-12111](https://issues.apache.org/jira/browse/HBASE-12111) | *Major* | **Remove deprecated APIs from Mutation(s)** Removed the following from hbase-2 (they were deprecated on release of hbase-1.0.0): Mutation setWriteToWAL(boolean); boolean getWriteToWAL(); Mutation setFamilyMap(NavigableMap\<byte[], List\<KeyValue\>\>); NavigableMap\<byte[], List\<KeyValue\>\> getFamilyMap() --- * [HBASE-12084](https://issues.apache.org/jira/browse/HBASE-12084) | *Major* | **Remove deprecated APIs from Result** The following KeyValue based APIs are removed from Result: KeyValue[] raw(); List\<KeyValue\> list(); List\<KeyValue\> getColumn(byte [] family, byte [] qualifier); KeyValue getColumnLatest(byte [] family, byte [] qualifier); KeyValue getColumnLatest(byte [] family, int foffset, int flength, byte [] qualifier, int qoffset, int qlength). They are replaced, respectively, with: Cell[] rawCells(); List\<Cell\> listCells(); List\<Cell\> getColumnCells(byte [] family, byte [] qualifier); Cell getColumnLatestCell(byte [] family, byte [] qualifier); Cell getColumnLatestCell(byte [] family, int foffset, int flength, byte [] qualifier, int qoffset, int qlength). The constructors that took KeyValues are also removed: Result(KeyValue [] cells); Result(List\<KeyValue\> kvs) --- * [HBASE-12048](https://issues.apache.org/jira/browse/HBASE-12048) | *Major* | **Remove deprecated APIs from Filter** The following APIs are removed from Filter: KeyValue transform(KeyValue); KeyValue getNextKeyHint(KeyValue). They are replaced, respectively, with: Cell transformCell(Cell); Cell getNextCellHint(Cell). If a custom Filter implementation has overridden any of the removed methods, they will no longer be called. Users have to change the custom Filter to override the Cell based methods shown above. --- * [HBASE-7767](https://issues.apache.org/jira/browse/HBASE-7767) | *Major* | **Get rid of ZKTable, and table enable/disable state in ZK** Keeps table enabled/disabled state in HDFS rather than up in ZooKeeper. Auto-migrates any existing zk state.
--- * [HBASE-11911](https://issues.apache.org/jira/browse/HBASE-11911) | *Major* | **Break up tests into more fine grained categories** Adds new test categories besides the class smalltests, mediumtests, and largetests. Adds: ClientTests CoprocessorTests FilterTests FlakeyTests IOTests MapReduceTests MasterTests MiscTests RegionServerTests ReplicationTests RestTests SecurityTests VerySlowMapReduceTests VerySlowRegionServerTests See description for examples on how to use them. --- * [HBASE-11658](https://issues.apache.org/jira/browse/HBASE-11658) | *Major* | **Piped commands to hbase shell should return non-zero if shell command failed.** Adds a noninteractive mode (-n or --noninteractive) to the hbase shell that exits with a non-zero error code on failed or invalid shell command executions, and exits with a zero error code upon successful execution. --- * [HBASE-11640](https://issues.apache.org/jira/browse/HBASE-11640) | *Major* | **Add syntax highlighting support to HBase Ref Guide programlistings** This got committed, so I guess it is safe to resolve it? --- * [HBASE-11606](https://issues.apache.org/jira/browse/HBASE-11606) | *Minor* | **Enable ZK-less region assignment by default** By default, we don't use ZK for region assignment now. To fall back to the old way, you can set hbase.assignment.usezk to true. --- * [HBASE-3135](https://issues.apache.org/jira/browse/HBASE-3135) | *Major* | **Make our MR jobs implement Tool and use ToolRunner so can do -D trickery, etc.** All MR jobs implement Tool Interface, http://hadoop.apache.org/docs/current/api/org/apache/hadoop/util/Tool.html, so now you can pass properties on command line with the -D flag, etc. --- * [HBASE-11556](https://issues.apache.org/jira/browse/HBASE-11556) | *Major* | **Move HTablePool to hbase-thrift module.** HTablePool was deprecated in 0.98.1 but was still present and usable by apps built against versions before HBase 2.0. 
It has been moved and is not intended to be used by user applications, and is now an internal part of the thrift2 proxy server only. --- * [HBASE-11548](https://issues.apache.org/jira/browse/HBASE-11548) | *Trivial* | **[PE] Add 'cycling' test N times and unit tests for size/zipf/valueSize calculations** Adds --cycles=N argument. --- * [HBASE-11344](https://issues.apache.org/jira/browse/HBASE-11344) | *Major* | **Hide row keys and such from the web UIs** Configure "hbase.display.keys" to false (default: true) in the master/regionservers if the row keys should be hidden in the web UIs (such as the web UI for table details). --- * [HBASE-6580](https://issues.apache.org/jira/browse/HBASE-6580) | *Major* | **Deprecate HTablePool in favor of HConnection.getTable(...)** This issue introduces a few new APIs: \* HConnectionManager: {code} public static HConnection createConnection(Configuration conf) public static HConnection createConnection(Configuration conf, ExecutorService pool) {code} \* HConnection: {code} public HTableInterface getTable(String tableName) throws IOException public HTableInterface getTable(byte[] tableName) throws IOException public HTableInterface getTable(String tableName, ExecutorService pool) throws IOException public HTableInterface getTable(byte[] tableName, ExecutorService pool) throws IOException {code} By default HConnectionImplementation will create an ExecutorService when needed. The ExecutorService can optionally be passed in. HTableInterfaces are retrieved from the HConnection. By default the HConnection's ExecutorService is used, but optionally that can be overridden for each HTable. --- * [HBASE-8450](https://issues.apache.org/jira/browse/HBASE-8450) | *Critical* | **Update hbase-default.xml and general recommendations to better suit current hw, h2, experience, etc.** Changed defaults: + max versions now 1 instead of 3 + row blooms on by default (except on .META.
table) + handlers 30 instead of 10 + upped memstore lower limit from .35 to .38 + zookeeper timeout default is 90 seconds instead of 180 + client pause is 100ms instead of 1000ms + retries are now 20 instead of 10 (so overall we still wait the same amount of time) + bulkload retries is 10 instead of infinite + major compactions are now once a week instead of once every 24 hours; they are staggered so all regionservers do not start compacting at the same time + blockingstorefiles is 10 instead of 7 + block cache is 0.4 instead of 0.25 + Previously, the default for hbase.rootdir was /tmp/hbase-${user.name}. Now it is ${java.io.tmpdir}/hbase-${user.name}, which is usually the same location but may not be (on macOS, it points to /var/tmp....). --- * [HBASE-4072](https://issues.apache.org/jira/browse/HBASE-4072) | *Major* | **Deprecate/disable and remove support for reading ZooKeeper zoo.cfg files from the classpath** The Apache ZooKeeper config file zoo.cfg will no longer be read when instantiating an HBaseConfiguration object, as it causes various inconsistency issues. Instead, users have to specify all HBase-relevant ZooKeeper properties in hbase-site.xml using the various "hbase.zookeeper" prefixed properties. For example, specify "hbase.zookeeper.quorum" to provide a ZK quorum server list. To enable zoo.cfg reading, for which support may be removed in a future release, set the property "hbase.config.read.zookeeper.config" to true in hbase-site.xml at the client and servers like so: \<property\> \<name\>hbase.config.read.zookeeper.config\</name\> \<value\>true\</value\> \<description\> Set to true to allow HBaseConfiguration to read the zoo.cfg file for ZooKeeper properties. Switching this to true is not recommended, since the functionality of reading ZK properties from a zoo.cfg file has been deprecated. \</description\> \</property\>