We introduced EnvironmentEdgeManager as a way to inject alternate clocks
for unit tests. In order for this to be effective, all callers that would
otherwise use System.currentTimeMillis() must call
EnvironmentEdgeManager.currentTime() instead, except the implementers of
EnvironmentEdge.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
This bug was exposed by the test from HBASE-25924. Since this wal
implementations close the wal asynchronously, replication can potentially
miss the trailer bytes. (see jira comment for detailed analysis).
While this is not a correctness problem (since trailer does not have any entry data),
it erroneously bumps a metric that is used to track skipped bytes in WAL resulting
in false alarms which is something we should avoid.
Reviewed-by: Rushabh Shah <rushabh.shah@salesforce.com>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by Anoop Sam John <anoopsamjohn@apache.org>
Undo asserts that LZ4 and SNAPPY fails if their native libs are NOT
loaded; as of hadoop 3.3.1, LZ4 and SNAPPY can work w/o native libs.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
WAL storage can be expensive, especially if the cell values
represented in the edits are large, consisting of blobs or
significant lengths of text. Such WALs might need to be kept around
for a fairly long time to satisfy replication constraints on a space
limited (or space-contended) filesystem.
We have a custom dictionary compression scheme for cell metadata that
is engaged when WAL compression is enabled in site configuration.
This is fine for that application, where we can expect the universe
of values and their lengths in the custom dictionaries to be
constrained. For arbitrary cell values it is better to use one of the
available compression codecs, which are suitable for arbitrary albeit
compressible data.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
* HBASE-25875 RegionServer failed to start with IllegalThreadStateException due to race condition in AuthenticationTokenSecretManager's start & retrievePassword method
Signed-off-by: stack <stack@apache.com>
* HBASE-25867 Extra doc around ITBLL
Minor edits to a few log messages.
Explain how the '-c' option works when passed to ChaosMonkeyRunner.
Some added notes on ITBLL.
Fix whacky 'R' and 'Not r' thing in Master (shows when you run ITBLL).
In HRS, report hostname and port when it checks in (was debugging issue
where Master and HRS had different notions of its hostname).
Spare a dirty FNFException on startup if base dir not yet in place.
* Address Review by Sean
Signed-off-by: Sean Busbey <busbey@apache.org>
Revert "HBASE-25032 Wait for region server to become online before adding it to online servers in Master (#2769)"
This reverts commit 1e4639d2eb.
Conflicts:
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
In CatalogJanitor we schedule GCRegionProcedure to clean up both
filesystem and in-memory state after a split, and
GCMultipleMergedRegionsProcedure to do the same for merges. Both of these
procedures clean up in-memory state, but CatalogJanitor also does this
redundantly just after scheduling the procedures. The cleanup should be
done in only one place. Presumably we are using the procedures to do it in
a principled way. Remove the redundancy in CatalogJanitor and fix any
follow on issues, like test failures.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Processing of the RS report happens asynchronously from other activities
which can mutate region state. For example, a split procedure may already
be running. A split procedure cannot succeed if the parent region is no
longer open, so we can ignore it in that case.
Note that submitting more than one split procedure for a given region is
harmless -- the split is fenced in the procedure handling -- but it would
be noisy in the logs. Only one procedure can succeed. The other
procedure(s) would abort during initialization and report failure with
WARN level logging.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Pankaj <pankajkumar@apache.org>
RegionStates#getAssignmentsForBalancer is used by the HMaster to
collect all regions of interest to the balancer for the next chore
iteration. We check if a table is in disabled state to exclude
regions that will not be of interest (because disabled regions are
or will be offline) or are in a state where they shouldn't be
mutated (like SPLITTING). The current checks are not actually
comprehensive.
Filter out regions not in OPEN or OPENING state when building the
set of interesting regions for the balancer to consider. Only
regions open (or opening) on the cluster are of interest to
balancing calculations for the current iteration. Regions in all
other states can be expected to not be of interest – either offline
(OFFLINE, or FAILED_*), not subject to balancer decisions now
(SPLITTING, SPLITTING_NEW, MERGING, MERGING_NEW), or will be
offline shortly (CLOSING) – until at least the next chore
iteration.
Add TRACE level logging.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
We claim in a WARN level log line to be "Playing-it-safe skipping merge/
split gc'ing of regions from hbase:meta while regions-in-transition (RIT)"
but do not actually skip because of a missing return. Remove the warning.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Minor refactor. Make the `compactSplitThread` member field of `HRegionServer` private, and gate
all access through the getter method.
Signed-off-by: Yulin Niu <niuyulin@apache.org>
Signed-off-by: Pankaj Kumar <pankajkumar@apache.org>
Need to add to allowed-licenses list too....
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
Reviewed-by: Duo Zhang <zhangduo@apache.org>
Reviewed-by: Nick Dimiduk <ndimiduk@apache.org>
Remove the deprecated fields, which can be removed in 3.0.0. Marked the
constant OLDEST_TIMESTAMP as InterfaceAudience.Private as it is only use
in classes, which are also marked as InterfaceAudience.Private.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
There are code paths in which we throw non-IOExceptions when
initializing a WAL reader. However, we only close the InputStream to the
WAL filesystem when the exception is an IOException. Close it if it is
open in all cases.
Co-authored-by: Josh Elser <jelser@cloudera.com>
Signed-off-by: Andrew Purtell <apurtell@apache.org>
The issue is that FileInputStream is created with try-with-resources, so its close() is called right after the try sentence.
FileInputStream is a finalize class, when this object is garbage collected, its close() is called again.
To avoid this double-free resources, add guard against it.
Signed-off-by: stack <stack@apache.org>
Don't have every handler update regionserver metrics on each
scan#nextRaw; instead, do a batch update just before Scan
returns. Otherwise, all running handlers end up contending
on metrics update.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
Update of regionserver metrics counters moved out to caller where
can be done as a batch update instead of per-next.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServer.java
Class doc to encourage batch updating metrics.
Remove the single update as unused anymore.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
Count calls to nextRaw. Update regionserver count in finally block when
scan is done rather than per nextRaw call. Move all metrics updates to
finally.
Signed-off-by: Reid Chan <reidchan@apache.org>
Signed-off-by: Baiqiang Zhao <ZhaoBQ>
Introduces a new metric that tracks number of replication sources that are stuck in initialization.
Signed-off-by: Xu Cang <xucang@apache.org>
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Add a toString to all split policy implementations listing name and
vitals. Use this toString in the Region open message. Ditto for flush
policy for the Region.
Signed-off-by: Huaxiang Sun<huaxiangsun@apache.org>
Turn down the amount we log. If you want to see the full exception
enable TRACE-level logging.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: shahrs87
Close the stream using a try-with-resources block.
Reviewed-by: Aman Poonia <aman.poonia.29@gmail.com>
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
ServerCall.java: calling wrapWithSasl() was moved to getResponse(), so
the SASL wrapping is delayed until the reply is sent back to the client.
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
ProtobufLogReader#readNext may be called by code that attempts to advance
the reader but does not necessarily expect to succeed, for example
WALEntryStream#tryAdvanceEntry. Much of the logging in this method is
at TRACE level. Other logging at WARN level will be frequently emitted, as
often as several per minute, and this will cause false positive assessment
from operators that they are experiencing a bug. Fix the mixed intent with
respect to log levels in readNext. Log at only DEBUG level or below.
Signed-off-by: Sean Busbey <busbey@apache.org>
Plumbs the configuration needed to enable core thread timeout on non-critical thread pools.
Currently only enabled for thread pools with op-codes RS_LOG_REPLAY_OPS, RS_PARALLEL_SEEK, MASTER_SNAPSHOT_OPERATIONS, MASTER_MERGE_OPERATIONS. Others can be added later as
needed.
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
* HBASE-25542 Add client detail to scan name so when lease expires, we have clue on who was scanning
When we create a scanner lease, record client ip and port (removed
unnecessary store of scannerName).
Signed-off-by: Clara Xiong <clarax98007@gmail.com>
Retain assignment will be useful in non-cloud scenario where RegionServer and Datanode are deployed in same machine and will avoid remote read.
Signed-off-by: Guanghao Zhang <zghao@apache.org>
Signed-off-by: Anoop Sam John <anoopsamjohn@apache.org>
Adds "hbase.master.executor.merge.dispatch.threads" and defaults to 2.
Also adds additional logging that includes the number of split plans
and merge plans computed for each normalizer run.
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Adding StoreContext which contains the metadata about the HStore. This
metadata can be used across the HFileWriter/Readers and other HStore
consumers without the need of passing around the complete store and
exposing its internals.
Co-authored-by: Abhishek Khanna <akkhanna@amazon.com>
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Zach York <zyork@apache.org>
Some OMME can not cause the JVM to exit, like "java.lang.OutOfMemoryError: Direct buffer memory", "java.lang.OutOfMemoryError: unable to create new native thread", as they dont call vmError#next_OnError_command. So abort HMaster when uncaught exception occurs in TimeoutExecutor, the new active Hmaster will resume the suspended procedure.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: stack <stack@apache.com>
Signed-off-by: Pankaj Kumar<pankajkumar@apache.org>
* HBASE-25379 Make retry pause time configurable for regionserver short operation RPC (reportRegionStateTransition/reportProcedureDone)
* HBASE-25379 RemoteProcedureResultReporter also should retry after the configured pause time
* Addressed the review comments
Signed-off-by: Yulin Niu <niuyulin@apache.org>
* Using ContiguousCellFormat as a marker alone
* Commit the new file
* Fix the comparator logic that was an oversight
* Fix the sequenceId check order
* Adding few more static methods that helps in scan flow like query
matcher where we have more cols
* Remove ContiguousCellFormat and ensure compare() can be inlined
* applying negation as per review comment
* Fix checkstyle comments
* fix review comments
* Address review comments
* Fix the checkstyle issues
* Fix javadoc
Signed-off-by: stack <stack@apache.org>
Signed-off-by: AnoopSamJohn <anoopsamjohn@apache.org>
Signed-off-by: huaxiangsun <huaxiangsun@apache.org>
* HBASE-25277 postScannerFilterRow impacts Scan performance a lot in HBase 2.x
1. Added a check for Object class in RegionCoprocessorHost to avoid wrong initialization of hasCustomPostScannerFilterRow
2. Removed dummy implementation of postScannerFilterRow from AccessController, VisibilityController & ConstraintProcessor (which are not required currently)
Signed-off-by Ramkrishna S Vasudevan <ramkrishna@apache.org>
Signed-off-by Anoop Sam John <anoopsamjohn@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Add a bit of a wait before testing if online replicas match the zk count. It might take a
while for all replicas to come online.
Signed-off-by: huaxiangsun <huaxiangsun@apache.org>
Check TEST_SKIP_REPORTING_TRANSITION and if true, skip trying to talk to
update master on state transition -- i.e. reportFileArchivalForQuotas --
as we allow for reportRegionStateTransition (For tests only)
Signed-off-by: Huaxiang Sun <huaxiangsun@apache.com>
Network identities should be bound late. Remote addresses should be
resolved at the last possible moment, just before connect(). Network
identity mappings can change, so our code should not inappropriately
cache them. Otherwise we might miss a change and fail to operate normally.
Revert "HBASE-14544 Allow HConnectionImpl to not refresh the dns on errors"
Removes hbase.resolve.hostnames.on.failure and related code. We always
resolve hostnames, as late as possible.
Preserve InetSocketAddress caching per RPC connection. Avoids potential
lookups per Call.
Replace InetSocketAddress with Address where used as a map key. If we want
to key by hostname and/or resolved address we should be explicit about it.
Using Address chooses mapping by hostname and port only.
Add metrics for potential nameservice resolution attempts, whenever an
InetSocketAddress is instantiated for connect; and metrics for failed
resolution, whenever InetSocketAddress#isUnresolved on the new instance
is true.
* Use ServerName directly to build a stub key
* Resolve and cache ISA on a RpcChannel as late as possible, at first call
* Remove now invalid unit test TestCIBadHostname
We resolve DNS at the latest possible time, at first call, and do not
resolve hostnames for creating stubs at all, so this unit test cannot
work now.
Reviewed-by: Mingliang Liu <liuml07@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
This PR is a follow-up of HBASE-25181 (#2539), where several issues were
discussed on the PR:
1. Currently we use PBKDF2WithHmacSHA1 key generation algorithm to generate a
secret key for HFile / WalFile encryption, when the user is defining a string
encryption key in the hbase shell. This algorithm is not secure enough and
not allowed in certain environments (e.g. on FIPS compliant clusters). We are
changing it to PBKDF2WithHmacSHA384. It will not break backward-compatibility,
as even the tables created by the shell using the new algorithm will be able
to load (e.g. during bulkload / replication) the HFiles serialized with the
key generated by an old algorithm, as the HFiles themselves already contain
the key necessary for their decryption.
Smaller issues fixed by this commit:
2. Improve the documentation e.g. with the changes introduced by HBASE-25181
and also by some points discussed on the Jira ticket of HBASE-25263.
3. In EncryptionUtil.createEncryptionContext the various encryption config
checks should throw IllegalStateExceptions instead of RuntimeExceptions.
4. Test cases in TestEncryptionTest.java should be broken down into smaller
tests.
5. TestEncryptionDisabled.java should use ExpectedException JUnit rule to
validate exceptions.
closes#2676
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
* HBASE-25050 - We initialize Filesystems more than once.
* Ensuring that calling the FS#get() will only ensure FS init.
* Fix for testfailures. We should pass the entire path and no the scheme
alone
* Cases where we don't have a scheme for the URI
* Address review comments
* Add some comments on why FS#get(URI, conf) is getting used
* Adding the comment as per Sean's review
Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>