Avoid the pattern where a Random object is allocated, used once or twice, and
then left for GC. This pattern triggers warnings from some static analysis tools
because this pattern leads to poor effective randomness. In a few cases we were
legitimately suffering from this issue; in others a change is still good to
reduce noise in analysis results.
Use ThreadLocalRandom where there is no requirement to set the seed to gain
good reuse.
Where useful relax use of SecureRandom to simply Random or ThreadLocalRandom,
which are unlikely to block if the system entropy pool is low, if we don't need
crypographically strong randomness for the use case. The exception to this is
normalization of use of Bytes#random to fill byte arrays with randomness.
Because Bytes#random may be used to generate key material it must be backed by
SecureRandom.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Add the following stats for a given table:
- 7. Total size of serialized cells of each CF.
- 8. Total size of serialized cells of each qualifier.
- 9. Total size of serialized cells across all rows.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
We get and retain Compressor instances in HFileBlockDefaultEncodingContext,
and could in theory call Compressor#reinit when setting up the context,
to update compression parameters like level and buffer size, but we do
not plumb through the CompoundConfiguration from the Store into the
encoding context. As a consequence we can only update codec parameters
globally in system site conf files.
Fine grained configurability is important for algorithms like ZStandard
(ZSTD), which offers more than 20 compression levels, where at level 1
it is almost as fast as LZ4, and where at higher levels it utilizes
computationally expensive techniques to rival LZMA at compression ratio
but trades off significantly for reduced compresson throughput. The ZSTD
level that should be set for a given column family or table will vary by
use case.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
This change introduces provided compression codecs to HBase as
new Maven modules. Each module provides compression codec support
that formerly required Hadoop native codecs, which in turn relies
on native code integration, which may or may not be available on
a given hardware platform or in an operational environment. We
now provide codecs in the HBase distribution for users whom for
whatever reason cannot or do not wish to deploy the Hadoop native
codecs.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
HBase 2 moved over Scans to use PREAD by default instead of STREAM like
HBase 1. In the context of a MapReduce job, we can generally expect that
clients using the InputFormat (batch job) would be reading most of the
data for a job. Cater to them, but still give users who want PREAD the
ability to do so.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
We introduced EnvironmentEdgeManager as a way to inject alternate clocks
for unit tests. In order for this to be effective, all callers that would
otherwise use System.currentTimeMillis() must call
EnvironmentEdgeManager.currentTime() instead, except the implementers of
EnvironmentEdge.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Need to add to allowed-licenses list too....
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
Reviewed-by: Duo Zhang <zhangduo@apache.org>
Reviewed-by: Nick Dimiduk <ndimiduk@apache.org>
Adding StoreContext which contains the metadata about the HStore. This
metadata can be used across the HFileWriter/Readers and other HStore
consumers without the need of passing around the complete store and
exposing its internals.
Co-authored-by: Abhishek Khanna <akkhanna@amazon.com>
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Zach York <zyork@apache.org>
Make it so WALPlayer can replay recovered.edits files.
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/WALInputFormat.java
Allow for WAL files that do NOT have a startime in their name.
Use the 'generic' WAL-filename parser instead of the one that
used be local here. Implement support for 'startTime' filter.
Previous was just not implemented.
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/WALPlayer.java
Checkstyle.
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/AbstractFSWALProvider.java
Use the new general WAL name timestamp parser.
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
Utility for parsing timestamp from WAL filename.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRecoveredEdits.java
Export attributes about the local recovered.edits test file
so other tests can play with it.
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Add MR counters so operator can see if WALPlayer run actually did
anything. Fix bugs in usage (it enforced two args though usage
describes allowing one arg only). Clean up usage output. In
particular add note on wal file separator as hbase by default uses
the ',' in its WAL file names which could befuddle operator
trying to do simple import.
Signed-off-by: Huaxiang Sun <huaxiangsun@apache.com>