ZStandard supports initialization of compressors and decompressors with a
precomputed dictionary, which can dramatically improve and speed up compression
of tables with small values. For more details, please see
The Case For Small Data Compression
https://github.com/facebook/zstd#the-case-for-small-data-compression
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Conflicts:
hbase-compression/hbase-compression-zstd/src/main/java/org/apache/hadoop/hbase/io/compress/zstd/ZstdCodec.java
This reverts commit 8ac0b5ed7f.
This is not ready yet. There are some code paths remaining where store
configuration (CompoundConfiguration) is not passed into the block decoding
context. Found with additional integration tests.
A regression was introduced by HBASE-25847 which changed regionInfo#isParentSplit
to regionState#isSplit. The region state after restart is CLOSED instead of SPLIT.
We need to check both regionState and regionInfo for split status.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Use a hybrid logical clock for timestamping entries.
Using BufferedMutator without HLC was not good because we assign client timestamps,
and the store loop is fast enough that on rare occasion two temporally adjacent URLs
in the set of WARCs are equivalent and the timestamp does not advance, leading later
to a rare false positive CORRUPT finding.
While making changes, support direct S3N paths as input paths on the command line.
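
A minimal sketch of the hybrid logical clock idea described above, not the
code from this change. It assumes a packed 64-bit timestamp (physical
milliseconds in the high bits, a logical counter in the low bits); the class
name, the bit split, and the injected time source are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical hybrid logical clock; names and bit layout are assumptions. */
final class HybridLogicalClock {
  // Assumed layout: low 20 bits are the logical counter, the rest is wall-clock millis.
  private static final int LOGICAL_BITS = 20;
  private static final long LOGICAL_MASK = (1L << LOGICAL_BITS) - 1;

  private final AtomicLong last = new AtomicLong();
  private final LongSupplier physicalMillis;

  HybridLogicalClock(LongSupplier physicalMillis) {
    this.physicalMillis = physicalMillis;
  }

  /** Returns a timestamp strictly greater than any previously returned one. */
  long next() {
    return last.updateAndGet(prev -> {
      long physical = physicalMillis.getAsLong() << LOGICAL_BITS;
      // If the wall clock moved past the last timestamp, use it with counter 0.
      // Otherwise bump the logical counter so the timestamp still advances.
      return physical > prev ? physical : prev + 1;
    });
  }

  static long physicalOf(long timestamp) {
    return timestamp >>> LOGICAL_BITS;
  }

  static long logicalOf(long timestamp) {
    return timestamp & LOGICAL_MASK;
  }
}
```

With a clock like this, two puts issued within the same millisecond still get
distinct, increasing timestamps, which is the property the store loop needs.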
Signed-off-by: Viraj Jasani <vjasani@apache.org>
ZStandard supports initialization of compressors and decompressors with a
precomputed dictionary, which can dramatically improve and speed up compression
of tables with small values. For more details, please see
The Case For Small Data Compression
https://github.com/facebook/zstd#the-case-for-small-data-compression
Signed-off-by: Duo Zhang <zhangduo@apache.org>
We get and retain Compressor instances in HFileBlockDefaultEncodingContext,
and could in theory call Compressor#reinit when setting up the context,
to update compression parameters like level and buffer size, but we do
not plumb through the CompoundConfiguration from the Store into the
encoding context. As a consequence we can only update codec parameters
globally in system site conf files.
Fine grained configurability is important for algorithms like ZStandard
(ZSTD), which offers more than 20 compression levels, where at level 1
it is almost as fast as LZ4, and where at higher levels it utilizes
computationally expensive techniques to rival LZMA at compression ratio
but at a significant cost in compression throughput. The ZSTD
level that should be set for a given column family or table will vary by
use case.
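
The lookup order this implies can be sketched as follows. This is an
illustration only: plain maps stand in for the site configuration and the
per-column-family overrides, and the class name and the "zstd.level" key are
placeholders, not the real HBase configuration API or key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical layered lookup: column family settings override site settings. */
final class LayeredCodecConfig {
  // Placeholder key for illustration; not the actual HBase property name.
  static final String LEVEL_KEY = "zstd.level";

  private final Map<String, String> site;
  private final Map<String, String> family;

  LayeredCodecConfig(Map<String, String> site, Map<String, String> family) {
    this.site = new HashMap<>(site);
    this.family = new HashMap<>(family);
  }

  /** Resolves the level from the family first, then the site, then the default. */
  int zstdLevel(int defaultLevel) {
    String value = family.containsKey(LEVEL_KEY) ? family.get(LEVEL_KEY) : site.get(LEVEL_KEY);
    return value == null ? defaultLevel : Integer.parseInt(value);
  }
}
```

A write-heavy family could then stay at a low level while an archival family
on the same cluster uses a high one.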
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Conflicts:
hbase-compression/hbase-compression-zstd/src/main/java/org/apache/hadoop/hbase/io/compress/zstd/ZstdDecompressor.java
hbase-server/src/test/java/org/apache/hadoop/hbase/io/compress/HFileTestBase.java
hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestSeekToBlockWithEncoders.java
Reduce the frequency of allocation failure traces by printing them
periodically (once per minute). Record the allocation failures in the
Bucket Cache Stats and let the stat thread dump cumulative allocation
failures alongside the other traces it dumps.
Also, this change adds a trace of the table name, column family and
HFile name for the most recent allocation failure in the last minute.
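
A rough sketch of the once-per-minute pattern, assuming a caller-supplied
clock in non-negative milliseconds. The class and method names are
hypothetical, and this is not the BucketCache code: every failure is counted,
but at most one trace per interval is printed.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical rate-limited failure trace; counts always, prints once per minute. */
final class AllocationFailureLog {
  private static final long INTERVAL_MS = 60_000L;

  private final AtomicLong failures = new AtomicLong();
  // Starts one interval in the past so the first failure (nowMs >= 0) is traced.
  private final AtomicLong lastLoggedMs = new AtomicLong(-INTERVAL_MS);

  /** Counts the failure and returns true if a trace was printed for it. */
  boolean recordFailure(long nowMs, String table, String family, String hfile) {
    long total = failures.incrementAndGet();
    long last = lastLoggedMs.get();
    // Skip if the interval has not elapsed, or another thread won the race to log.
    if (nowMs - last < INTERVAL_MS || !lastLoggedMs.compareAndSet(last, nowMs)) {
      return false;
    }
    System.err.println("Allocation failed: table=" + table + ", family=" + family
        + ", hfile=" + hfile + ", cumulative failures=" + total);
    return true;
  }

  /** Cumulative count, suitable for a periodic stats dump. */
  long totalFailures() {
    return failures.get();
  }
}
```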
Signed-off-by: Anoop <anoopsamjohn@apache.org>
This change introduces provided compression codecs to HBase as
new Maven modules. Each module provides compression codec support
that formerly required Hadoop native codecs, which in turn rely
on native code integration, which may or may not be available on
a given hardware platform or in an operational environment. We
now provide codecs in the HBase distribution for users who for
whatever reason cannot or do not wish to deploy the Hadoop native
codecs.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Conflicts:
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileEncryption.java
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileSeek.java
hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestCompressedWAL.java
In unit tests of the sync connection with OpenTelemetry tracing,
there is a race condition between ConnectionImplementation#finalize
(object GC) and the per-method Rule created by
OpenTelemetryRule.create, which can fail with the error message
`GlobalOpenTelemetry.set has already been called`.
This change fixes the test by moving the TRACE_RULE creation up
to class level so that GlobalOpenTelemetry is set once and
reused by all test methods.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>