The S3A connector supports
"an auditor", a plugin which is invoked
at the start of every filesystem API call,
and whose issued "audit span" provides a context
for all REST operations against the S3 object store.
The standard auditor sets the HTTP Referrer header
on the requests with information about the API call,
such as process ID, operation name, path,
and even job ID.
If the S3 bucket is configured to log requests, this
information will be preserved there and so can be used
to analyze and troubleshoot storage IO.
Contributed by Steve Loughran.
Change-Id: Ic0a105c194342ed2d529833ecc42608e8ba2f258
When the S3A and ABFS filesystems are closed,
their IOStatistics are logged at debug in the log:
org.apache.hadoop.fs.statistics.IOStatisticsLogging
Set `fs.iostatistics.logging.level` to `info` for the statistics
to be logged at info. (also: `warn` or `error` for even higher
log levels).
Contributed by: Mehakmeet Singh
Change-Id: I56d44ad89fc1c0dd4baf701681834e7fd96c544f
Followup to HADOOP-13327, which changed S3A output stream hsync/hflush calls
to raise an exception.
Adds a new option fs.s3a.downgrade.syncable.exceptions
When true, calls to Syncable hsync/hflush on S3A output streams will
log once at warn (for entire process life, not just the stream), then
increment IOStats with the relevant operation counter
With the downgrade option false (default)
* IOStats are incremented
* The UnsupportedOperationException current raised includes a link to the
JIRA.
Contributed by Steve Loughran.
Change-Id: I967e077eda1d1a1a3795b4d22e003fe7997b6679
The ABFS Filesystem and its input and output streams now implement
the IOStatisticSource interface and provide IOStatistics on
their interactions with Azure Storage.
This includes the min/max/mean durations of all REST API calls.
Contributed by Mehakmeet Singh <mehakmeet.singh@cloudera.com>
The S3A connector's rename() operation now raises FileNotFoundException if
the source doesn't exist; a FileAlreadyExistsException if the destination
exists and is unsuitable for the source file/directory.
When renaming to a path which does not exist, the connector no longer checks
for the destination parent directory existing -instead it simply verifies
that there is no file immediately above the destination path.
This is needed to avoid race conditions with delete() and rename()
calls working on adjacent subdirectories.
Contributed by Steve Loughran.
Adds an Abortable.abort() interface for streams to enable output streams to be terminated; this
is implemented by the S3A connector's output stream. It allows for commit protocols
to be implemented which commit/abort work by writing to the final destination and
using the abort() call to cancel any write which is not intended to be committed.
Consult the specification document for information about the interface and its use.
Contributed by Jungtaek Lim and Steve Loughran.
Change-Id: I7fcc25e9dd8c10ce6c29f383529f3a2642a201ae
This defines what output streams and especially those which implement
Syncable are meant to do, and documents where implementations (HDFS; S3)
don't. With tests.
The file:// FileSystem now supports Syncable if an application calls
FileSystem.setWriteChecksum(false) before creating a file -checksumming
and Syncable.hsync() are incompatible.
Contributed by Steve Loughran.
Change-Id: I892d768de6268f4dd6f175b3fe3b7e5bcaa91194
* core-default.xml updated so that fs.s3a.committer.magic.enabled = true
* CommitConstants updated to match
* All tests which previously enabled the magic committer now rely on
default settings. This helps make sure it is enabled.
* Docs cover the switch, mention its enabled and explain why you may
want to disable it.
Note: this doesn't switch to using the committer -it just enables the path
rewriting magic which it depends on.
Contributed by Steve Loughran.
This needs SPARK-33739 in the matching spark branch in order to work
Contributed by Steve Loughran.
Change-Id: I4fe75b057159e35aacc072da3cb7343467c0c3f1
Caused by HADOOP-16830 and HADOOP-17271.
Fixes tests which fail intermittently based on configs and
in the case of the HugeFile tests, bulk runs with existing
FS instances meant statistic probes sometimes ended up probing those
of a previous FS.
Contributed by Steve Loughran.
Change-Id: I65ba3f44444e59d298df25ac5c8dc5a8781dfb7d
This is the API and implementation classes of HADOOP-16830,
which allows callers to query IO object instances
(filesystems, streams, remote iterators, ...) and other classes
for statistics on their I/O Usage: operation count and min/max/mean
durations.
New Packages
org.apache.hadoop.fs.statistics.
Public API, including:
IOStatisticsSource
IOStatistics
IOStatisticsSnapshot (seralizable to java objects and json)
+helper classes for logging and integration
BufferedIOStatisticsInputStream
implements IOStatisticsSource and StreamCapabilities
BufferedIOStatisticsOutputStream
implements IOStatisticsSource, Syncable and StreamCapabilities
org.apache.hadoop.fs.statistics.impl
Implementation classes for internal use.
org.apache.hadoop.util.functional
functional programming support for RemoteIterators and
other operations which raise IOEs; all wrapper classes
implement and propagate IOStatisticsSource
Contributed by Steve Loughran.
Change-Id: If56e8db2981613ff689c39239135e44feb25f78e
See also [SPARK-33402]: Jobs launched in same second have duplicate MapReduce JobIDs
Contributed by Steve Loughran.
Change-Id: Iae65333cddc84692997aae5d902ad8765b45772a
This adds a semaphore to throttle the number of FileSystem instances which
can be created simultaneously, set in "fs.creation.parallel.count".
This is designed to reduce the impact of many threads in an application calling
FileSystem.get() on a filesystem which takes time to instantiate -for example
to an object where HTTPS connections are set up during initialization.
Many threads trying to do this may create spurious delays by conflicting
for access to synchronized blocks, when simply limiting the parallelism
diminishes the conflict, so speeds up all threads trying to access
the store.
The default value, 64, is larger than is likely to deliver any speedup -but
it does mean that there should be no adverse effects from the change.
If a service appears to be blocking on all threads initializing connections to
abfs, s3a or store, try a smaller (possibly significantly smaller) value.
Contributed by Steve Loughran.
Change-Id: I57161b026f28349e339dc8b9d74f6567a62ce196
This fixes the S3Guard/Directory Marker Retention integration so that when
fs.s3a.directory.marker.retention=keep, failures during multipart delete
are handled correctly, as are incremental deletes during
directory tree operations.
In both cases, when a directory marker with children is deleted from
S3, the directory entry in S3Guard is not deleted, because it is still
critical to representing the structure of the store.
Contributed by Steve Loughran.
Change-Id: I4ca133a23ea582cd42ec35dbf2dc85b286297d2f
This switches the SnappyCodec to use the java-snappy codec, rather than the native one.
To use the codec, snappy-java.jar (from org.xerial.snappy) needs to be on the classpath.
This comesin as an avro dependency, so it is already on the hadoop-common classpath,
as well as in hadoop-common/lib.
The version used is now managed in the hadoop-project POM; initially 1.1.7.7
Contributed by DB Tsai and Liang-Chi Hsieh
Change-Id: Id52a404a0005480e68917cd17f0a27b7744aea4e
When a filesystem is closed, the FileSystem log will, at debug level,
log the method calling close/closeAll.
At trace level: the full calling stack.
Contributed by Karen Coppage.
Change-Id: I1444f065c171fd31d42b497c92ba4517969f67f0
This changes directory tree deletion so that only files are incrementally deleted
from S3Guard after the objects are deleted; the directories are left alone
until metadataStore.deleteSubtree(path) is invoke.
This avoids directory tombstones being added above files/child directories,
which stop the treewalk and delete phase from working.
Also:
* Callback to delete objects splits files and dirs so that
any problems deleting the dirs doesn't trigger s3guard updates
* New statistic to measure #of objects deleted, alongside request count.
* Callback listFilesAndEmptyDirectories renamed listFilesAndDirectoryMarkers
to clarify behavior.
* Test enhancements to replicate the failure and verify the fix
Contributed by Steve Loughran
Change-Id: I0e6ea2c35e487267033b1664228c8837279a35c7
Contributed by Steve Loughran.
* Fixes AbstractContractSeekTest test to use readFully
* Doesn't do this to AbstractContractUnbufferTest test as it changes the test too much.
Instead just notes in the error that this may be transient
The issue is that read(buffer) doesn't guarantee that the buffer is filled, only that it will
read up to a point, and that may be just be the amount of data left in the TCP packet.
readFully corrects for this, but using it in the unbuffer test runs the risk that what
is tested for in terms of unbuffering doesn't actually get validated.
Change-Id: I046eadb69b80ba0aac468b354c82c4d510dc3699
This adds an option to disable "empty directory" marker deletion,
so avoid throttling and other scale problems.
This feature is *not* backwards compatible.
Consult the documentation and use with care.
Contributed by Steve Loughran.
Change-Id: I69a61e7584dc36e485d5e39ff25b1e3e559a1958
* Passing C/C++ standard flags -std is
not cross-compiler friendly as not all
compilers support all values.
* Thus, we need to make use of the
appropriate flags provided by CMake in
order to specify the C/C++ standards.
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 909f1e82d3)
* HDFS-15436. Default mount table name used by ViewFileSystem should be configurable
* Replace Constants.CONFIG_VIEWFS_DEFAULT_MOUNT_TABLE use in tests
* Address Uma's comments on PR#2100
* Sort lists in test to match without concern to order
* Address comments, fix checkstyle and fix failing tests
* Fix checkstyle
(cherry picked from commit bed0a3a374)