@InterfaceAudience.Public @InterfaceStability.Evolving public class RollingFileSystemSink extends Object implements MetricsSink, Closeable
This class is a metrics sink that uses FileSystem to write the metrics logs. Every roll interval, a new directory is created under the path specified by the basepath property. All metrics are logged to a file named <hostname>.log in the current interval's directory, where <hostname> is the name of the host on which the metrics logging process is running. The base path is set by the <prefix>.sink.<instance>.basepath property. The time zone used to create the current interval's directory name is GMT. If the basepath property isn't specified, it defaults to "/tmp", which is the temp directory on whatever default file system is configured for the cluster.
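As a rough illustration of the layout described above, the following sketch builds the log file path for one interval. The directory-name pattern (yyyyMMddHHmm) is an assumption made for this example, since the text only states that the name is derived in GMT; the class and method names are hypothetical and are not part of the sink's API.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class IntervalPathSketch {
    // Assumed pattern for the interval directory name, rendered in GMT/UTC.
    private static final DateTimeFormatter DIR_FORMAT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    /** Builds <basepath>/<interval directory>/<hostname>.log for the given interval start. */
    static String logPath(String basePath, Instant intervalStart, String hostname) {
        // Drop a trailing slash so the result never contains "//".
        String base = basePath.endsWith("/")
            ? basePath.substring(0, basePath.length() - 1)
            : basePath;
        return base + "/" + DIR_FORMAT.format(intervalStart) + "/" + hostname + ".log";
    }

    public static void main(String[] args) {
        // "/tmp" is the documented default base path; the host name is made up.
        System.out.println(
            logPath("/tmp", Instant.parse("2023-06-01T14:00:00Z"), "host1.example.com"));
    }
}
```

Formatting in UTC rather than the JVM's local zone keeps directory names identical across hosts in different time zones.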
The <prefix>.sink.<instance>.ignore-error property controls whether an exception is thrown when an error is encountered writing a log file. The default value is false. When set to true, file errors are quietly swallowed.
The roll-interval property sets the amount of time before rolling the directory. The default value is 1 hour. The roll interval may not be less than 1 minute. The property's value should be given as number unit, where number is an integer value and unit is a valid unit. Valid units are minute, hour, and day. The units are case-insensitive and may be abbreviated or plural. If no unit is specified, hours are assumed. For example, "2", "2h", "2 hour", and "2 hours" are all valid ways to specify two hours.
The roll-offset-interval-millis property sets the upper bound on a random time interval (in milliseconds) that is used to delay before the initial roll. All subsequent rolls will happen an integer number of roll intervals after the initial roll, hence retaining the original offset. The purpose of this property is to insert some variance in the roll times so that large clusters using this sink on every node don't cause a performance impact on HDFS by rolling simultaneously. The default value is 30000 (30s). When writing to HDFS, as a rule of thumb, the roll offset in milliseconds should be no less than the number of sink instances times 5.
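The two timing properties above can be sketched as follows. This is not the sink's own parser: the set of accepted abbreviations (m, min, h, hr, d) is an assumption, since the text only says units may be abbreviated or plural, and the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RollTimingSketch {
    // An integer, optional whitespace, then an optional alphabetic unit.
    private static final Pattern INTERVAL =
        Pattern.compile("^\\s*(\\d+)\\s*([A-Za-z]*)\\s*$");

    /** Converts a roll-interval value such as "2", "2h" or "2 hours" to milliseconds. */
    static long parseRollIntervalMillis(String value) {
        Matcher m = INTERVAL.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized roll interval: " + value);
        }
        long number = Long.parseLong(m.group(1));
        String unit = m.group(2).toLowerCase(Locale.ROOT); // units are case-insensitive
        long millis;
        switch (unit) {
            // Assumed abbreviations; only "minute", "hour" and "day" are named in the text.
            case "m": case "min": case "minute": case "minutes":
                millis = TimeUnit.MINUTES.toMillis(number);
                break;
            case "": // no unit given: hours are assumed
            case "h": case "hr": case "hour": case "hours":
                millis = TimeUnit.HOURS.toMillis(number);
                break;
            case "d": case "day": case "days":
                millis = TimeUnit.DAYS.toMillis(number);
                break;
            default:
                throw new IllegalArgumentException("Unrecognized unit: " + unit);
        }
        if (millis < TimeUnit.MINUTES.toMillis(1)) {
            throw new IllegalArgumentException(
                "Roll interval must be at least 1 minute: " + value);
        }
        return millis;
    }

    /** Rule of thumb from the text: offset in millis >= number of sink instances * 5. */
    static long minRecommendedOffsetMillis(long sinkInstances) {
        return sinkInstances * 5L;
    }

    public static void main(String[] args) {
        System.out.println(parseRollIntervalMillis("2 hours"));
        System.out.println(minRecommendedOffsetMillis(10_000));
    }
}
```

By this rule of thumb, a cluster with more than 6000 sink instances would call for an offset above the 30000 ms default.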
The primary use of this class is for logging to HDFS. Because it uses FileSystem to access the target file system, however, it can also write to the local file system, Amazon S3, or any other supported file system. The base path for the sink determines the file system used. An unqualified path writes to the default file system set by the configuration.
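To make the qualified versus unqualified distinction concrete, the sketch below inspects only the URI scheme of a base path. It does not use the Hadoop FileSystem API; the class name, method name, and example host and bucket names are all hypothetical.

```java
import java.net.URI;

class BasePathSchemeSketch {
    /** Returns the scheme of a qualified base path, or a marker for unqualified paths. */
    static String targetFileSystem(String basePath) {
        String scheme = URI.create(basePath).getScheme();
        // No scheme means the path is unqualified and the configured default applies.
        return scheme == null ? "default file system" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(targetFileSystem("hdfs://namenode.example.com:8020/metrics"));
        System.out.println(targetFileSystem("file:///var/log/metrics"));
        System.out.println(targetFileSystem("/tmp"));
    }
}
```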
Not all file systems support the ability to append to files. In file systems without the ability to append to files, only one writer can write to a file at a time. To allow for concurrent writes from multiple daemons on a single host, the source property is used to set unique headers for the log files. The property should be set to the name of the source daemon, e.g. namenode. The value of the source property should typically be the same as the property's prefix. If this property is not set, the source is taken to be unknown.
Instead of appending to an existing file, by default the sink will create a new file with a suffix of ".<n>", where n is the next lowest integer that isn't already used in a file name, similar to the Hadoop daemon logs. NOTE: the file with the highest sequence number is the newest file, unlike the Hadoop daemon logs.
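One way to read the suffix rule above is sketched below. It assumes that the unsuffixed name is used when it is free, and that the next suffix is one more than the highest suffix already present, which keeps the highest-numbered file the newest. Both points are assumptions for illustration, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogSuffixSketch {
    /** Picks the next file name given the names already present in the interval directory. */
    static String nextLogName(String baseName, List<String> existing) {
        if (!existing.contains(baseName)) {
            return baseName; // assumed: the plain name is used when it is still free
        }
        // Matches names of the form "<baseName>.<n>" where n is a decimal integer.
        Pattern suffixed = Pattern.compile(Pattern.quote(baseName) + "\\.(\\d+)");
        int highest = 0;
        for (String name : existing) {
            Matcher m = suffixed.matcher(name);
            if (m.matches()) {
                highest = Math.max(highest, Integer.parseInt(m.group(1)));
            }
        }
        return baseName + "." + (highest + 1);
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("host1.log", "host1.log.1", "host1.log.3");
        System.out.println(nextLogName("host1.log", existing));
    }
}
```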
For file systems that support append, the sink can append to the existing file instead. If the allow-append property is set to true, the sink appends to the existing file on file systems that support appends. By default, the allow-append property is false.
Note that when writing to HDFS with allow-append set to true, there is a minimum acceptable number of data nodes. If the number of data nodes drops below that minimum, the append will succeed, but reading the data will fail with an IOException in the DataStreamer class. The minimum number of data nodes required for a successful append is generally 2 or 3.
Note also that when writing to HDFS, the file size information is not updated until the file is closed (at the end of the interval) even though the data is being written successfully. This is a known HDFS limitation that exists because of the performance cost of updating the metadata. See HDFS-5478.
When using this sink in a secure (Kerberos) environment, two additional properties must be set: keytab-key and principal-key. keytab-key should contain the key by which the keytab file can be found in the configuration, for example, yarn.nodemanager.keytab. principal-key should contain the key by which the principal can be found in the configuration, for example, yarn.nodemanager.principal.
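The properties described above can be collected into one configuration, sketched here with java.util.Properties. Several things are assumptions for illustration: the prefix "nodemanager", the instance name "roll", the base path, the class key (following the usual metrics2 sink convention), and the idea that every short property name takes the same <prefix>.sink.<instance>. prefix that the text spells out for basepath and ignore-error.

```java
import java.util.Properties;

class SinkConfigSketch {
    /** Builds an example property set for one sink instance. */
    static Properties sinkProperties(String prefix, String instance) {
        String base = prefix + ".sink." + instance + ".";
        Properties p = new Properties();
        // Assumed key: metrics2 sinks are normally selected with a "class" entry.
        p.setProperty(base + "class", "org.apache.hadoop.metrics2.sink.RollingFileSystemSink");
        p.setProperty(base + "basepath", "hdfs://namenode.example.com:8020/metrics");
        p.setProperty(base + "source", prefix);           // typically the same as the prefix
        p.setProperty(base + "roll-interval", "1 hour");  // documented default
        p.setProperty(base + "roll-offset-interval-millis", "30000"); // documented default
        p.setProperty(base + "allow-append", "false");    // documented default
        // Secure (Kerberos) clusters: names of the configuration keys to look up,
        // not the keytab path or principal themselves.
        p.setProperty(base + "keytab-key", "yarn.nodemanager.keytab");
        p.setProperty(base + "principal-key", "yarn.nodemanager.principal");
        return p;
    }

    public static void main(String[] args) {
        Properties p = sinkProperties("nodemanager", "roll");
        for (String key : new java.util.TreeSet<>(p.stringPropertyNames())) {
            System.out.println(key + "=" + p.getProperty(key));
        }
    }
}
```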
Modifier and Type | Field and Description |
---|---|
protected boolean | allowAppend |
protected Path | basePath |
protected static boolean | forceFlush |
protected static boolean | hasFlushed |
protected boolean | ignoreError |
protected Calendar | nextFlush |
protected long | rollIntervalMillis |
protected long | rollOffsetIntervalMillis |
protected String | source |
protected static Configuration | suppliedConf |
protected static FileSystem | suppliedFilesystem |
Modifier | Constructor and Description |
---|---|
public | RollingFileSystemSink() Create an empty instance. |
protected | RollingFileSystemSink(long flushIntervalMillis, long flushOffsetIntervalMillis) Create an instance for testing. |
Modifier and Type | Method and Description |
---|---|
void | close() |
void | flush() Flush any buffered metrics |
protected long | getRollInterval() Extract the roll interval from the configuration and return it in milliseconds. |
void | init(org.apache.commons.configuration2.SubsetConfiguration metrics2Properties) Initialize the plugin |
void | putMetrics(MetricsRecord record) Put a metrics record in the sink |
protected void | setInitialFlushTime(Date now) Set the nextFlush variable to the initial flush time. |
protected void | updateFlushTime(Date now) Update the nextFlush variable to the next flush time. |
@VisibleForTesting protected String source
@VisibleForTesting protected boolean ignoreError
@VisibleForTesting protected boolean allowAppend
@VisibleForTesting protected Path basePath
@VisibleForTesting protected long rollIntervalMillis
@VisibleForTesting protected long rollOffsetIntervalMillis
@VisibleForTesting protected Calendar nextFlush
@VisibleForTesting protected static boolean forceFlush
@VisibleForTesting protected static volatile boolean hasFlushed
@VisibleForTesting protected static Configuration suppliedConf
@VisibleForTesting protected static FileSystem suppliedFilesystem
public RollingFileSystemSink()
Create an empty instance.
@VisibleForTesting protected RollingFileSystemSink(long flushIntervalMillis, long flushOffsetIntervalMillis)
Create an instance for testing.
Parameters:
flushIntervalMillis - the roll interval in milliseconds
flushOffsetIntervalMillis - the roll offset interval in milliseconds
public void init(org.apache.commons.configuration2.SubsetConfiguration metrics2Properties)
Description copied from interface: MetricsPlugin
Initialize the plugin
Specified by: init in interface MetricsPlugin
Parameters:
metrics2Properties - the configuration object for the plugin
@VisibleForTesting protected long getRollInterval()
Extract the roll interval from the configuration and return it in milliseconds.
@VisibleForTesting protected void updateFlushTime(Date now)
Update the nextFlush variable to the next flush time. Add an integer number of flush intervals, preserving the initial random offset.
Parameters:
now - the current time
@VisibleForTesting protected void setInitialFlushTime(Date now)
Set the nextFlush variable to the initial flush time. The initial flush will be an integer number of flush intervals past the beginning of the current hour and will have a random offset added, up to rollOffsetIntervalMillis. The initial flush will be a time in the past from which future flush times can be calculated.
Parameters:
now - the current time
public void putMetrics(MetricsRecord record)
Description copied from interface: MetricsSink
Put a metrics record in the sink
Specified by: putMetrics in interface MetricsSink
Parameters:
record - the record to put
public void flush()
Description copied from interface: MetricsSink
Flush any buffered metrics
Specified by: flush in interface MetricsSink
public void close()
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
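The flush-time arithmetic described for setInitialFlushTime and updateFlushTime above can be sketched with plain epoch milliseconds. This is an illustration under stated assumptions, not the class's implementation: the offset is passed in rather than drawn at random so the result is reproducible, "beginning of the current hour" is taken in UTC, and the step back by one interval when the offset lands in the future is an assumed way to satisfy the "time in the past" requirement. All names are hypothetical.

```java
class RollScheduleSketch {
    private static final long HOUR_MILLIS = 3_600_000L;

    /**
     * Initial flush time: a whole number of intervals past the top of the current
     * hour, plus the offset, moved back by whole intervals until it is not in the future.
     */
    static long initialFlushMillis(long nowMillis, long rollIntervalMillis, long offsetMillis) {
        long hourStart = nowMillis - Math.floorMod(nowMillis, HOUR_MILLIS);
        long wholeIntervals = (nowMillis - hourStart) / rollIntervalMillis;
        long flush = hourStart + wholeIntervals * rollIntervalMillis + offsetMillis;
        while (flush > nowMillis) {
            flush -= rollIntervalMillis; // keep the initial flush time in the past
        }
        return flush;
    }

    /** Next flush time: add whole intervals to the last flush until it is after now. */
    static long nextFlushMillis(long lastFlushMillis, long nowMillis, long rollIntervalMillis) {
        long intervalsToAdd = (nowMillis - lastFlushMillis) / rollIntervalMillis + 1;
        return lastFlushMillis + intervalsToAdd * rollIntervalMillis;
    }

    public static void main(String[] args) {
        long now = 10 * HOUR_MILLIS + 17 * 60_000L; // 10:17:00 UTC on the epoch day
        long interval = 10 * 60_000L;               // 10-minute roll interval
        long offset = 12_000L;                      // stands in for the random offset
        long initial = initialFlushMillis(now, interval, offset);
        long next = nextFlushMillis(initial, now, interval);
        System.out.println(initial + " -> " + next);
    }
}
```

Because only whole intervals are ever added, every later flush time keeps the same remainder relative to the roll interval, which is how the initial offset is preserved.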
Copyright © 2023 Apache Software Foundation. All rights reserved.