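The Commons Math random package includes utilities for generating random
numbers, random strings, random samples and permutations, and data based on
the values in an input file.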
The source of random data used by the data generation utilities is
pluggable. By default, the JDK-supplied pseudo-random number generator
(PRNG) is used, but alternative generators can be "plugged in" using an
adaptor framework, which provides a generic facility for replacing
java.util.Random with an alternative PRNG.
Sections 2.2-2.5 below show how to use the Commons Math API to generate
different kinds of random data. The examples all use the default
JDK-supplied PRNG. PRNG pluggability is covered in Section 2.6. The only
modification required to the examples to use alternative PRNGs is to
replace the no-argument constructor calls with invocations that take
a RandomGenerator instance as a parameter.
The org.apache.commons.math.random.RandomData interface defines methods for generating random sequences of numbers. The API contracts of these methods use the following concepts:
When using the built-in JDK function Math.random(), sequences of
values generated follow the uniform distribution, which means that the
values are evenly spread over the interval between 0 and 1, with no
sub-interval having a greater probability of containing generated values
than any other sub-interval of the same length. The mathematical concept
of a probability distribution amounts to asserting that different
ranges in the set of possible values of a random variable have
different probabilities of containing the value. Commons Math supports
generating random sequences from several probability distributions.
The javadoc for the nextXxx methods in RandomDataImpl describes the
algorithms used to generate random deviates from each of these
distributions.
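As a rough illustration of what generating deviates from a non-uniform distribution involves, the sketch below turns uniform values from java.util.Random into exponentially distributed values using the inversion method. It is a JDK-only sketch and not the RandomDataImpl code; ExponentialSketch and its nextExponential method are names made up for this example, and the javadoc remains the reference for the algorithms Commons Math actually uses.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of Commons Math. */
class ExponentialSketch {

    /**
     * Returns an exponentially distributed value with the given mean,
     * obtained by inverting the exponential CDF at a uniform deviate.
     */
    static double nextExponential(Random rng, double mean) {
        if (mean <= 0.0) {
            throw new IllegalArgumentException("mean must be positive");
        }
        // nextDouble() is uniform on [0, 1), so 1 - u lies in (0, 1]
        // and the logarithm is always finite.
        double u = rng.nextDouble();
        return -mean * Math.log(1.0 - u);
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        int n = 100000;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += nextExponential(rng, 2.0);
        }
        // The sample mean should land close to the requested mean of 2.0.
        System.out.println("sample mean = " + (sum / n));
    }
}
```

Small values are far more likely than large ones here, which is exactly the sense in which different ranges have different probabilities of containing the value.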
The nextSecureXxx methods in the RandomDataImpl implementation of
the RandomData interface use the JDK SecureRandom PRNG to generate
cryptographically secure sequences. The setSecureAlgorithm method allows
you to change the underlying PRNG. These methods are much slower than the
corresponding "non-secure" versions, so they should only be used when
cryptographic security is required.
By default, RandomDataImpl uses the JDK-provided PRNG. Like most other
PRNGs, the JDK generator generates sequences of random numbers based on an
initial "seed value". For the non-secure methods, starting with the same
seed always produces the same sequence of values. Secure sequences started
with the same seeds will diverge. When a new RandomDataImpl is created, the
underlying random number generators are not initialized. The first
call to a data generation method, or to a reSeed() method,
initializes the appropriate generator. If you do not explicitly seed the
generator, it is by default seeded with the current time in milliseconds.
Therefore, to generate sequences of random data values, you should always
instantiate one RandomDataImpl and use it repeatedly instead of creating
new instances for subsequent values in the sequence. For example, a single
RandomDataImpl instance used inside a loop will generate a random sequence
of 50 long integers between 1 and 1,000,000, using the current time in
milliseconds as the seed for the JDK PRNG.
Creating a new RandomDataImpl on each pass through the loop will not in
general produce a good random sequence, since the PRNG is reseeded each
time through the loop with the current time in milliseconds.
Reseeding a single instance with a fixed seed before the loop will produce
the same random sequence each time the code is executed.
Using the secure methods, even after reseeding the secure generator with a
fixed seed, will produce a different random sequence each time the code is
executed.
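The effect of seeding can be tried out with java.util.Random directly. The sketch below is JDK-only and uses no Commons Math classes (SeedingSketch and sequence are names invented for this example): one generator instance is created, optionally seeded, and then reused for all 50 values.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical demo class for illustration; not part of Commons Math. */
class SeedingSketch {

    /** Generates count values in [1, 1,000,000] from ONE reused generator. */
    static long[] sequence(Random rng, int count) {
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            // nextInt(bound) is uniform on [0, bound), so shift up by 1.
            values[i] = 1L + rng.nextInt(1000000);
        }
        return values;
    }

    public static void main(String[] args) {
        // Same explicit seed: both generators yield identical sequences.
        long[] a = sequence(new Random(1000L), 50);
        long[] b = sequence(new Random(1000L), 50);
        System.out.println("same seed, same sequence: " + Arrays.equals(a, b));

        // No explicit seed: the JDK chooses one, so runs normally differ.
        long[] c = sequence(new Random(), 50);
        System.out.println("first unseeded value: " + c[0]);
    }
}
```

Note that the generator is constructed once, outside the loop; constructing it inside the loop would defeat the purpose of having a single seeded sequence.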
The methods nextHexString and nextSecureHexString can be used to generate
random strings of hexadecimal characters. Both of these methods produce
sequences of strings with good dispersion properties. The difference
between the two methods is that the second is cryptographically secure.
Specifically, the implementation of nextHexString(n) in RandomDataImpl
uses a simple two-step algorithm to generate a string of n hex digits:
n/2+1 binary bytes are generated using the underlying Random, and each
binary byte is then translated into 2 hex digits.
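A minimal sketch of generating a hex string from random bytes, assuming the approach is to draw bytes from a java.util.Random and translate each byte into two hex digits. It is JDK-only; HexSketch is an illustrative name, and details such as how RandomDataImpl formats each byte are not taken from its source.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of Commons Math. */
class HexSketch {

    /** Returns a string of exactly len lowercase hex digits. */
    static String nextHexString(Random rng, int len) {
        if (len <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        // len/2 + 1 bytes always give at least len hex digits.
        byte[] bytes = new byte[len / 2 + 1];
        rng.nextBytes(bytes);

        StringBuilder out = new StringBuilder(2 * bytes.length);
        for (byte b : bytes) {
            // Mask to 0..255 and zero-pad so every byte yields two digits.
            out.append(String.format("%02x", b & 0xff));
        }
        // Trim the surplus digits.
        return out.substring(0, len);
    }
}
```

For example, `HexSketch.nextHexString(new Random(), 16)` returns a 16-character string over the digits 0-9 and a-f.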
The RandomDataImpl implementation of the "secure" version,
nextSecureHexString, generates hex characters in 40-byte
"chunks" using a 3-step process: 20 random bytes are generated using the
underlying SecureRandom, a SHA-1 hash is applied to yield a 20-byte binary
digest, and each byte of the digest is converted to 2 hex digits.
nextSecureHexString is much slower than
the non-secure version. It should be used only for applications such as
generating unique session or transaction ids where predictability of
subsequent ids based on observation of previous values is a security
concern. If all that is needed is an even distribution of hex characters
in the generated strings, the non-secure method should be used.
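A JDK-only sketch of chunked secure hex generation, assuming each 40-character chunk is produced by drawing 20 bytes from a SecureRandom, hashing them with SHA-1, and hex-encoding the 20-byte digest. SecureHexSketch is an illustrative name, and this is not the RandomDataImpl source.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

/** Hypothetical helper for illustration; not part of Commons Math. */
class SecureHexSketch {

    /** Returns a string of exactly len lowercase hex digits. */
    static String nextSecureHexString(SecureRandom rng, int len) {
        if (len <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }

        StringBuilder out = new StringBuilder();
        while (out.length() < len) {
            // Step 1: 20 random bytes from the secure generator.
            byte[] randomBytes = new byte[20];
            rng.nextBytes(randomBytes);
            // Step 2: hash them; digest() returns 20 bytes and resets sha1.
            byte[] digest = sha1.digest(randomBytes);
            // Step 3: two hex digits per digest byte, 40 per chunk.
            for (byte b : digest) {
                out.append(String.format("%02x", b & 0xff));
            }
        }
        return out.substring(0, len);
    }
}
```

The hashing step means an observer of the output never sees the raw generator bytes, at the cost of one digest computation per 40 characters.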
To select a random sample of objects in a collection, you can use the
nextSample method in the RandomData interface.
Specifically, if c is a collection containing at least k objects, and
randomData is a RandomData instance, randomData.nextSample(c, k)
will return an Object[] array of length k
consisting of elements randomly selected from the collection. If
c contains duplicate references, there may be duplicate
references in the returned array; otherwise returned elements will be
unique, i.e., the sampling is without replacement among the object
references in the collection.
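Sampling without replacement can be sketched with a partial Fisher-Yates shuffle over a copy of the collection. The code below is JDK-only and only illustrates the contract described above; SampleSketch is a made-up name and the technique is not claimed to be the one RandomDataImpl uses.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not part of Commons Math. */
class SampleSketch {

    /** Picks k references from c, each position in c used at most once. */
    static Object[] nextSample(Random rng, Collection<?> c, int k) {
        if (k < 0 || k > c.size()) {
            throw new IllegalArgumentException("need 0 <= k <= c.size()");
        }
        // Work on a copy so the caller's collection is left untouched.
        List<Object> pool = new ArrayList<Object>(c);
        Object[] result = new Object[k];
        for (int i = 0; i < k; i++) {
            // Choose uniformly among the not-yet-selected positions i..size-1.
            int j = i + rng.nextInt(pool.size() - i);
            Object picked = pool.get(j);
            // Swap the pick into slot i so it cannot be chosen again.
            pool.set(j, pool.get(i));
            pool.set(i, picked);
            result[i] = picked;
        }
        return result;
    }
}
```

Because selection is by position, a reference that appears twice in the collection can appear twice in the result, matching the duplicate-reference caveat above.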
If randomData is a RandomData instance, and n and k are integers with
k <= n, then randomData.nextPermutation(n, k) returns an int[]
array of length k whose entries are selected randomly,
without repetition, from the integers 0 through n-1 (inclusive), i.e.,
randomData.nextPermutation(n, k) returns a random
permutation of n taken k at a time.
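The contract can be illustrated with a partial shuffle of the integers 0 through n-1. This is a JDK-only sketch with an invented class name (PermutationSketch), offered as one way to meet the description rather than as the library's implementation.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of Commons Math. */
class PermutationSketch {

    /** Returns k distinct integers drawn at random from 0..n-1. */
    static int[] nextPermutation(Random rng, int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        int[] index = new int[n];
        for (int i = 0; i < n; i++) {
            index[i] = i;
        }
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            // Pick uniformly from the unused tail index[i..n-1].
            int j = i + rng.nextInt(n - i);
            int picked = index[j];
            // Move the pick to the front so it is never reused.
            index[j] = index[i];
            index[i] = picked;
            result[i] = picked;
        }
        return result;
    }
}
```

With k equal to n the result is a full random permutation of 0 through n-1; with smaller k only the first k shuffle steps are performed.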
Using the ValueServer class, you can generate data based on
the values in an input file in one of two ways.
In replay mode, the ValueServer reads data from url
(a java.net.URL instance), cycling through the values in the
file in sequence, reopening and starting at the beginning again when all
values have been read.
The values in the file are not stored in memory, so it does not matter
how large the file is, but you do need to explicitly close the file
when you are finished with it. The expected file format is \n-delimited
(i.e., one per line) strings representing valid floating-point numbers.
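The cycling behavior can be sketched with plain JDK I/O. ReplaySketch below is a hypothetical stand-in written for this example, not the ValueServer class: it reads one value per line from a URL, reopens the stream at end of file, and must be closed explicitly by the caller.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration; not the ValueServer class. */
class ReplaySketch implements Closeable {
    private final URL url;
    private BufferedReader reader;

    ReplaySketch(URL url) {
        this.url = url;
    }

    /** Returns the next value, wrapping around to the start at end of file. */
    double getNext() throws IOException {
        if (reader == null) {
            open();
        }
        String line = reader.readLine();
        if (line == null) {
            // All values read: reopen and start again from the first line.
            close();
            open();
            line = reader.readLine();
            if (line == null) {
                throw new IOException("no values in " + url);
            }
        }
        return Double.parseDouble(line.trim());
    }

    private void open() throws IOException {
        reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    /** Releases the underlying stream; only one line is buffered at a time. */
    @Override
    public void close() throws IOException {
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }
}
```

Since values are read one line at a time, memory use does not grow with the file, which is why the explicit close() call is the caller's responsibility.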
In digest mode, the ValueServer first reads the input file and estimates
the distribution of its values. After that, getNext() returns random
values whose probability distribution matches the empirical distribution,
i.e., if you generate a large number of such values, their distribution
should "look like" the distribution of the values in the input file. The
values are not stored in memory in this case either, so there is no limit
to the size of the input file.
See the javadoc for ValueServer and EmpiricalDistribution for more details.
Note that computeDistribution() opens and closes the input file by itself.
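To make the idea of sampling from an empirical distribution concrete, here is a simplified histogram-based sketch. It makes several assumptions that are not taken from EmpiricalDistribution: the data arrive as an in-memory array rather than a file, bins have equal width, and values are drawn uniformly within the chosen bin. EmpiricalSketch is an invented name.

```java
import java.util.Random;

/** Hypothetical simplification; not the EmpiricalDistribution class. */
class EmpiricalSketch {
    private final double min;
    private final double binWidth;
    private final long[] cumulative; // cumulative[i] = count in bins 0..i

    EmpiricalSketch(double[] data, int binCount) {
        if (data.length == 0 || binCount <= 0) {
            throw new IllegalArgumentException("need data and at least one bin");
        }
        double lo = data[0];
        double hi = data[0];
        for (double x : data) {
            lo = Math.min(lo, x);
            hi = Math.max(hi, x);
        }
        min = lo;
        // If all values are equal, fall back to an arbitrary unit width.
        binWidth = (hi > lo) ? (hi - lo) / binCount : 1.0;

        long[] counts = new long[binCount];
        for (double x : data) {
            // The maximum value is clamped into the last bin.
            int bin = Math.min((int) ((x - lo) / binWidth), binCount - 1);
            counts[bin]++;
        }
        cumulative = new long[binCount];
        long running = 0;
        for (int i = 0; i < binCount; i++) {
            running += counts[i];
            cumulative[i] = running;
        }
    }

    /** Picks a bin in proportion to its count, then a point inside it. */
    double getNext(Random rng) {
        long total = cumulative[cumulative.length - 1];
        long target = (long) (rng.nextDouble() * total); // 0 <= target < total
        int bin = 0;
        while (cumulative[bin] <= target) {
            bin++;
        }
        return min + (bin + rng.nextDouble()) * binWidth;
    }
}
```

Only the bin counts are kept after construction, so the sampler's memory use depends on the number of bins rather than on the number of input values.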
To enable alternative PRNGs to be "plugged in" to the Commons Math data
generation utilities and to provide a generic means to replace
java.util.Random in applications, a random generator
adaptor framework has been added to Commons Math. The
org.apache.commons.math.random.RandomGenerator interface abstracts the
public interface of java.util.Random, and any implementation of this
interface can be used as the source of random data for the Commons Math
data generation classes. An abstract base class,
org.apache.commons.math.random.AbstractRandomGenerator, is provided to make
implementation easier. This class provides default implementations of
"derived" data generation methods based on the primitive, nextDouble().
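The idea of deriving other generation methods from a single nextDouble() primitive can be sketched as follows. DoubleBasedGenerator is a hypothetical class written for this example; it is not AbstractRandomGenerator, and the particular derivations shown are assumptions about how such defaults might look.

```java
/** Hypothetical base class for illustration; not AbstractRandomGenerator. */
abstract class DoubleBasedGenerator {

    /** The one primitive a subclass must supply: uniform on [0, 1). */
    abstract double nextDouble();

    /** Derived: a uniform integer in [0, n). */
    int nextInt(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        int value = (int) (nextDouble() * n);
        // Guard against rounding pushing the result up to n.
        return Math.min(value, n - 1);
    }

    /** Derived: true and false with equal probability. */
    boolean nextBoolean() {
        return nextDouble() < 0.5;
    }
}

/** Example subclass backed by the JDK generator. */
class JdkBackedGenerator extends DoubleBasedGenerator {
    private final java.util.Random delegate = new java.util.Random();

    @Override
    double nextDouble() {
        return delegate.nextDouble();
    }
}
```

A subclass wrapping some other PRNG only has to implement nextDouble(); it can still override a derived method when the underlying generator offers a better native version.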
To support generic replacement of java.util.Random, the
org.apache.commons.math.random.RandomAdaptor class is provided, which
extends java.util.Random and wraps and delegates calls to
a RandomGenerator instance.
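The wrap-and-delegate pattern can be sketched with JDK classes alone. UniformSource and AdaptorSketch below are hypothetical names standing in for a generator interface and its adaptor; this is not the RandomAdaptor source, and only a few methods are overridden to keep the example short.

```java
import java.util.Random;

/** Hypothetical minimal generator interface for this example. */
interface UniformSource {
    double nextDouble();
}

/** Hypothetical adaptor for illustration; not the RandomAdaptor class. */
class AdaptorSketch extends Random {
    private static final long serialVersionUID = 1L;

    private final UniformSource source;

    AdaptorSketch(UniformSource source) {
        this.source = source;
    }

    @Override
    public double nextDouble() {
        return source.nextDouble();
    }

    @Override
    public int nextInt(int bound) {
        if (bound <= 0) {
            throw new IllegalArgumentException("bound must be positive");
        }
        int value = (int) (source.nextDouble() * bound);
        return Math.min(value, bound - 1);
    }

    @Override
    public boolean nextBoolean() {
        return source.nextDouble() < 0.5;
    }

    // Methods not overridden here (nextLong, nextGaussian, nextBytes, ...)
    // still use java.util.Random's own state; a complete adaptor would
    // delegate those as well.
}
```

Because AdaptorSketch is a java.util.Random, it can be passed to any existing code that expects a Random, which is the point of the adaptor approach.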
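Examples:
Create a RandomGenerator backed by an RngPack PRNG by extending
AbstractRandomGenerator, overriding the derived methods that the RngPack
implementation provides.
Use that RandomGenerator in place of java.util.Random in RandomData.
Create an adaptor instance that can be used in place of a Random.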