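The Commons Math random package includes utilities for generating random numbers, random strings of hexadecimal characters, cryptographically secure random sequences, random samples and permutations, and values that mimic the distribution of the data in an input file.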
The org.apache.commons.math.random.RandomData interface defines methods for generating random sequences of numbers. The API contracts of these methods use the following concepts.

When using the built-in JDK function Math.random(), the sequences of values generated follow the Uniform Distribution. This means that the values are evenly spread over the interval between 0 and 1, with no sub-interval having a greater probability of containing generated values than any other sub-interval of the same length. More generally, a probability distribution assigns to each range of possible values of a random variable a probability of containing the value. For a non-uniform distribution, ranges of the same length can have different probabilities.
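The following sketch makes the uniform-distribution property concrete. It draws values with Math.random() and counts how many fall into each of ten equal-width sub-intervals of [0, 1). For a uniform generator, every count should be close to one tenth of the sample size. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration: bucket Math.random() output into equal-width sub-intervals. */
final class UniformBuckets {
    /** Counts how many of `samples` values land in each of `buckets` sub-intervals of [0, 1). */
    static int[] counts(int samples, int buckets) {
        int[] counts = new int[buckets];
        for (int i = 0; i < samples; i++) {
            double u = Math.random();      // uniform on [0, 1)
            int b = (int) (u * buckets);   // index of the sub-interval containing u
            counts[b]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // With 100,000 samples and 10 buckets, each count should be near 10,000.
        System.out.println(Arrays.toString(counts(100_000, 10)));
    }
}
```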
Commons Math supports generating random sequences from several probability distributions, including the uniform, exponential, Gaussian and Poisson distributions. The javadoc for the nextXxx methods in RandomDataImpl describes the algorithms used to generate random deviates from each of these distributions.
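RandomDataImpl's own algorithms are documented in its javadoc. The sketch below is only a hedged illustration of how deviates from these distributions can be derived from a uniform JDK generator, using standard textbook methods: scaling java.util.Random.nextGaussian for the Gaussian, inversion for the exponential, and Knuth's multiplication method for the Poisson. These methods are not claimed to be what RandomDataImpl uses, and the class and method names are hypothetical.

```java
import java.util.Random;

/** Hypothetical helpers turning uniform JDK deviates into other distributions. */
final class Deviates {
    /** Gaussian with mean mu and standard deviation sigma, by scaling the JDK's standard normal. */
    static double gaussian(Random rnd, double mu, double sigma) {
        return mu + sigma * rnd.nextGaussian();
    }

    /** Exponential with the given mean, by inversion: -mean * ln(1 - U), U uniform on [0, 1). */
    static double exponential(Random rnd, double mean) {
        return -mean * Math.log(1.0 - rnd.nextDouble()); // 1 - U lies in (0, 1], so the log is finite
    }

    /** Poisson with the given mean via Knuth's multiplication method (suited to small means). */
    static long poisson(Random rnd, double mean) {
        double limit = Math.exp(-mean);
        double product = rnd.nextDouble();
        long count = 0;
        while (product > limit) {          // multiply uniforms until the product drops to e^-mean
            product *= rnd.nextDouble();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 100_000;
        double g = 0, e = 0, p = 0;
        for (int i = 0; i < n; i++) {
            g += gaussian(rnd, 5.0, 2.0);
            e += exponential(rnd, 3.0);
            p += poisson(rnd, 4.0);
        }
        // The sample means should be close to 5, 3 and 4 respectively.
        System.out.printf("gaussian %.3f, exponential %.3f, poisson %.3f%n", g / n, e / n, p / n);
    }
}
```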
The nextSecureXxx methods in the RandomDataImpl implementation of the RandomData interface use the JDK SecureRandom pseudo-random number generator (PRNG) to generate cryptographically secure sequences. The setSecureAlgorithm method allows you to change the underlying PRNG. These methods are much slower than the corresponding "non-secure" versions, so they should be used only when cryptographic security is required.

The non-secure methods in RandomDataImpl use the JDK-provided PRNG. Like other PRNGs, the JDK generator produces sequences of random numbers based on an initial "seed value". For the non-secure methods, starting with the same seed always produces the same sequence of values; secure sequences started with the same seed will diverge. When a new RandomDataImpl is created, the underlying random number generators are not initialized. The first call to a data generation method, or to a reSeed() method, initializes the appropriate generator. If you do not explicitly seed the generator, it is seeded by default with the current time in milliseconds. Therefore, to generate a sequence of random data values, you should instantiate one RandomDataImpl and use it repeatedly instead of creating a new instance for each subsequent value in the sequence. For example, the following will generate a random sequence of 50 long integers between 1 and 1,000,000, using the current time in milliseconds as the seed for the JDK PRNG:
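With Commons Math, this means creating one RandomDataImpl before the loop and calling randomData.nextLong(1, 1000000) inside it. The sketch below shows the same pattern with java.util.Random so that it runs without Commons Math on the classpath: one generator is seeded once with the clock and reused for every value. The class name and the nextLong helper are hypothetical.

```java
import java.util.Random;

// Commons Math form of the same idea (not compiled here):
//   RandomData randomData = new RandomDataImpl();
//   for (int i = 0; i < 50; i++) { long value = randomData.nextLong(1, 1000000); }
final class ReuseOneGenerator {
    /** Hypothetical helper: uniform long in [lower, upper], inclusive (moderate ranges assumed). */
    static long nextLong(Random rnd, long lower, long upper) {
        return lower + (long) Math.floor(rnd.nextDouble() * (upper - lower + 1));
    }

    static long[] sequence(int count, long lower, long upper) {
        // One generator, seeded once with the current time, reused for every value.
        Random rnd = new Random(System.currentTimeMillis());
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = nextLong(rnd, lower, upper);
        }
        return values;
    }

    public static void main(String[] args) {
        for (long v : sequence(50, 1, 1_000_000)) {
            System.out.println(v);
        }
    }
}
```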
The following will not in general produce a good random sequence, since the PRNG is reseeded
each time through the loop with the current time in milliseconds:
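The sketch below reproduces this anti-pattern with java.util.Random, under the assumption that it matches RandomDataImpl's default clock seeding as described above. A fresh generator is seeded with System.currentTimeMillis() on every iteration, so iterations that fall within the same millisecond receive the same seed and therefore the same value. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

final class ReseedEachTime {
    /** Anti-pattern: a new generator seeded with the clock on every iteration. */
    static long[] badSequence(int count) {
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            Random rnd = new Random(System.currentTimeMillis());
            values[i] = rnd.nextLong();
        }
        return values;
    }

    /** Two generators built from the same timestamp always produce the same first value. */
    static boolean sameSeedSameValue(long millis) {
        return new Random(millis).nextLong() == new Random(millis).nextLong();
    }

    public static void main(String[] args) {
        // In a tight loop, usually far fewer than 50 distinct values appear.
        long distinct = Arrays.stream(badSequence(50)).distinct().count();
        System.out.println("distinct values out of 50: " + distinct);
    }
}
```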
The following will produce the same random sequence each time it is executed:
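In Commons Math, this is done by calling reSeed(1000) on the RandomDataImpl before drawing values. The JDK-only sketch below shows the underlying property: a java.util.Random constructed with a fixed seed yields an identical sequence on every run. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

final class FixedSeed {
    /** `count` longs in [1, 1000000] from a generator with an explicit seed instead of the clock. */
    static long[] sequence(long seed, int count) {
        Random rnd = new Random(seed);
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = 1 + (long) Math.floor(rnd.nextDouble() * 1_000_000);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(sequence(1000, 50), sequence(1000, 50))); // prints true
    }
}
```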
The following will produce a different random sequence each time it is executed:
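In Commons Math, this example reseeds the secure generator with reSeedSecure(1000) and draws values with nextSecureLong(1, 1000000). setSecureAlgorithm, which takes an algorithm name and a provider name, can be called first to choose the PRNG. The JDK-only sketch below shows why two runs diverge even though they share a seed. Once a SecureRandom has seeded itself, setSeed supplements rather than replaces its existing seed. The sketch assumes the "SHA1PRNG" algorithm from the "SUN" provider is available, as it is on standard OpenJDK builds, and the class name is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

final class SecureSeed {
    static long[] sequence(String algorithm, String provider, long seed, int count)
            throws GeneralSecurityException {
        // Explicit algorithm/provider choice: the JDK-level analogue of setSecureAlgorithm.
        SecureRandom secure = SecureRandom.getInstance(algorithm, provider);
        // Draw output first so the generator seeds itself from system entropy; SHA1PRNG
        // would otherwise treat a seed supplied before first use as its entire seed.
        secure.nextBytes(new byte[8]);
        // Now setSeed only supplements the existing seed, so output stays unpredictable.
        secure.setSeed(seed);
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = 1 + (long) Math.floor(secure.nextDouble() * 1_000_000); // in [1, 1000000]
        }
        return values;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        long[] a = sequence("SHA1PRNG", "SUN", 1000, 50);
        long[] b = sequence("SHA1PRNG", "SUN", 1000, 50);
        System.out.println("identical: " + Arrays.equals(a, b));
    }
}
```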
The methods nextHexString and nextSecureHexString can be used to generate random strings of hexadecimal characters. Both methods produce sequences of strings with good dispersion properties; the difference between them is that the second is cryptographically secure. Specifically, the implementation of nextHexString(n) in RandomDataImpl uses a simple algorithm to generate a string of n hex digits.
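The sketch below shows one simple algorithm of this kind. It assumes that random bytes from java.util.Random are converted to two hex digits each: draw n/2 + 1 bytes, which gives at least n digits, and then trim to length n. It is an illustration and not necessarily RandomDataImpl's exact code. The class name is hypothetical.

```java
import java.util.Random;

final class HexStrings {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Sketch: n/2 + 1 random bytes, each written as two hex digits, trimmed to n characters. */
    static String nextHexString(Random rnd, int n) {
        if (n <= 0) throw new IllegalArgumentException("length must be positive");
        byte[] bytes = new byte[n / 2 + 1];
        rnd.nextBytes(bytes);
        StringBuilder sb = new StringBuilder(2 * bytes.length);
        for (byte b : bytes) {
            int v = b & 0xff;           // unsigned value 0..255
            sb.append(HEX[v >>> 4]);    // high nibble
            sb.append(HEX[v & 0x0f]);   // low nibble
        }
        return sb.substring(0, n);      // 2 * (n/2 + 1) >= n + 1, so trimming is always safe
    }

    public static void main(String[] args) {
        System.out.println(nextHexString(new Random(), 16));
    }
}
```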
The RandomDataImpl implementation of the "secure" version, nextSecureHexString, generates hex characters in 40-byte "chunks" using a 3-step process built on SecureRandom.
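The three steps are not spelled out here. This sketch assumes one plausible reading: (1) draw random bytes from SecureRandom, (2) hash them with SHA-1 into a 20-byte digest, and (3) write each digest byte as two hex digits, so that each chunk yields 40 hex characters. The number of random bytes per chunk (20) and the class name are assumptions.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

final class SecureHexStrings {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String nextSecureHexString(SecureRandom secure, int n) throws NoSuchAlgorithmException {
        if (n <= 0) throw new IllegalArgumentException("length must be positive");
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        StringBuilder sb = new StringBuilder(n + 40);
        while (sb.length() < n) {
            byte[] randomBytes = new byte[20];            // step 1: fresh bytes from SecureRandom
            secure.nextBytes(randomBytes);
            byte[] digest = sha1.digest(randomBytes);     // step 2: 20-byte SHA-1 digest (digest resets)
            for (byte b : digest) {                       // step 3: two hex digits per digest byte
                int v = b & 0xff;
                sb.append(HEX[v >>> 4]).append(HEX[v & 0x0f]);
            }
        }
        return sb.substring(0, n);                        // keep exactly n characters
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(nextSecureHexString(new SecureRandom(), 32));
    }
}
```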
nextSecureHexString is much slower than the non-secure version. It should be used only for applications such as generating unique session or transaction IDs, where predictability of subsequent IDs based on observation of previous values is a security concern. If all that is needed is an even distribution of hex characters in the generated strings, the non-secure method should be used.
To select a random sample of objects in a collection, you can use the nextSample method in the RandomData interface. Specifically, if c is a collection containing at least k objects, and randomData is a RandomDataImpl instance, then randomData.nextSample(c, k) will return an Object[] array of length k consisting of elements randomly selected from the collection. If c contains duplicate references, there may be duplicate references in the returned array; otherwise the returned elements will be unique. That is, the sampling is without replacement among the object references in the collection.
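Sampling without replacement among object references can be sketched with a partial Fisher-Yates shuffle over a copy of the collection. This is a standard technique and is not claimed to be how RandomDataImpl implements nextSample. The class name and the explicit Random parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

final class Sampling {
    /** Returns k references drawn without replacement from c (partial Fisher-Yates shuffle). */
    static Object[] nextSample(Random rnd, Collection<?> c, int k) {
        if (k < 0 || k > c.size()) throw new IllegalArgumentException("need 0 <= k <= c.size()");
        List<Object> pool = new ArrayList<>(c);      // copy so the caller's collection is untouched
        Object[] result = new Object[k];
        for (int i = 0; i < k; i++) {
            int j = i + rnd.nextInt(pool.size() - i); // uniform pick from the not-yet-used tail
            Object picked = pool.get(j);
            pool.set(j, pool.get(i));                // swap the pick into position i
            pool.set(i, picked);
            result[i] = picked;
        }
        return result;
    }
}
```

Because each position is chosen from the remaining tail, no reference is picked twice. Duplicates can appear in the result only if the collection itself holds duplicate references.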
If randomData is a RandomDataImpl instance, and n and k are integers with k <= n, then randomData.nextPermutation(n, k) returns an int[] array of length k whose entries are selected randomly, without repetition, from the integers 0 through n-1 (inclusive). That is, randomData.nextPermutation(n, k) returns a random permutation of n taken k at a time.
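A random permutation of n taken k at a time can be sketched by shuffling only the first k positions of the array 0..n-1 (a partial Fisher-Yates shuffle). This is a standard technique and not necessarily RandomDataImpl's implementation. The class name and the explicit Random parameter are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

final class Permutations {
    /** k distinct ints from 0..n-1 in random order, via a partial Fisher-Yates shuffle. */
    static int[] nextPermutation(Random rnd, int n, int k) {
        if (k < 0 || k > n) throw new IllegalArgumentException("need 0 <= k <= n");
        int[] index = new int[n];
        for (int i = 0; i < n; i++) {
            index[i] = i;
        }
        for (int i = 0; i < k; i++) {
            int j = i + rnd.nextInt(n - i);   // choose among positions i..n-1
            int tmp = index[i];
            index[i] = index[j];
            index[j] = tmp;
        }
        return Arrays.copyOf(index, k);       // the first k entries are the permutation
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nextPermutation(new Random(), 10, 4)));
    }
}
```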
Using the ValueServer class, you can generate data based on the values in an input file in one of two ways.

In replay mode, ValueServer reads data from url (a java.net.URL instance), cycling through the values in the file in sequence, reopening it and starting at the beginning again when all values have been read. The values in the file are not stored in memory, so it does not matter how large the file is, but you do need to close the file explicitly when you are finished with it. The expected file format is \n-delimited strings (i.e. one per line), each representing a valid floating-point number.
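With ValueServer (assuming the API of the Commons Math 1.x/2.x releases), replay mode means setting the file with setValuesFileURL(url), calling setMode(ValueServer.REPLAY_MODE), reading values with getNext(), and closing the file with closeReplayFile(). The JDK-only sketch below illustrates the mechanics described above. It reads one line at a time from a java.net.URL, reopens the URL at end of file, and must be closed by the caller. The ReplaySketch class is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical replay reader: cycles through one-number-per-line values from a URL. */
final class ReplaySketch implements AutoCloseable {
    private final URL url;
    private BufferedReader reader;

    ReplaySketch(URL url) throws IOException {
        this.url = url;
        this.reader = open();
    }

    private BufferedReader open() throws IOException {
        return new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    /** Next value in file order; at end of file, reopen and start again from the top. */
    double next() throws IOException {
        String line = reader.readLine();      // only the current line is held in memory
        if (line == null) {
            reader.close();
            reader = open();
            line = reader.readLine();
            if (line == null) throw new IOException("no values in " + url);
        }
        return Double.parseDouble(line.trim());
    }

    @Override
    public void close() throws IOException {
        reader.close();                       // the caller is responsible for closing the file
    }
}
```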
In digest mode, ValueServer reads the input file and estimates the empirical distribution of its values; getNext() then returns random values whose probability distribution matches that empirical distribution. That is, if you generate a large number of such values, their distribution should "look like" the distribution of the values in the input file. The values are not stored in memory in this case either, so there is no limit to the size of the input file. Here is an example:
See the javadoc for ValueServer and EmpiricalDistribution for more details. Note that computeDistribution() opens and closes the input file by itself.