MATH-1603: Userguide update.
parent
759743122d
commit
64474ed963
|
@@ -58,6 +58,33 @@
|
|||
can make a difference when <code>p</code> is an attained value of the distribution.
|
||||
</p>
|
||||
</subsection>
|
||||
|
||||
<subsection name="8.2 Generating data like an input file"
|
||||
href="empirical">
|
||||
<p>
|
||||
Using the <code>EmpiricalDistribution</code> class, you can generate data based on
|
||||
the values in an input file:
|
||||
|
||||
<source>
|
||||
int binCount = 500;
|
||||
EmpiricalDistribution empDist = new EmpiricalDistribution(binCount);
|
||||
empDist.load(new File("data.txt")); // "File" is java.io.File
|
||||
RealDistribution.Sampler sampler = empDist.createSampler(RandomSource.MT.create());
|
||||
double value = sampler.nextDouble(); </source>
|
||||
|
||||
The entire input file is read and a probability density function is estimated
|
||||
based on data from the file.
|
||||
The estimation method is essentially the
|
||||
<a href="http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_6.html">
|
||||
Variable Kernel Method</a> with Gaussian smoothing.
|
||||
The created sampler will return random values whose probability distribution
|
||||
matches the empirical distribution (i.e. if you generate a large number of
|
||||
such values, their distribution should "look like" the distribution of the
|
||||
values in the input file).
|
||||
The values are not stored in memory, so there is no limit to the
|
||||
size of the input file.
|
||||
</p>
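<p>
The idea of sampling from an estimated density can be illustrated with the JDK alone.
The sketch below is a simplified stand-in, not the <code>EmpiricalDistribution</code>
implementation: it sorts the data into equal-width bins, picks a bin with probability
proportional to its count, and draws uniformly inside that bin (no Gaussian smoothing).
The class name <code>BinnedSampler</code> and its methods are hypothetical.
</p>

```java
import java.util.Random;

/** Hypothetical helper: histogram-based sampler (uniform within bins, no kernel smoothing). */
final class BinnedSampler {
    private final double min;
    private final double binWidth;
    private final double[] cumulative; // cumulative[i] = fraction of data in bins 0..i
    private final Random rng;

    BinnedSampler(double[] data, int binCount, long seed) {
        if (data.length == 0 || binCount < 1) {
            throw new IllegalArgumentException("need data and at least one bin");
        }
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double d : data) {
            lo = Math.min(lo, d);
            hi = Math.max(hi, d);
        }
        min = lo;
        binWidth = (hi - lo) / binCount;
        long[] counts = new long[binCount];
        for (double d : data) {
            // The maximum value is assigned to the last bin.
            int bin = Math.min((int) ((d - lo) / binWidth), binCount - 1);
            counts[bin]++;
        }
        cumulative = new double[binCount];
        long running = 0;
        for (int i = 0; i < binCount; i++) {
            running += counts[i];
            cumulative[i] = (double) running / data.length;
        }
        rng = new Random(seed);
    }

    /** Picks a bin according to the observed frequencies, then a uniform point inside it. */
    double nextDouble() {
        double u = rng.nextDouble();
        int bin = 0;
        while (bin < cumulative.length - 1 && u >= cumulative[bin]) {
            bin++;
        }
        return min + (bin + rng.nextDouble()) * binWidth;
    }
}
```

<p>
With a large number of draws, a histogram of the generated values should resemble the
histogram of the input data; the library's kernel-based estimate gives a smoother result.
</p>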
|
||||
</subsection>
|
||||
</section>
|
||||
</body>
|
||||
</document>
|
||||
|
|
|
@@ -50,12 +50,8 @@
|
|||
<li><a href="random.html">2. Data Generation</a>
|
||||
<ul>
|
||||
<li><a href="random.html#a2.1_Overview">2.1 Overview</a></li>
|
||||
<li><a href="random.html#a2.2_Random_numbers">2.2 Random numbers</a></li>
|
||||
<li><a href="random.html#a2.3_Random_Vectors">2.3 Random Vectors</a></li>
|
||||
<li><a href="random.html#a2.4_Random_Strings">2.4 Random Strings</a></li>
|
||||
<li><a href="random.html#a2.5_Random_permutations_combinations_sampling">2.5 Random permutations, combinations, sampling</a></li>
|
||||
<li><a href="random.html#a2.6_Generating_data_like_an_input_file">2.6 Generating data 'like' an input file</a></li>
|
||||
<li><a href="random.html#a2.7_PRNG_Pluggability">2.7 PRNG Pluggability</a></li>
|
||||
<li><a href="random.html#a2.2_Correlated_random_vectors">2.2 Correlated random vectors</a></li>
|
||||
<li><a href="random.html#a2.3_Low_discrepancy_sequences">2.3 Low discrepancy sequences</a></li>
|
||||
</ul></li>
|
||||
<li><a href="linear.html">3. Linear Algebra</a>
|
||||
<ul>
|
||||
|
@@ -103,6 +99,7 @@
|
|||
<li><a href="distribution.html">8. Probability Distributions</a>
|
||||
<ul>
|
||||
<li><a href="distribution.html#a8.1_Overview">8.1 Overview</a></li>
|
||||
<li><a href="distribution.html#a8.2_Generating_data_like_an_input_file">8.2 Generating data 'like' an input file</a></li>
|
||||
</ul></li>
|
||||
<li><a href="fraction.html">9. Fractions</a>
|
||||
<ul>
|
||||
|
|
|
@@ -28,181 +28,100 @@
|
|||
|
||||
<section name="2 Data Generation">
|
||||
|
||||
<subsection name="2.1 Overview"
|
||||
href="overview">
|
||||
<subsection name="2.1 Overview"
|
||||
href="overview">
|
||||
<p>
|
||||
The Commons Math <a href="../apidocs/org/apache/commons/math4/random/package-summary.html">o.a.c.m.random</a>
|
||||
package includes utilities for
|
||||
<ul>
|
||||
<li>generating random numbers</li>
|
||||
<li>generating random vectors</li>
|
||||
<li>generating random strings</li>
|
||||
<li>generating cryptographically secure sequences of random numbers or
|
||||
strings</li>
|
||||
<li>generating random samples and permutations</li>
|
||||
<li>analyzing distributions of values in an input file and generating
|
||||
values "like" the values in the file</li>
|
||||
<li>generating data for grouped frequency distributions or
|
||||
histograms</li>
|
||||
</ul></p>
|
||||
Utilities in package <a href="../apidocs/org/apache/commons/math4/legacy/random/package-summary.html">
|
||||
o.a.c.m.legacy.random</a> often use an underlying "source of randomness": a pseudo-random
|
||||
number generator (PRNG) that produces sequences of numbers that are uniformly distributed
|
||||
within their range.
|
||||
Commons Math depends on <a href="http://commons.apache.org/rng">Commons RNG</a> for the
|
||||
PRNG implementations.
|
||||
</p>
|
||||
</subsection>
|
||||
|
||||
<subsection name="2.2 Correlated random vectors"
|
||||
href="vectors">
|
||||
<p>
|
||||
These utilities rely on an underlying "source of randomness", which in most
|
||||
cases is a pseudo-random number generator (PRNG) that produces sequences
|
||||
of numbers that are uniformly distributed within their range.
|
||||
Commons Math depends on <a href="http://commons.apache.org/rng">Commons RNG</a>
|
||||
for the PRNG implementations.
|
||||
Some algorithms require random vectors instead of random scalars.
|
||||
When the components of these vectors are uncorrelated, they may be generated
|
||||
simply one at a time and packed together in the vector.
|
||||
</p>
|
||||
<p>
|
||||
A PRNG algorithm is often deterministic, i.e. it produces the same sequence
|
||||
when initialized with the same "seed".
|
||||
This property is important for some applications like Monte-Carlo simulations,
|
||||
but makes such a PRNG often unsuitable for cryptographic purposes.
|
||||
When the components are correlated however, generating them is more difficult.
|
||||
The <a href="../apidocs/org/apache/commons/math4/legacy/random/CorrelatedVectorFactory.html">
|
||||
CorrelatedVectorFactory</a> class provides this service.
|
||||
In this case, a complete covariance matrix must be provided (instead of a
|
||||
simple standard deviations vector) gathering both the variance and the
|
||||
correlation information of the probability law.
|
||||
</p>
|
||||
<p>
|
||||
The main use for correlated random vector generation is for Monte-Carlo
|
||||
simulation of physical problems with several variables, for example to
|
||||
generate error vectors to be added to a nominal vector. A particularly
|
||||
common case is when the generated vector should be drawn from a <a
|
||||
href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution">
|
||||
Multivariate Normal Distribution</a>.
|
||||
</p>
|
||||
</subsection>
|
||||
|
||||
<subsection name="2.2 Random Deviates"
|
||||
href="deviates">
|
||||
<p>
|
||||
<dl>
|
||||
<dt>Random sequence of numbers from a probability distribution</dt>
|
||||
<dd>
|
||||
There is no such thing as a single "random number." What can be
|
||||
generated are <i>sequences</i> of numbers that appear to be random. When
|
||||
using the built-in JDK function <code>Math.random()</code>, sequences of
|
||||
values generated follow the
|
||||
<a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda3662.htm">
|
||||
Uniform Distribution</a>, which means that the values are evenly spread
|
||||
over the interval between 0 and 1, with no sub-interval having a greater
|
||||
probability of containing generated values than any other interval of the
|
||||
same length. The mathematical concept of a
|
||||
<a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda36.htm">
|
||||
probability distribution</a> basically amounts to asserting that different
|
||||
ranges in the set of possible values of a random variable have
|
||||
different probabilities of containing the value. Commons Math supports
|
||||
generating random sequences from each of the distributions defined in the
|
||||
<a href="../apidocs/org/apache/commons/math4/distribution/package-summary.html">
|
||||
o.a.c.m.distribution</a> package.
|
||||
Please refer to the <a href="../distribution.html">specific documentation</a>
|
||||
for more details.
|
||||
</dd>
|
||||
<p>
|
||||
Generating random vectors from a bivariate normal distribution:
|
||||
|
||||
<dt>Cryptographically secure random sequences</dt>
|
||||
<dd>
|
||||
It is possible for a sequence of numbers to appear random, but
|
||||
nonetheless to be predictable based on the algorithm used to generate the
|
||||
sequence.
|
||||
When in addition to randomness, strong unpredictability is
|
||||
required, a
|
||||
<a href="http://www.wikipedia.org/wiki/Cryptographically_secure_pseudo-random_number_generator">
|
||||
secure random number generator</a>
|
||||
should be used to generate values (or strings), for example an instance of
|
||||
the JDK-provided <code>SecureRandom</code> generator.
|
||||
In general, such secure generators produce sequences based on a source of
|
||||
true randomness, and sequences started with the same seed will diverge.
|
||||
|
||||
The <a href="../apidocs/org/apache/commons/math4/random/RandomUtils.html">RandomUtils</a>
|
||||
class provides a method for wrapping a <code>java.util.Random</code> or
|
||||
<code>java.security.SecureRandom</code> instance in an object that implements
|
||||
the <a href="http://commons.apache.org/proper/commons-rng/apidocs/org/apache/commons/rng/UniformRandomProvider.html">
|
||||
UniformRandomProvider</a> interface:
|
||||
<source>
|
||||
UniformRandomProvider rg = RandomUtils.asUniformRandomProvider(new java.security.SecureRandom());
|
||||
</source>
|
||||
</dd>
|
||||
</dl>
|
||||
</p>
|
||||
</subsection>
|
||||
|
||||
<subsection name="2.3 Random Vectors"
|
||||
href="vectors">
|
||||
<p>
|
||||
Some algorithms require random vectors instead of random scalars. When the
|
||||
components of these vectors are uncorrelated, they may be generated simply
|
||||
one at a time and packed together in the vector. The <a
|
||||
href="../apidocs/org/apache/commons/math4/random/UncorrelatedRandomVectorGenerator.html">
|
||||
UncorrelatedRandomVectorGenerator</a> class simplifies this
|
||||
process by setting the mean and deviation of each component once and
|
||||
generating complete vectors. When the components are correlated however,
|
||||
generating them is much more difficult. The <a href="../apidocs/org/apache/commons/math4/random/CorrelatedRandomVectorGenerator.html">
|
||||
CorrelatedRandomVectorGenerator</a> class provides this service. In this
|
||||
case, the user must set up a complete covariance matrix instead of a simple
|
||||
standard deviations vector. This matrix gathers both the variance and the
|
||||
correlation information of the probability law.
|
||||
</p>
|
||||
<p>
|
||||
The main use for correlated random vector generation is for Monte-Carlo
|
||||
simulation of physical problems with several variables, for example to
|
||||
generate error vectors to be added to a nominal vector. A particularly
|
||||
common case is when the generated vector should be drawn from a <a
|
||||
href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution">
|
||||
Multivariate Normal Distribution</a>.
|
||||
</p>
|
||||
|
||||
<p><dl>
|
||||
<dt>Generating random vectors from a bivariate normal distribution</dt><dd>
|
||||
<source>
|
||||
// Import common PRNG interface and factory class that instantiates the PRNG.
|
||||
<source>
|
||||
import java.util.function.Supplier;
|
||||
import org.apache.commons.rng.UniformRandomProvider;
|
||||
import org.apache.commons.rng.RandomSource;
|
||||
|
||||
// Create (and possibly seed) a PRNG (could use any of the CM-provided generators).
|
||||
// Import common PRNG interface and factory class that instantiates the PRNG.
|
||||
// Create (and possibly seed) a PRNG.
|
||||
long seed = 17399225432L; // Fixed seed means same results every time
|
||||
UniformRandomProvider rg = RandomSource.create(RandomSource.MT, seed);
|
||||
UniformRandomProvider rng = RandomSource.create(RandomSource.MT, seed);
|
||||
|
||||
// Create a GaussianRandomGenerator using "rg" as its source of randomness.
|
||||
GaussianRandomGenerator rawGenerator = new GaussianRandomGenerator(rg);
|
||||
|
||||
// Create a CorrelatedRandomVectorGenerator using "rawGenerator" for the components.
|
||||
CorrelatedRandomVectorGenerator generator =
|
||||
new CorrelatedRandomVectorGenerator(mean, covariance, 1.0e-12 * covariance.getNorm(), rawGenerator);
|
||||
// Create a factory of correlated vectors.
|
||||
CorrelatedVectorFactory factory = new CorrelatedVectorFactory(mean, covariance, 1e-12);
|
||||
Supplier<double[]> generator = factory.gaussian(rng);
|
||||
|
||||
// Use the generator to generate correlated vectors.
|
||||
double[] randomVector = generator.nextVector();
|
||||
double[] randomVector = generator.get();
|
||||
... </source>
|
||||
|
||||
The <code>mean</code> argument is a <code>double[]</code> array holding the means
|
||||
of the random vector components. In the bivariate case, it must have length 2.
|
||||
The <code>covariance</code> argument is a <code>RealMatrix</code>, which has to
|
||||
be 2 x 2.
|
||||
The main diagonal elements are the variances of the vector components and the
|
||||
off-diagonal elements are the covariances.
|
||||
For example, if the means are 1 and 2 respectively, and the desired standard deviations
|
||||
are 3 and 4, respectively, then we need to use
|
||||
<source>
|
||||
The <code>mean</code> argument is a <code>double[]</code> array holding the means
|
||||
of the random vector components. In the bivariate case, it must have length 2.
|
||||
The <code>covariance</code> argument is a <code>RealMatrix</code>, which has to
|
||||
be 2 x 2.
|
||||
The main diagonal elements are the variances of the vector components and the
|
||||
off-diagonal elements are the covariances.
|
||||
For example, if the means are 1 and 2 respectively, and the desired standard deviations
|
||||
are 3 and 4, respectively, then we need to use
|
||||
|
||||
<source>
|
||||
double[] mean = {1, 2};
|
||||
double[][] cov = {{9, c}, {c, 16}};
|
||||
RealMatrix covariance = MatrixUtils.createRealMatrix(cov); </source>
|
||||
where "c" is the desired covariance. If you are starting with a desired correlation,
|
||||
you need to translate this to a covariance by multiplying it by the product of the
|
||||
standard deviations. For example, if you want to generate data that will give Pearson's
|
||||
R of 0.5, you would use c = 3 * 4 * 0.5 = 6.
|
||||
</dd>
|
||||
</dl></p>
|
||||
<p>
|
||||
In addition to multivariate normal distributions, correlated vectors from multivariate uniform
|
||||
distributions can be generated by creating a
|
||||
<a href="../apidocs/org/apache/commons/math4/random/UniformRandomGenerator.html">UniformRandomGenerator</a>
|
||||
in place of the
|
||||
<code>GaussianRandomGenerator</code> above. More generally, any
|
||||
<a href="../apidocs/org/apache/commons/math4/random/NormalizedRandomGenerator.html">NormalizedRandomGenerator</a>
|
||||
may be used.
|
||||
</p>
|
||||
RealMatrix covariance = MatrixUtils.createRealMatrix(cov);
|
||||
</source>
|
||||
where "c" is the desired covariance. If you are starting with a desired correlation,
|
||||
you need to translate this to a covariance by multiplying it by the product of the
|
||||
standard deviations. For example, if you want to generate data that will give Pearson's
|
||||
R of 0.5, you would use c = 3 * 4 * 0.5 = 6.
|
||||
</p>
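<p>
The arithmetic above, together with one common way of producing correlated Gaussian pairs,
can be sketched with the JDK alone. This is an illustration, not the
<code>CorrelatedVectorFactory</code> implementation: it assumes a positive definite
2 x 2 covariance matrix and uses its closed-form Cholesky factor. The class
<code>BivariateNormalSketch</code> and its methods are hypothetical names.
</p>

```java
import java.util.Random;

/** Hypothetical helper illustrating correlated bivariate normal sampling. */
final class BivariateNormalSketch {
    /** Covariance from a correlation and two standard deviations: c = rho * sd1 * sd2. */
    static double covariance(double rho, double sd1, double sd2) {
        return rho * sd1 * sd2;
    }

    /** Draws one vector from N(mean, cov), assuming cov is 2 x 2 and positive definite. */
    static double[] sample(double[] mean, double[][] cov, Random rng) {
        // Lower-triangular Cholesky factor L such that L * L^T = cov.
        double l11 = Math.sqrt(cov[0][0]);
        double l21 = cov[1][0] / l11;
        double l22 = Math.sqrt(cov[1][1] - l21 * l21);
        // Two independent standard normal deviates.
        double z1 = rng.nextGaussian();
        double z2 = rng.nextGaussian();
        // Correlate them: x = mean + L * z.
        return new double[] {
            mean[0] + l11 * z1,
            mean[1] + l21 * z1 + l22 * z2
        };
    }

    public static void main(String[] args) {
        double c = covariance(0.5, 3, 4); // 0.5 * 3 * 4 = 6
        double[] mean = {1, 2};
        double[][] cov = {{9, c}, {c, 16}};
        Random rng = new Random(17399225432L); // fixed seed: same results every time
        double[] v = sample(mean, cov, rng);
        System.out.println(v[0] + ", " + v[1]);
    }
}
```

<p>
For higher dimensions or rank-deficient covariance matrices a general matrix decomposition
is needed, which is the kind of work the library class is meant to take care of.
</p>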
|
||||
</subsection>
|
||||
|
||||
<p><dl>
|
||||
<dt>Low discrepancy sequences</dt>
|
||||
<dd>
|
||||
There exist several quasi-random sequences with the property that for all values of N, the subsequence
|
||||
x<sub>1</sub>, ..., x<sub>N</sub> has low discrepancy, which results in equi-distributed samples.
|
||||
While their quasi-randomness makes them unsuitable for most applications (i.e. the sequence of values
|
||||
is completely deterministic), their unique properties give them an important advantage for quasi-Monte Carlo simulations.<br/>
|
||||
Currently, the following low-discrepancy sequences are supported:
|
||||
<ul>
|
||||
<li><a href="../apidocs/org/apache/commons/math4/random/SobolSequenceGenerator.html">
|
||||
Sobol sequence</a> (pre-configured up to dimension 1000)</li>
|
||||
<li><a href="../apidocs/org/apache/commons/math4/random/HaltonSequenceGenerator.html">
|
||||
Halton sequence</a> (pre-configured up to dimension 40)</li>
|
||||
</ul>
|
||||
<source>
|
||||
<subsection name="2.3 Low discrepancy sequences"
|
||||
href="lowdiscrepancy">
|
||||
<p>
|
||||
There exist several quasi-random sequences with the property that for all values of N, the subsequence
|
||||
x<sub>1</sub>, ..., x<sub>N</sub> has low discrepancy, which results in equi-distributed samples.
|
||||
While their quasi-randomness makes them unsuitable for most applications (i.e. the sequence of values
|
||||
is completely deterministic), their unique properties give them an important advantage for quasi-Monte Carlo simulations.<br/>
|
||||
Currently, the following low-discrepancy sequences are supported:
|
||||
<ul>
|
||||
<li><a href="../apidocs/org/apache/commons/math4/legacy/random/SobolSequenceGenerator.html">
|
||||
Sobol sequence</a> (pre-configured up to dimension 1000)</li>
|
||||
<li><a href="../apidocs/org/apache/commons/math4/legacy/random/HaltonSequenceGenerator.html">
|
||||
Halton sequence</a> (pre-configured up to dimension 40)</li>
|
||||
</ul>
|
||||
|
||||
<source>
|
||||
// Create a Sobol sequence generator for 2-dimensional vectors
|
||||
RandomVectorGenerator generator = new SobolSequenceGenerator(2);
|
||||
|
||||
|
@@ -210,85 +129,15 @@ RandomVectorGenerator generator = new SobolSequence(2);
|
|||
double[] randomVector = generator.nextVector();
|
||||
... </source>
|
||||
|
||||
The figure below illustrates the unique properties of low-discrepancy sequences when
|
||||
generating N samples in the interval [0, 1]. Roughly speaking, such sequences "fill"
|
||||
the respective space more evenly which leads to faster convergence in quasi-Monte Carlo
|
||||
simulations.<br/>
|
||||
<img src="../images/userguide/low_discrepancy_sequences.png"
|
||||
alt="Comparison of low-discrepancy sequences"/>
|
||||
</dd>
|
||||
</dl></p>
|
||||
|
||||
</subsection>
|
||||
|
||||
<subsection name="2.4 Random Strings"
|
||||
href="strings">
|
||||
<p>
|
||||
The method <code>nextHexString</code> in
|
||||
<a href="../apidocs/org/apache/commons/math4/random/RandomUtils.DataGenerator.html">
|
||||
RandomUtils.DataGenerator</a> can be used to generate random strings of
|
||||
hexadecimal characters.
|
||||
It produces sequences of strings with good dispersion properties.
|
||||
A string can be generated in two different ways, depending on the value
|
||||
of the boolean argument passed to the method (see the Javadoc for more
|
||||
details).
|
||||
The figure below illustrates the unique properties of low-discrepancy sequences when
|
||||
generating N samples in the interval [0, 1]. Roughly speaking, such sequences "fill"
|
||||
the respective space more evenly which leads to faster convergence in quasi-Monte Carlo
|
||||
simulations.<br/>
|
||||
<img src="../images/userguide/low_discrepancy_sequences.png"
|
||||
alt="Comparison of low-discrepancy sequences"/>
|
||||
</p>
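<p>
To make the notion of "filling" the space evenly concrete, here is a minimal plain-JDK
sketch of the classic, unscrambled Halton construction based on the radical inverse.
It is not the <code>HaltonSequenceGenerator</code> implementation, which may apply
additional scrambling; the class name <code>HaltonSketch</code> is hypothetical.
</p>

```java
/** Hypothetical helper: unscrambled Halton points via the radical inverse. */
final class HaltonSketch {
    /** Mirrors the base-b digits of index about the radix point, giving a value in [0, 1). */
    static double radicalInverse(int index, int base) {
        double result = 0;
        double fraction = 1.0 / base;
        for (int i = index; i > 0; i /= base) {
            result += (i % base) * fraction;
            fraction /= base;
        }
        return result;
    }

    /** The index-th point (1-based) of the 2-dimensional Halton sequence with bases 2 and 3. */
    static double[] point2d(int index) {
        return new double[] {radicalInverse(index, 2), radicalInverse(index, 3)};
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            double[] p = point2d(i);
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```

<p>
In base 2 the first coordinates are 1/2, 1/4, 3/4, and so on: each new point falls into
the largest gap left by the previous ones, and the sequence is fully deterministic.
</p>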
|
||||
</subsection>
|
||||
|
||||
<subsection name="2.5 Random Permutations, Combinations, Sampling"
|
||||
href="combinatorics">
|
||||
<p>
|
||||
To select a random sample of objects in a collection, you can use the
|
||||
<code>nextSample</code> method provided by
|
||||
<a href="../apidocs/org/apache/commons/math4/random/RandomUtils.DataGenerator.html">
|
||||
RandomUtils.DataGenerator</a>.
|
||||
Specifically, if <code>c</code> is a <code>java.util.Collection<T></code>
|
||||
containing at least <code>k</code> objects, and <code>randomData</code> is a
|
||||
<code>RandomUtils.DataGenerator</code> instance, then <code>randomData.nextSample(c, k)</code>
|
||||
will return a <code>List<T></code> instance of size <code>k</code>
|
||||
consisting of elements randomly selected from the collection.
|
||||
If <code>c</code> contains duplicate references, there may be duplicate
|
||||
references in the returned list; otherwise returned elements will be
|
||||
unique (i.e. the sampling is without replacement among the object
|
||||
references in the collection).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
If <code>n</code> and <code>k</code> are integers with <code>k < n</code>, then
|
||||
<code>randomData.nextPermutation(n, k)</code> returns an <code>int[]</code>
|
||||
array of length <code>k</code> whose entries are selected randomly,
|
||||
without repetition, from the integers <code>0</code> through
|
||||
<code>n-1</code> (inclusive).
|
||||
</p>
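<p>
One standard way to obtain such a selection is a partial Fisher-Yates shuffle. The
plain-JDK sketch below illustrates that technique only; it is not claimed to be how the
library method is implemented, and the class and method names are hypothetical.
</p>

```java
import java.util.Random;

/** Hypothetical helper: k distinct integers from 0..n-1 via a partial Fisher-Yates shuffle. */
final class PermutationSketch {
    /** Returns k distinct integers chosen uniformly at random, in random order, from 0..n-1. */
    static int[] nextPermutation(int n, int k, Random rng) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("require 0 <= k <= n");
        }
        int[] pool = new int[n];
        for (int i = 0; i < n; i++) {
            pool[i] = i;
        }
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            // Swap a randomly chosen not-yet-selected entry into position i.
            int j = i + rng.nextInt(n - i);
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
            result[i] = pool[i];
        }
        return result;
    }
}
```

<p>
Only the first <code>k</code> positions are shuffled, so the random draws stop after
<code>k</code> steps, although this sketch still allocates an array of length <code>n</code>.
</p>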
|
||||
</subsection>
|
||||
|
||||
<subsection name="2.6 Generating data like an input file"
|
||||
href="empirical">
|
||||
<p>
|
||||
Using the <code>EmpiricalDistribution</code> class, you can generate data based on
|
||||
the values in an input file:
|
||||
<dl>
|
||||
<source>
|
||||
int binCount = 500;
|
||||
EmpiricalDistribution empDist = new EmpiricalDistribution(binCount);
|
||||
empDist.load("data.txt");
|
||||
RealDistribution.Sampler sampler = empDist.createSampler(RandomSource.create(RandomSource.MT));
|
||||
double value = sampler.nextDouble(); </source>
|
||||
|
||||
The entire input file is read and a probability density function is estimated
|
||||
based on data from the file.
|
||||
The estimation method is essentially the
|
||||
<a href="http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_6.html">
|
||||
Variable Kernel Method</a> with Gaussian smoothing.
|
||||
The created sampler will return random values whose probability distribution
|
||||
matches the empirical distribution (i.e. if you generate a large number of
|
||||
such values, their distribution should "look like" the distribution of the
|
||||
values in the input file).
|
||||
The values are not stored in memory in this case either, so there is no limit to the
|
||||
size of the input file.
|
||||
</dl>
|
||||
</p>
|
||||
</subsection>
|
||||
</subsection>
|
||||
|
||||
</section>
|
||||
|
||||
|
|