variable. The implementation uses running sums and does not require the data
to be stored in memory. Since I could not conceive of any significantly
different implementation strategies that did not amount to just improving
efficiency or numerical accuracy of what I am submitting, I did not abstract
the interface.
The test cases validate the computations against NIST reference data and
verified computations. The slope, intercept, their standard errors and
r-square estimates are accurate to within 10E-12 against the reference data
set. MSE and other ANOVA stats are good at least to within 10E-8. -- Phil S.
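
The running-sums idea described above can be sketched as follows. This is a minimal illustration, not the submitted implementation: the class name RunningRegression and its method names are hypothetical, and the textbook sum formulas used here are less numerically careful than what the NIST-validated code would need.

```java
/** Hypothetical sketch: simple linear regression from running sums only. */
final class RunningRegression {
    private long n;
    private double sumX;
    private double sumY;
    private double sumXX;
    private double sumXY;

    /** Adds one (x, y) observation; the observation itself is not stored. */
    void addData(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumXY += x * y;
    }

    /** Least-squares slope, or NaN with fewer than two points or no variation in x. */
    double getSlope() {
        if (n < 2) {
            return Double.NaN;
        }
        double sxx = sumXX - sumX * sumX / n;
        if (sxx == 0.0) {
            return Double.NaN;
        }
        double sxy = sumXY - sumX * sumY / n;
        return sxy / sxx;
    }

    /** Least-squares intercept, or NaN whenever the slope is NaN. */
    double getIntercept() {
        return (sumY - getSlope() * sumX) / n;
    }
}
```

Memory use is constant regardless of how many observations are added, which is the property the message above relies on.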
PR: Issue #20224
Obtained from: Bugzilla
Submitted by: Phil Steitz
Reviewed by: Tim O'Brien
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140858 13f79535-47bb-0310-9956-ffa450edef68
The attached patch includes the following improvements to Univariate and
UnivariateImpl:
* Improved efficiency of min, max and product maintenance when windowSize is
limited by incorporating a suggestion posted to commons-dev by Brent Worden
(added author credit). Thanks, Brent!
* Added javadoc specifying NaN contracts for all statistics, definitions for
geometric and arithmetic means.
* Made some slight modifications to UnivariateImpl to make it consistent with
NaN contracts.
* All interface documentation moved to Univariate. The interface specification
includes the NaN semantics and a first attempt at clearly defining exactly
what "rolling" means and how this affects what statistics are defined when.
* Added test cases to verify that min, max, product are correctly maintained
when "rolling" and to verify that NaN contracts are satisfied.
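
One plausible way to maintain a rolling minimum cheaply is sketched below. It is an assumption, not necessarily the suggestion credited above: the window is rescanned only when the discarded value was the current minimum. The class name RollingMin is hypothetical, NaN inputs are not handled, and the "NaN before any value is added" behavior is only an example of a NaN contract.

```java
/** Hypothetical sketch: minimum over the last windowSize values. */
final class RollingMin {
    private final double[] window;
    private int count;                // values currently held
    private int next;                 // slot the next value will overwrite
    private double min = Double.NaN;  // NaN until a value has been added

    RollingMin(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        window = new double[windowSize];
    }

    void addValue(double v) {
        boolean full = count == window.length;
        double discarded = window[next];  // only meaningful when the window is full
        window[next] = v;
        next = (next + 1) % window.length;
        if (!full) {
            count++;
            if (count == 1 || v < min) {
                min = v;
            }
        } else if (v <= min) {
            // The new value is no larger than anything still in the window.
            min = v;
        } else if (discarded == min) {
            // The old minimum just left the window: rescan.
            min = window[0];
            for (double w : window) {
                if (w < min) {
                    min = w;
                }
            }
        }
    }

    double getMin() {
        return min;
    }
}
```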
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140857 13f79535-47bb-0310-9956-ffa450edef68
Univariate interface, in which getN() returned a double. The attached patch
inserts the necessary casts to avoid the rounding/truncation errors that were
causing the EmpiricalDistribution and ValueServer unit tests to fail.
The patch also adds a RandomData member variable so that getNext() does not
instantiate a new RandomData instance for each activation.
PR: Bugzilla #20149
Obtained from: Issue Patch
Submitted by: Phil Steitz
Reviewed by: Tim O'Brien
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140853 13f79535-47bb-0310-9956-ffa450edef68
contains contributions from Mark Diggory.
* This patch introduces Product and GeometricMean into the Univariate
implementation.
* Discarding the contribution of a discarded element in a rolling
UnivariateImpl requires that the product be calculated explicitly each
time a value is discarded. This is necessary because some values may be
zero, so the old contribution cannot simply be divided out.
* Errors in rolling logic for ListUimpl and UnivariateImpl were corrected,
and more test cases were added to the JUnit tests for the Univariate
implementations. More rigorous test cases are needed for the entire
suite of Univariate implementations.
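
The reason the product is recomputed on each discard can be illustrated with a small sketch. The class name RollingProduct and the plain ring buffer are assumptions for illustration, not the code in this patch.

```java
/** Hypothetical sketch: product over the last windowSize values. */
final class RollingProduct {
    private final double[] window;
    private int count;                    // values currently held
    private int next;                     // slot the next value will overwrite
    private double product = Double.NaN;  // NaN until a value has been added

    RollingProduct(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        window = new double[windowSize];
    }

    void addValue(double v) {
        boolean discarding = count == window.length;
        window[next] = v;
        next = (next + 1) % window.length;
        if (!discarding) {
            count++;
            product = (count == 1) ? v : product * v;
        } else {
            // Dividing by the discarded value would fail if it were zero
            // (0 / 0 is NaN), so the product is rebuilt from the window.
            double p = 1.0;
            for (double w : window) {
                p *= w;
            }
            product = p;
        }
    }

    double getProduct() {
        return product;
    }
}
```

With a window of three and the values 2, 0, 3, 4, 5, the zero leaves the window on the last add and the product becomes 3 * 4 * 5; a divide-out update would have produced NaN at that step.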
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140851 13f79535-47bb-0310-9956-ffa450edef68
EmpiricalDistribution -- represents an empirical probability distribution and
supports generation of data values that are "like" values in an input file
without making any assumptions about the functional form of the probability
distribution that the data come from. This is useful in simulation
applications where historical data about component performance are
available but do not follow standard distributions (or any application that
requires random data generation from an empirical distribution). Also
generates data for grouped frequency histograms based on the input file.
ValueServer -- a wrapper for RandomData and EmpiricalDistribution that
generates values in each of the following modes:
* DIGEST_MODE -- uses an empirical distribution
* REPLAY_MODE -- replays data from an input file
* UNIFORM_MODE -- generates uniformly distributed random values
* EXPONENTIAL_MODE -- generates exponentially distributed random
values
* GAUSSIAN_MODE -- generates Gaussian distributed random values
* CONSTANT_MODE -- returns the same value every time.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140848 13f79535-47bb-0310-9956-ffa450edef68
* One should be able to use a DoubleArray in a similar way to a
regular double[]. To this end, methods for accessing element
values no longer throw NoSuchElementException when an
index is outside of the element set. These methods all throw
ArrayIndexOutOfBoundsException if a bad index is supplied.
* Filled out javadoc in FixedDoubleArray.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140845 13f79535-47bb-0310-9956-ffa450edef68
This commit contains the suite of random data generation utilities that
I originally proposed as extensions to lang.math. There is some functional
overlap with lang.math, but the contract and intention of this implementation
are different in several significant ways.
* The lang implementation maintains "immutability" of the underlying
random number generator (emulating Math). The RandomData
implementation allows users to reseed the random number generator(s)
(this is in effect possible in the recent extensions to lang.math by
passing in a user-supplied random as an actual parameter to the
next() methods). Users can also reset the PRNG algorithm and provider
used by the "secure" methods.
* RandomData includes "secure" methods (delegating to SecureRandom).
* RandomData will generate random deviates from exponential and Poisson,
as well as Gaussian and Uniform distributions. These are useful in
simulation applications.
* Overlapping somewhat with lang.StringUtils, RandomData will generate
random hex strings. There is a nextSecureHexString method that will
(I claim :-) generate cryptographically secure string identifiers. I
would appreciate feedback on this algorithm, which I have seen used
elsewhere (similar to what tomcat does to generate session ids); but
not documented as a standard.
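
For readers who want a concrete picture of a "secure" hex string generator, here is a minimal sketch. It is an assumption for illustration only and is not claimed to be the algorithm in this commit: it simply hex-encodes bytes drawn from java.security.SecureRandom. The class name SecureHex is hypothetical; the method name is borrowed from the description above.

```java
import java.security.SecureRandom;

/** Hypothetical sketch: random hex strings backed by SecureRandom. */
final class SecureHex {
    private static final char[] HEX = "0123456789abcdef".toCharArray();
    private final SecureRandom random = new SecureRandom();

    /** Returns a string of exactly len lowercase hex digits. */
    String nextSecureHexString(int len) {
        if (len <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        // Each byte yields two hex digits; round up and trim at the end.
        byte[] bytes = new byte[(len + 1) / 2];
        random.nextBytes(bytes);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0F]);
            sb.append(HEX[b & 0x0F]);
        }
        return sb.substring(0, len);
    }
}
```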
PR: Bugzilla 20013
Obtained from: Phil S.
Submitted by: Phil S.
Reviewed by: Tim O.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140838 13f79535-47bb-0310-9956-ffa450edef68
that reuses an array of fixed length. This class was added to support an
efficient rolling mechanism.
FixedDoubleArray was influenced by discussions on the commons-dev list and
patches submitted by Mark Diggory.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140836 13f79535-47bb-0310-9956-ffa450edef68
addition to a constant which signifies an "infinite" window. Windowing
has been added to all three Univariate implementations:
* UnivariateImpl - If the window is not infinite, we keep track of
0..n elements and discount the contribution of the discarded element when
our "window" is moved. If the window is infinite no extra storage is used
beyond an empty ContractableDoubleArray.
- In the following two implementations, the window size can be changed at any time.
* ListStoreUnivariateImpl - If the window is not infinite, this
implementation only takes into account the last n elements of the List.
* StoreUnivariateImpl - Uses an internal ContractableDoubleArray; the window
size can be changed at any time.
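
The "discount the contribution of the discarded element" step for a running statistic can be sketched as follows. This is an illustration under assumptions: the class name WindowedMean, the constant value -1 and the plain ring buffer are hypothetical, whereas the commit describes UnivariateImpl as using a ContractableDoubleArray.

```java
/** Hypothetical sketch: mean over the last windowSize values, or over all values. */
final class WindowedMean {
    /** Assumed sentinel for "no window"; the real constant may differ. */
    static final int INFINITE_WINDOW = -1;

    private final int windowSize;
    private final double[] window;  // null when the window is infinite
    private int n;                  // number of values contributing to sum
    private int next;               // slot the next value will overwrite
    private double sum;

    WindowedMean(int windowSize) {
        if (windowSize != INFINITE_WINDOW && windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be positive or INFINITE_WINDOW");
        }
        this.windowSize = windowSize;
        this.window = (windowSize == INFINITE_WINDOW) ? null : new double[windowSize];
    }

    void addValue(double v) {
        if (window != null) {
            if (n == windowSize) {
                // Window is full: remove the oldest value's contribution.
                sum -= window[next];
            } else {
                n++;
            }
            window[next] = v;
            next = (next + 1) % windowSize;
        } else {
            // Infinite window: nothing is stored.
            n++;
        }
        sum += v;
    }

    double getMean() {
        return n == 0 ? Double.NaN : sum / n;
    }
}
```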
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140835 13f79535-47bb-0310-9956-ffa450edef68
ExpandableDoubleArray. The interface provides a public interface
which does not hint at any of the storage parameters of
Expandable or Contractable.
* DoubleArrayTest now operates on the DoubleArray interface, casting
to Expandable when we need to call the package-scoped getInternalLength
method.
* While we should not provide access to the internal storage array, it
should be possible to obtain a double[] of elements stored in this
DoubleArray. double[] getElements() was added to the DoubleArray interface;
it returns the element array.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140834 13f79535-47bb-0310-9956-ffa450edef68
need for a DoubleArray interface.
* ExpandableDoubleArray and the extension ContractableDoubleArray should
aim towards presenting a public interface that does not expose any
details of the internal storage. To this end, one is no longer able to get the
internal storage array via public double[] getValues(), and the startIndex
(which was relative to the internal storage array) is no longer available.
* [Expandable|Contractable]DoubleArray now allow one to discard
elements from the front of the array. Before this commit, one could
accomplish the same goal by changing the starting index of the element
array within the internal storage array. That approach allowed one to
discard elements from the front of the array as well as reclaim
elements by decreasing the startIndex.
There were two problems with this approach (especially in
ContractableDoubleArray). First, the ContractableDoubleArray can be
"compacted" at any time, thereby resetting the startIndex to zero and the
size of the internal storage array to the number of elements plus one. Second,
"reclaiming" elements from the internal storage array by finagling
internal "pointers" to the start and end index seems to violate the
principles of encapsulation. If you "discard" an element from the
front of the array, consider it unavailable.
It should be noted that calling setNumElements allows one to move the end
index of the internal element array at will. Assume one has a 100 element
array, and one calls setNumElements(10), thereby decreasing the ending index
of the element array by 90. The 90 "dumped" elements are not currently
reinitialized to the default double primitive value. This is an open
question.
* Tests for ExpandableDoubleArray and ContractableDoubleArray were
refactored. Both test classes now extend a DoubleArrayAbstractTest
JUnit class which contains shared unit tests for both "implementations".
An approach like this should be adopted to test the Univariate implementations.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140833 13f79535-47bb-0310-9956-ffa450edef68
* A TestStatistic interface with corresponding implementation and testcase
PR: Bugzilla 19971
Obtained from: Patches attached to issue
Submitted by: Phil S.
Reviewed by: Tim O.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140832 13f79535-47bb-0310-9956-ffa450edef68
* This class now supports the ability to move the starting index of the
internal element array. This allows one to move the beginning of the element
array and form a sort of "window". This will come into play when we want to
provide moving averages, or "rolling".
* Added an addElementRolling(double v) - this will increment the startIndex
and add the element to the end of the internal element array.
* Brought the Clover test cases up to 100% for this class.
Added a class ContractableDoubleArray:
* This is an extension of ExpandableDoubleArray - it adds a configuration
parameter contractionCriteria. Essentially, if the contractionCriteria is
2.0f we commit to never having the internal storage array provide more
than 2.0 times the storage capacity needed. Once the internal
storage array exceeds this measurement, the internal storage array is
pruned to the size of the internal element array.
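
The contraction rule above can be sketched in a few lines. This is an illustration under assumptions, not the committed class: the name ContractingBuffer, the initial capacity of 16, the doubling growth policy and the decision to check the criteria only when elements are discarded are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: a growable double buffer that prunes excess storage. */
final class ContractingBuffer {
    private final float contractionCriteria;
    private double[] storage = new double[16];  // assumed initial capacity
    private int numElements;

    ContractingBuffer(float contractionCriteria) {
        if (contractionCriteria <= 1.0f) {
            throw new IllegalArgumentException("contractionCriteria must exceed 1");
        }
        this.contractionCriteria = contractionCriteria;
    }

    void addElement(double v) {
        if (numElements == storage.length) {
            storage = Arrays.copyOf(storage, storage.length * 2);
        }
        storage[numElements++] = v;
    }

    /** Drops the first k elements, then prunes storage if it is too large. */
    void discardFrontElements(int k) {
        if (k < 0 || k > numElements) {
            throw new IllegalArgumentException("cannot discard " + k + " elements");
        }
        System.arraycopy(storage, k, storage, 0, numElements - k);
        numElements -= k;
        if (storage.length > contractionCriteria * numElements) {
            // Keep at least one slot so that doubling can still grow the array.
            storage = Arrays.copyOf(storage, Math.max(1, numElements));
        }
    }

    double getElement(int index) {
        if (index < 0 || index >= numElements) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        return storage[index];
    }

    int getNumElements() {
        return numElements;
    }

    int getInternalLength() {
        return storage.length;
    }
}
```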
Also, my IDE scolded me for some unused imports in ListUnivariateImpl; they
have been removed.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140831 13f79535-47bb-0310-9956-ffa450edef68
into an AbstractStoreUnivariate class which takes responsibility for
all statistical calculations. AbstractStoreUnivariate is implemented by
two classes:
* StoreUnivariateImpl - This class uses an ExpandableDoubleArray for
internal storage. This is a more efficient class in terms
of storage and cycles for users who are interested in gathering statistics
not available in the UnivariateImpl implementation.
* ListUnivariateImpl - This class is for a situation where a user might
wish to maintain a List of numeric objects outside of a StoreUnivariate
instance. We still need to add serious error checking in the absence of
1.5's generics, but this implementation will work with any list that
contains Number objects - (BigDecimal, BigInteger, Byte, Double, Float,
Integer, Long, Short). This implementation ultimately transforms all
numeric objects into double primitives via Number.doubleValue().
Because AbstractStoreUnivariate does not hold on to any state, a user
can add values through the Univariate.addValue() function OR
manipulate the contents of the List directly.
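
A stateless, list-backed statistic of the kind described above might look like the following sketch. The class name NumberListStats is hypothetical, and the sketch uses generics for brevity even though the committed code predates them.

```java
import java.util.List;

/** Hypothetical sketch: statistics computed on demand from an external List. */
final class NumberListStats {
    private final List<? extends Number> values;

    NumberListStats(List<? extends Number> values) {
        this.values = values;
    }

    /** Mean of the list's current contents, or NaN if the list is empty. */
    double getMean() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (Number value : values) {
            // Every Number subtype is reduced to a double primitive here.
            sum += value.doubleValue();
        }
        return sum / values.size();
    }
}
```

Because nothing is cached, a value appended to the list by the caller is reflected the next time getMean() is called.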
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140830 13f79535-47bb-0310-9956-ffa450edef68
interface of Univariate was extracted in an interface of the same name.
Univariate, an interface, is now implemented by UnivariateImpl, which contains
all code present in the original Univariate implementation.
* StoredUnivariate is an interface which extends Univariate and adds
measures not available in the superinterface such as mode, kurtosis, and skew.
* StoredUnivariateImpl provides an implementation which uses the
ExpandableDoubleArray for internal storage. Calculations are performed
on demand *each* time a particular measure is required; no state is
maintained by this implementation.
* Univariate provided methods addValue(int), addValue(float), addValue(long).
These functions were removed because all of these assignments are widening
conversions, so no cast is required.
* Removed the name property from Univariate - property not relevant to
univariate statistics.
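
The widening-conversion point can be seen in a tiny sketch; the class name WideningDemo is hypothetical. A single addValue(double) accepts int, float and long arguments without casts.

```java
/** Hypothetical sketch: one double overload serves int, float and long callers. */
final class WideningDemo {
    private double sum;

    void addValue(double v) {
        sum += v;
    }

    double getSum() {
        return sum;
    }
}
```

Calling addValue(3), addValue(2.5f) and addValue(4L) all compile against the double overload. One caveat: a long whose magnitude exceeds 2^53 may lose precision when widened to double.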
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140828 13f79535-47bb-0310-9956-ffa450edef68
2. Make all currently unimplemented methods throw UnsupportedOperationException.
3. Add solve() method to RealMatrix interface, representing the vector
solution to AX = B, where B is the parameter and A is *this.
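
As an illustration of what a solve() for AX = B computes, here is a minimal sketch using Gaussian elimination with partial pivoting on plain arrays, treating X and B as column vectors. The technique, the class name LinearSolver and the array-based signature are assumptions; the commit does not say how RealMatrix.solve() is to be implemented.

```java
/** Hypothetical sketch: solves A x = b by Gaussian elimination with partial pivoting. */
final class LinearSolver {
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        if (a.length != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        // Work on an augmented copy [A | b] so the inputs are left untouched.
        double[][] m = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            if (a[i].length != n) {
                throw new IllegalArgumentException("matrix must be square");
            }
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            // Pick the row with the largest entry in this column as the pivot.
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (m[pivot][col] == 0.0) {
                throw new ArithmeticException("matrix is singular");
            }
            double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            // Eliminate this column from all rows below the pivot.
            for (int r = col + 1; r < n; r++) {
                double factor = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) {
                    m[r][c] -= factor * m[col][c];
                }
            }
        }
        // Back substitution on the upper-triangular system.
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = m[i][n];
            for (int c = i + 1; c < n; c++) {
                s -= m[i][c] * x[c];
            }
            x[i] = s / m[i][i];
        }
        return x;
    }
}
```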
Phil
Obtained from: Phil S.
Submitted by: Phil S.
Reviewed by: Tim O.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@140826 13f79535-47bb-0310-9956-ffa450edef68