2013-08-09 10:46:53 -04:00
|
|
|
|
PEP: 450
|
|
|
|
|
Title: Adding A Statistics Module To The Standard Library
|
|
|
|
|
Version: $Revision$
|
|
|
|
|
Last-Modified: $Date$
|
|
|
|
|
Author: Steven D'Aprano <steve@pearwood.info>
|
2013-09-17 17:20:05 -04:00
|
|
|
|
Status: Accepted
|
2013-08-09 10:46:53 -04:00
|
|
|
|
Type: Standards Track
|
|
|
|
|
Content-Type: text/plain
|
|
|
|
|
Created: 01-Aug-2013
|
|
|
|
|
Python-Version: 3.4
|
2013-09-17 17:20:05 -04:00
|
|
|
|
Post-History: 13-Sep-2013
|
2013-08-09 10:46:53 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
|
|
|
|
|
This PEP proposes the addition of a module for common statistics functions
|
|
|
|
|
such as mean, median, variance and standard deviation to the Python
|
2013-09-09 00:21:27 -04:00
|
|
|
|
standard library. See also http://bugs.python.org/issue18606
|
2013-08-09 10:46:53 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
|
|
|
|
|
|
|
|
|
The proposed statistics module is motivated by the "batteries included"
|
|
|
|
|
philosophy towards the Python standard library. Raymond Hettinger and
|
|
|
|
|
other senior developers have requested a quality statistics library that
|
|
|
|
|
falls somewhere in between high-end statistics libraries and ad hoc
|
|
|
|
|
code.[1] Statistical functions such as mean, standard deviation and others
|
|
|
|
|
are obvious and useful batteries, familiar to any Secondary School student.
|
|
|
|
|
Even cheap scientific calculators typically include multiple statistical
|
|
|
|
|
functions such as:
|
|
|
|
|
|
|
|
|
|
- mean
|
|
|
|
|
- population and sample variance
|
|
|
|
|
- population and sample standard deviation
|
|
|
|
|
- linear regression
|
|
|
|
|
- correlation coefficient
|
|
|
|
|
|
|
|
|
|
Graphing calculators aimed at Secondary School students typically
|
|
|
|
|
include all of the above, plus some or all of:
|
|
|
|
|
|
|
|
|
|
- median
|
|
|
|
|
- mode
|
|
|
|
|
- functions for calculating the probability of random variables
|
|
|
|
|
from the normal, t, chi-squared, and F distributions
|
|
|
|
|
- inference on the mean
|
|
|
|
|
|
|
|
|
|
and others[2]. Likewise spreadsheet applications such as Microsoft Excel,
|
|
|
|
|
LibreOffice and Gnumeric include rich collections of statistical
|
|
|
|
|
functions[3].
|
|
|
|
|
|
|
|
|
|
In contrast, Python currently has no standard way to calculate even the
|
|
|
|
|
simplest and most obvious statistical functions such as mean. For those
|
|
|
|
|
who need statistical functions in Python, there are two obvious solutions:
|
|
|
|
|
|
|
|
|
|
- install numpy and/or scipy[4];
|
|
|
|
|
|
|
|
|
|
- or use a Do It Yourself solution.
|
|
|
|
|
|
|
|
|
|
Numpy is perhaps the most full-featured solution, but it has a few
|
|
|
|
|
disadvantages:
|
|
|
|
|
|
|
|
|
|
- It may be overkill for many purposes. The documentation for numpy even
|
|
|
|
|
warns
|
|
|
|
|
|
|
|
|
|
"It can be hard to know what functions are available in
|
|
|
|
|
numpy. This is not a complete list, but it does cover
|
|
|
|
|
most of them."[5]
|
|
|
|
|
|
|
|
|
|
and then goes on to list over 270 functions, only a small number of
|
|
|
|
|
which are related to statistics.
|
|
|
|
|
|
|
|
|
|
- Numpy is aimed at those doing heavy numerical work, and may be
|
|
|
|
|
intimidating to those who don't have a background in computational
|
|
|
|
|
mathematics and computer science. For example, numpy.mean takes four
|
|
|
|
|
arguments:
|
|
|
|
|
|
|
|
|
|
mean(a, axis=None, dtype=None, out=None)
|
|
|
|
|
|
|
|
|
|
although fortunately for the beginner or casual numpy user, three are
|
|
|
|
|
optional and numpy.mean does the right thing in simple cases:
|
|
|
|
|
|
|
|
|
|
>>> numpy.mean([1, 2, 3, 4])
|
|
|
|
|
2.5
|
|
|
|
|
|
|
|
|
|
- For many people, installing numpy may be difficult or impossible. For
|
|
|
|
|
example, people in corporate environments may have to go through a
|
|
|
|
|
difficult, time-consuming process before being permitted to install
|
|
|
|
|
third-party software. For the casual Python user, having to learn about
|
|
|
|
|
installing third-party packages in order to average a list of numbers is
|
|
|
|
|
unfortunate.
|
|
|
|
|
|
|
|
|
|
This leads to option number 2, DIY statistics functions. At first glance,
|
|
|
|
|
this appears to be an attractive option, due to the apparent simplicity of
|
|
|
|
|
common statistical functions. For example:
|
|
|
|
|
|
|
|
|
|
def mean(data):
|
|
|
|
|
return sum(data)/len(data)
|
|
|
|
|
|
|
|
|
|
def variance(data):
|
|
|
|
|
# Use the Computational Formula for Variance.
|
|
|
|
|
n = len(data)
|
|
|
|
|
ss = sum(x**2 for x in data) - (sum(data)**2)/n
|
|
|
|
|
return ss/(n-1)
|
|
|
|
|
|
|
|
|
|
def standard_deviation(data):
|
|
|
|
|
return math.sqrt(variance(data))
|
|
|
|
|
|
|
|
|
|
The above appears to be correct with a casual test:
|
|
|
|
|
|
|
|
|
|
>>> data = [1, 2, 4, 5, 8]
|
|
|
|
|
>>> variance(data)
|
|
|
|
|
7.5
|
|
|
|
|
|
|
|
|
|
But adding a constant to every data point should not change the variance:
|
|
|
|
|
|
|
|
|
|
>>> data = [x+1e12 for x in data]
|
|
|
|
|
>>> variance(data)
|
|
|
|
|
0.0
|
|
|
|
|
|
|
|
|
|
And variance should *never* be negative:
|
|
|
|
|
|
|
|
|
|
>>> variance(data*100)
|
|
|
|
|
-1239429440.1282566
|
|
|
|
|
|
|
|
|
|
By contrast, the proposed reference implementation gets the exactly correct
|
|
|
|
|
answer 7.5 for the first two examples, and a reasonably close answer for
|
|
|
|
|
the third: 6.012. numpy does no better[6].
|
|
|
|
|
|
|
|
|
|
Even simple statistical calculations contain traps for the unwary, starting
|
|
|
|
|
with the Computational Formula itself. Despite the name, it is numerically
|
|
|
|
|
unstable and can be extremely inaccurate, as can be seen above. It is
|
|
|
|
|
completely unsuitable for computation by computer[7]. This problem plagues
|
|
|
|
|
users of many programming language, not just Python[8], as coders reinvent
|
|
|
|
|
the same numerically inaccurate code over and over again[9], or advise
|
|
|
|
|
others to do so[10].
|
|
|
|
|
|
|
|
|
|
It isn't just the variance and standard deviation. Even the mean is not
|
|
|
|
|
quite as straight-forward as it might appear. The above implementation
|
|
|
|
|
seems too simple to have problems, but it does:
|
|
|
|
|
|
|
|
|
|
- The built-in sum can lose accuracy when dealing with floats of wildly
|
|
|
|
|
differing magnitude. Consequently, the above naive mean fails this
|
|
|
|
|
"torture test":
|
|
|
|
|
|
|
|
|
|
assert mean([1e30, 1, 3, -1e30]) == 1
|
|
|
|
|
|
|
|
|
|
returning 0 instead of 1, a purely computational error of 100%.
|
|
|
|
|
|
|
|
|
|
- Using math.fsum inside mean will make it more accurate with float data,
|
|
|
|
|
but it also has the side-effect of converting any arguments to float
|
|
|
|
|
even when unnecessary. E.g. we should expect the mean of a list of
|
|
|
|
|
Fractions to be a Fraction, not a float.
|
|
|
|
|
|
|
|
|
|
While the above mean implementation does not fail quite as catastrophically
|
|
|
|
|
as the naive variance does, a standard library function can do much better
|
|
|
|
|
than the DIY versions.
|
|
|
|
|
|
|
|
|
|
The example above involves an especially bad set of data, but even for
|
|
|
|
|
more realistic data sets accuracy is important. The first step in
|
|
|
|
|
interpreting variation in data (including dealing with ill-conditioned
|
|
|
|
|
data) is often to standardize it to a series with variance 1 (and often
|
|
|
|
|
mean 0). This standardization requires accurate computation of the mean
|
|
|
|
|
and variance of the raw series. Naive computation of mean and variance
|
|
|
|
|
can lose precision very quickly. Because precision bounds accuracy, it is
|
|
|
|
|
important to use the most precise algorithms for computing mean and
|
|
|
|
|
variance that are practical, or the results of standardization are
|
|
|
|
|
themselves useless.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Comparison To Other Languages/Packages
|
|
|
|
|
|
|
|
|
|
The proposed statistics library is not intended to be a competitor to such
|
|
|
|
|
third-party libraries as numpy/scipy, or of proprietary full-featured
|
|
|
|
|
statistics packages aimed at professional statisticians such as Minitab,
|
|
|
|
|
SAS and Matlab. It is aimed at the level of graphing and scientific
|
|
|
|
|
calculators.
|
|
|
|
|
|
|
|
|
|
Most programming languages have little or no built-in support for
|
|
|
|
|
statistics functions. Some exceptions:
|
|
|
|
|
|
|
|
|
|
R
|
|
|
|
|
R (and its proprietary cousin, S) is a programming language designed
|
|
|
|
|
for statistics work. It is extremely popular with statisticians and
|
|
|
|
|
is extremely feature-rich[11].
|
|
|
|
|
|
|
|
|
|
C#
|
|
|
|
|
|
|
|
|
|
The C# LINQ package includes extension methods to calculate the
|
|
|
|
|
average of enumerables[12].
|
|
|
|
|
|
|
|
|
|
Ruby
|
|
|
|
|
|
|
|
|
|
Ruby does not ship with a standard statistics module, despite some
|
|
|
|
|
apparent demand[13]. Statsample appears to be a feature-rich third-
|
|
|
|
|
party library, aiming to compete with R[14].
|
|
|
|
|
|
|
|
|
|
PHP
|
|
|
|
|
|
|
|
|
|
PHP has an extremely feature-rich (although mostly undocumented) set
|
|
|
|
|
of advanced statistical functions[15].
|
|
|
|
|
|
|
|
|
|
Delphi
|
|
|
|
|
|
|
|
|
|
Delphi includes standard statistical functions including Mean, Sum,
|
|
|
|
|
Variance, TotalVariance, MomentSkewKurtosis in its Math library[16].
|
|
|
|
|
|
|
|
|
|
GNU Scientific Library
|
|
|
|
|
|
|
|
|
|
The GNU Scientific Library includes standard statistical functions,
|
|
|
|
|
percentiles, median and others[17]. One innovation I have borrowed
|
|
|
|
|
from the GSL is to allow the caller to optionally specify the pre-
|
|
|
|
|
calculated mean of the sample (or an a priori known population mean)
|
|
|
|
|
when calculating the variance and standard deviation[18].
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Design Decisions Of The Module
|
|
|
|
|
|
|
|
|
|
My intention is to start small and grow the library as needed, rather than
|
|
|
|
|
try to include everything from the start. Consequently, the current
|
|
|
|
|
reference implementation includes only a small number of functions: mean,
|
|
|
|
|
variance, standard deviation, median, mode. (See the reference
|
|
|
|
|
implementation for a full list.)
|
|
|
|
|
|
|
|
|
|
I have aimed for the following design features:
|
|
|
|
|
|
|
|
|
|
- Correctness over speed. It is easier to speed up a correct but slow
|
|
|
|
|
function than to correct a fast but buggy one.
|
|
|
|
|
|
|
|
|
|
- Concentrate on data in sequences, allowing two-passes over the data,
|
|
|
|
|
rather than potentially compromise on accuracy for the sake of a one-pass
|
|
|
|
|
algorithm. Functions expect data will be passed as a list or other
|
|
|
|
|
sequence; if given an iterator, they may internally convert to a list.
|
|
|
|
|
|
|
|
|
|
- Functions should, as much as possible, honour any type of numeric data.
|
|
|
|
|
E.g. the mean of a list of Decimals should be a Decimal, not a float.
|
|
|
|
|
When this is not possible, treat float as the "lowest common data type".
|
|
|
|
|
|
|
|
|
|
- Although functions support data sets of floats, Decimals or Fractions,
|
|
|
|
|
there is no guarantee that *mixed* data sets will be supported. (But on
|
|
|
|
|
the other hand, they aren't explicitly rejected either.)
|
|
|
|
|
|
|
|
|
|
- Plenty of documentation, aimed at readers who understand the basic
|
|
|
|
|
concepts but may not know (for example) which variance they should use
|
|
|
|
|
(population or sample?). Mathematicians and statisticians have a terrible
|
|
|
|
|
habit of being inconsistent with both notation and terminology[19], and
|
|
|
|
|
having spent many hours making sense of the contradictory/confusing
|
|
|
|
|
definitions in use, it is only fair that I do my best to clarify rather
|
|
|
|
|
than obfuscate the topic.
|
|
|
|
|
|
|
|
|
|
- But avoid going into tedious[20] mathematical detail.
|
|
|
|
|
|
|
|
|
|
|
2013-09-09 00:21:27 -04:00
|
|
|
|
API
|
|
|
|
|
|
|
|
|
|
The initial version of the library will provide univariate (single
|
|
|
|
|
variable) statistics functions. The general API will be based on a
|
|
|
|
|
functional model ``function(data, ...) -> result``, where ``data``
|
|
|
|
|
is a mandatory iterable of (usually) numeric data.
|
|
|
|
|
|
|
|
|
|
The author expects that lists will be the most common data type used,
|
|
|
|
|
but any iterable type should be acceptable. Where necessary, functions
|
|
|
|
|
may convert to lists internally. Where possible, functions are
|
|
|
|
|
expected to conserve the type of the data values, for example, the mean
|
|
|
|
|
of a list of Decimals should be a Decimal rather than float.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Calculating mean, median and mode
|
|
|
|
|
|
|
|
|
|
The ``mean``, ``median*`` and ``mode`` functions take a single
|
|
|
|
|
mandatory argument and return the appropriate statistic, e.g.:
|
|
|
|
|
|
|
|
|
|
>>> mean([1, 2, 3])
|
|
|
|
|
2.0
|
|
|
|
|
|
|
|
|
|
Functions provided are:
|
|
|
|
|
|
|
|
|
|
* mean(data) -> arithmetic mean of data.
|
|
|
|
|
|
|
|
|
|
* median(data) -> median (middle value) of data, taking the
|
|
|
|
|
average of the two middle values when there are an even
|
|
|
|
|
number of values.
|
|
|
|
|
|
|
|
|
|
* median_high(data) -> high median of data, taking the
|
|
|
|
|
larger of the two middle values when the number of items
|
|
|
|
|
is even.
|
|
|
|
|
|
|
|
|
|
* median_low(data) -> low median of data, taking the smaller
|
|
|
|
|
of the two middle values when the number of items is even.
|
|
|
|
|
|
|
|
|
|
* median_grouped(data, interval=1) -> 50th percentile of
|
|
|
|
|
grouped data, using interpolation.
|
|
|
|
|
|
|
|
|
|
* mode(data) -> most common data point.
|
|
|
|
|
|
|
|
|
|
``mode`` is the sole exception to the rule that the data argument
|
|
|
|
|
must be numeric. It will also accept an iterable of nominal data,
|
|
|
|
|
such as strings.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Calculating variance and standard deviation
|
|
|
|
|
|
|
|
|
|
In order to be similar to scientific calculators, the statistics
|
|
|
|
|
module will include separate functions for population and sample
|
|
|
|
|
variance and standard deviation. All four functions have similar
|
|
|
|
|
signatures, with a single mandatory argument, an iterable of
|
|
|
|
|
numeric data, e.g.:
|
|
|
|
|
|
|
|
|
|
>>> variance([1, 2, 2, 2, 3])
|
|
|
|
|
0.5
|
|
|
|
|
|
|
|
|
|
All four functions also accept a second, optional, argument, the
|
|
|
|
|
mean of the data. This is modelled on a similar API provided by
|
|
|
|
|
the GNU Scientific Library[18]. There are three use-cases for
|
|
|
|
|
using this argument, in no particular order:
|
|
|
|
|
|
|
|
|
|
1) The value of the mean is known *a priori*.
|
|
|
|
|
|
|
|
|
|
2) You have already calculated the mean, and wish to avoid
|
|
|
|
|
calculating it again.
|
|
|
|
|
|
|
|
|
|
3) You wish to (ab)use the variance functions to calculate
|
|
|
|
|
the second moment about some given point other than the
|
|
|
|
|
mean.
|
|
|
|
|
|
|
|
|
|
In each case, it is the caller's responsibility to ensure that
|
|
|
|
|
given argument is meaningful.
|
|
|
|
|
|
|
|
|
|
Functions provided are:
|
|
|
|
|
|
|
|
|
|
* variance(data, xbar=None) -> sample variance of data,
|
|
|
|
|
optionally using xbar as the sample mean.
|
|
|
|
|
|
|
|
|
|
* stdev(data, xbar=None) -> sample standard deviation of
|
|
|
|
|
data, optionally using xbar as the sample mean.
|
|
|
|
|
|
|
|
|
|
* pvariance(data, mu=None) -> population variance of data,
|
|
|
|
|
optionally using mu as the population mean.
|
|
|
|
|
|
|
|
|
|
* pstdev(data, mu=None) -> population standard deviation of
|
|
|
|
|
data, optionally using mu as the population mean.
|
|
|
|
|
|
|
|
|
|
Other functions
|
|
|
|
|
|
|
|
|
|
There is one other public function:
|
|
|
|
|
|
|
|
|
|
* sum(data, start=0) -> high-precision sum of numeric data.
|
|
|
|
|
|
|
|
|
|
|
2013-08-09 10:46:53 -04:00
|
|
|
|
Specification
|
|
|
|
|
|
|
|
|
|
As the proposed reference implementation is in pure Python,
|
|
|
|
|
other Python implementations can easily make use of the module
|
|
|
|
|
unchanged, or adapt it as they see fit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
What Should Be The Name Of The Module?
|
|
|
|
|
|
|
|
|
|
This will be a top-level module "statistics".
|
|
|
|
|
|
|
|
|
|
There was some interest in turning math into a package, and making this a
|
|
|
|
|
sub-module of math, but the general consensus eventually agreed on a
|
|
|
|
|
top-level module. Other potential but rejected names included "stats" (too
|
|
|
|
|
much risk of confusion with existing "stat" module), and "statslib"
|
|
|
|
|
(described as "too C-like").
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Previous Discussions
|
|
|
|
|
|
|
|
|
|
This proposal has been previously discussed here[21].
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Frequently Asked Questions
|
|
|
|
|
|
|
|
|
|
Q: Shouldn't this module spend time on PyPI before being considered for
|
|
|
|
|
the standard library?
|
|
|
|
|
|
|
|
|
|
A: Older versions of this module have been available on PyPI[22] since
|
|
|
|
|
2010. Being much simpler than numpy, it does not require many years of
|
|
|
|
|
external development.
|
|
|
|
|
|
|
|
|
|
Q: Does the standard library really need yet another version of ``sum``?
|
|
|
|
|
|
|
|
|
|
A: This proved to be the most controversial part of the reference
|
|
|
|
|
implementation. In one sense, clearly three sums is two too many. But
|
|
|
|
|
in another sense, yes. The reasons why the two existing versions are
|
|
|
|
|
unsuitable are described here[23] but the short summary is:
|
|
|
|
|
|
|
|
|
|
- the built-in sum can lose precision with floats;
|
|
|
|
|
|
|
|
|
|
- the built-in sum accepts any non-numeric data type that supports
|
|
|
|
|
the + operator, apart from strings and bytes;
|
|
|
|
|
|
|
|
|
|
- math.fsum is high-precision, but coerces all arguments to float.
|
|
|
|
|
|
|
|
|
|
There is some interest in "fixing" one or the other of the existing
|
|
|
|
|
sums. If this occurs before 3.4 feature-freeze, the decision to keep
|
|
|
|
|
statistics.sum can be re-considered.
|
|
|
|
|
|
|
|
|
|
Q: Will this module be backported to older versions of Python?
|
|
|
|
|
|
|
|
|
|
A: The module currently targets 3.3, and I will make it available on PyPI
|
|
|
|
|
for 3.3 for the foreseeable future. Backporting to older versions of
|
|
|
|
|
the 3.x series is likely (but not yet decided). Backporting to 2.7 is
|
|
|
|
|
less likely but not ruled out.
|
|
|
|
|
|
|
|
|
|
Q: Is this supposed to replace numpy?
|
|
|
|
|
|
|
|
|
|
A: No. While it is likely to grow over the years (see open issues below)
|
|
|
|
|
it is not aimed to replace, or even compete directly with, numpy. Numpy
|
|
|
|
|
is a full-featured numeric library aimed at professionals, the nuclear
|
|
|
|
|
reactor of numeric libraries in the Python ecosystem. This is just a
|
|
|
|
|
battery, as in "batteries included", and is aimed at an intermediate
|
|
|
|
|
level somewhere between "use numpy" and "roll your own version".
|
|
|
|
|
|
|
|
|
|
|
2013-09-09 00:21:27 -04:00
|
|
|
|
Future Work
|
2013-08-09 10:46:53 -04:00
|
|
|
|
|
|
|
|
|
- At this stage, I am unsure of the best API for multivariate statistical
|
|
|
|
|
functions such as linear regression, correlation coefficient, and
|
|
|
|
|
covariance. Possible APIs include:
|
|
|
|
|
|
|
|
|
|
* Separate arguments for x and y data:
|
|
|
|
|
function([x0, x1, ...], [y0, y1, ...])
|
|
|
|
|
|
|
|
|
|
* A single argument for (x, y) data:
|
|
|
|
|
function([(x0, y0), (x1, y1), ...])
|
|
|
|
|
|
2013-09-09 00:21:27 -04:00
|
|
|
|
This API is preferred by GvR[24].
|
|
|
|
|
|
2013-08-09 10:46:53 -04:00
|
|
|
|
* Selecting arbitrary columns from a 2D array:
|
|
|
|
|
function([[a0, x0, y0, z0], [a1, x1, y1, z1], ...], x=1, y=2)
|
|
|
|
|
|
|
|
|
|
* Some combination of the above.
|
|
|
|
|
|
|
|
|
|
In the absence of a consensus of preferred API for multivariate stats,
|
|
|
|
|
I will defer including such multivariate functions until Python 3.5.
|
|
|
|
|
|
|
|
|
|
- Likewise, functions for calculating probability of random variables and
|
|
|
|
|
inference testing (e.g. Student's t-test) will be deferred until 3.5.
|
|
|
|
|
|
|
|
|
|
- There is considerable interest in including one-pass functions that can
|
|
|
|
|
calculate multiple statistics from data in iterator form, without having
|
|
|
|
|
to convert to a list. The experimental "stats" package on PyPI includes
|
|
|
|
|
co-routine versions of statistics functions. Including these will be
|
|
|
|
|
deferred to 3.5.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
|
|
|
|
|
[1] http://mail.python.org/pipermail/python-dev/2010-October/104721.html
|
|
|
|
|
|
|
|
|
|
[2] http://support.casio.com/pdf/004/CP330PLUSver310_Soft_E.pdf
|
|
|
|
|
|
|
|
|
|
[3] Gnumeric:
|
|
|
|
|
https://projects.gnome.org/gnumeric/functions.shtml
|
|
|
|
|
|
|
|
|
|
LibreOffice:
|
|
|
|
|
https://help.libreoffice.org/Calc/Statistical_Functions_Part_One
|
|
|
|
|
https://help.libreoffice.org/Calc/Statistical_Functions_Part_Two
|
|
|
|
|
https://help.libreoffice.org/Calc/Statistical_Functions_Part_Three
|
|
|
|
|
https://help.libreoffice.org/Calc/Statistical_Functions_Part_Four
|
|
|
|
|
https://help.libreoffice.org/Calc/Statistical_Functions_Part_Five
|
|
|
|
|
|
|
|
|
|
[4] Scipy: http://scipy-central.org/
|
|
|
|
|
Numpy: http://www.numpy.org/
|
|
|
|
|
|
|
|
|
|
[5] http://wiki.scipy.org/Numpy_Functions_by_Category
|
|
|
|
|
|
|
|
|
|
[6] Tested with numpy 1.6.1 and Python 2.7.
|
|
|
|
|
|
|
|
|
|
[7] http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
|
|
|
|
|
|
|
|
|
|
[8] http://rosettacode.org/wiki/Standard_deviation
|
|
|
|
|
|
|
|
|
|
[9] https://bitbucket.org/larsyencken/simplestats/src/c42e048a6625/src/basic.py
|
|
|
|
|
|
|
|
|
|
[10] http://stackoverflow.com/questions/2341340/calculate-mean-and-variance-with-one-iteration
|
|
|
|
|
|
|
|
|
|
[11] http://www.r-project.org/
|
|
|
|
|
|
|
|
|
|
[12] http://msdn.microsoft.com/en-us/library/system.linq.enumerable.average.aspx
|
|
|
|
|
|
|
|
|
|
[13] https://www.bcg.wisc.edu/webteam/support/ruby/standard_deviation
|
|
|
|
|
|
|
|
|
|
[14] http://ruby-statsample.rubyforge.org/
|
|
|
|
|
|
|
|
|
|
[15] http://www.php.net/manual/en/ref.stats.php
|
|
|
|
|
|
|
|
|
|
[16] http://www.ayton.id.au/gary/it/Delphi/D_maths.htm#Delphi%20Statistical%20functions.
|
|
|
|
|
|
|
|
|
|
[17] http://www.gnu.org/software/gsl/manual/html_node/Statistics.html
|
|
|
|
|
|
|
|
|
|
[18] http://www.gnu.org/software/gsl/manual/html_node/Mean-and-standard-deviation-and-variance.html
|
|
|
|
|
|
|
|
|
|
[19] http://mathworld.wolfram.com/Skewness.html
|
|
|
|
|
|
|
|
|
|
[20] At least, tedious to those who don't like this sort of thing.
|
|
|
|
|
|
|
|
|
|
[21] http://mail.python.org/pipermail/python-ideas/2011-September/011524.html
|
|
|
|
|
|
|
|
|
|
[22] https://pypi.python.org/pypi/stats/
|
|
|
|
|
|
|
|
|
|
[23] http://mail.python.org/pipermail/python-ideas/2013-August/022630.html
|
|
|
|
|
|
2013-09-09 00:21:27 -04:00
|
|
|
|
[24] https://mail.python.org/pipermail/python-dev/2013-September/128429.html
|
|
|
|
|
|
2013-08-09 10:46:53 -04:00
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
sentence-end-double-space: t
|
|
|
|
|
fill-column: 70
|
|
|
|
|
coding: utf-8
|
|
|
|
|
End:
|