New version of pep 465 (matrix multiply @ operator). Added scan-ops.py.

Guido van Rossum 2014-03-18 11:07:32 -07:00
parent 3328823e23
commit 77af598f5f
2 changed files with 305 additions and 150 deletions


@ -1,5 +1,5 @@
PEP: 465
Title: Dedicated infix operators for matrix multiplication and matrix power
Title: A dedicated infix operator for matrix multiplication
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <njs@pobox.com>
@ -13,30 +13,31 @@ Post-History: 13-Mar-2014
Abstract
========
This PEP proposes two new binary operators dedicated to matrix
multiplication and matrix power, spelled ``@`` and ``@@``
respectively. (Mnemonic: ``@`` is ``*`` for mATrices.)
This PEP proposes a new binary operator to be used for matrix
multiplication, called ``@``. (Mnemonic: ``@`` is ``*`` for
mATrices.)
Specification
=============
Two new binary operators are added to the Python language, together
with corresponding in-place versions:
A new binary operator is added to the Python language, together
with the corresponding in-place version:
=======  =========================  ===============================
 Op      Precedence/associativity   Methods
=======  =========================  ===============================
``@``    Same as ``*``              ``__matmul__``, ``__rmatmul__``
``@@``   Same as ``**``             ``__matpow__``, ``__rmatpow__``
``@``    *To be determined*         ``__matmul__``, ``__rmatmul__``
``@=``   n/a                        ``__imatmul__``
``@@=``  n/a                        ``__imatpow__``
=======  =========================  ===============================
No implementations of these methods are added to the builtin or
standard library types. However, a number of projects have reached
consensus on the recommended semantics for these operations; see
`Intended usage details`_ below.
`Intended usage details`_ below for details.
For details on how this operator will be implemented in CPython, see
`Implementation details`_.
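As a rough illustration of how the three methods in the table are dispatched, here is a minimal sketch of a toy 2d matrix type. The class name ``Matrix`` and its list-of-rows storage are assumptions made for this example only and are not part of the proposal; the ``@`` syntax also assumes an interpreter that implements this PEP (CPython 3.5 or later).

```python
class Matrix:
    """Toy 2d matrix backed by a list of rows (hypothetical, for illustration)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Called for ``self @ other``.
        if not isinstance(other, Matrix):
            return NotImplemented
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("shape mismatch")
        cols = list(zip(*other.rows))
        return Matrix([[sum(x * y for x, y in zip(row, col)) for col in cols]
                       for row in self.rows])

    def __rmatmul__(self, other):
        # Called for ``other @ self`` when ``other`` does not handle ``@``.
        # This sketch accepts a plain list of rows on the left-hand side.
        if isinstance(other, list):
            return Matrix(other) @ self
        return NotImplemented

    def __imatmul__(self, other):
        # Called for ``self @= other``; update in place and return self.
        self.rows = (self @ other).rows
        return self


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[0, 1], [1, 0]])
c = a @ b                   # dispatches to Matrix.__matmul__
d = [[1, 0], [0, 1]] @ a    # list has no __matmul__, so Matrix.__rmatmul__ runs
a @= b                      # dispatches to Matrix.__imatmul__
```

Returning ``NotImplemented`` for unknown operand types lets Python try the other operand's reflected method, the same protocol already used by ``__mul__`` and ``__rmul__``.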
Motivation
@ -90,9 +91,6 @@ operator:
and finally standardize on a single consensus duck type for all
numerical array objects.
And, given the existence of ``@``, it makes more sense than not to
have ``@@``, ``@=``, and ``@@=``, so they are added as well.
Background: What's wrong with the status quo?
---------------------------------------------
@ -138,12 +136,13 @@ at hand.
Matrix multiplication is more of a special case. It's only defined on
2d arrays (also known as "matrices"), and multiplication is the only
operation that has a meaningful "matrix" version -- "matrix addition"
operation that has an important "matrix" version -- "matrix addition"
is the same as elementwise addition; there is no such thing as "matrix
bitwise-or" or "matrix floordiv"; "matrix division" can be defined but
is not very useful, etc. However, matrix multiplication is still used
very heavily across all numerical application areas; mathematically,
it's one of the most fundamental operations there is.
bitwise-or" or "matrix floordiv"; "matrix division" and "matrix
to-the-power-of" can be defined but are not very useful, etc.
However, matrix multiplication is still used very heavily across all
numerical application areas; mathematically, it's one of the most
fundamental operations there is.
Because Python syntax currently allows for only a single
multiplication operator ``*``, libraries providing array-like objects
@ -533,37 +532,22 @@ and the bitwise operations.
But isn't it weird to add an operator with no stdlib uses?
----------------------------------------------------------
It's certainly unusual (though ``Ellipsis`` was also added without any
stdlib uses). But the important thing is whether a change will
benefit users, not where the software is being downloaded from. It's
clear from the above that ``@`` will be used, and used heavily. And
this PEP provides the critical piece that will allow the Python
It's certainly unusual (though extended slicing existed for some time
before builtin types gained support for it, ``Ellipsis`` is still unused
within the stdlib, etc.). But the important thing is whether a change
will benefit users, not where the software is being downloaded from.
It's clear from the above that ``@`` will be used, and used heavily.
And this PEP provides the critical piece that will allow the Python
numerical community to finally reach consensus on a standard duck type
for all array-like objects, which is a necessary precondition to ever
adding a numerical array type to the stdlib.
Matrix power and in-place operators
-----------------------------------
The primary motivation for this PEP is ``@``; the other proposed
operators don't have nearly as much impact. The matrix power operator
``@@`` is useful and well-defined, but not really necessary. It is
still included, though, for consistency: if we have an ``@`` that is
analogous to ``*``, then it would be weird and surprising to *not*
have an ``@@`` that is analogous to ``**``. Similarly, the in-place
operators ``@=`` and ``@@=`` provide limited value -- it's more common
to write ``a = (b @ a)`` than it is to write ``a = (a @ b)``, and
in-place matrix operations still generally have to allocate
substantial temporary storage -- but they are included for
completeness and symmetry.
Compatibility considerations
============================
Currently, the only legal use of the ``@`` token in Python code is at
statement beginning in decorators. The new operators are all infix;
statement beginning in decorators. The new operators are both infix;
the one place they can never occur is at statement beginning.
Therefore, no existing code will be broken by the addition of these
operators, and there is no possible parsing ambiguity between
@ -583,7 +567,7 @@ Intended usage details
This section is informative, rather than normative -- it documents the
consensus of a number of libraries that provide array- or matrix-like
objects on how the ``@`` and ``@@`` operators will be implemented.
objects on how ``@`` will be implemented.
This section uses the numpy terminology for describing arbitrary
multidimensional arrays of data, because it is a superset of all other
@ -611,8 +595,8 @@ The recommended semantics for ``@`` for different inputs are:
* 2d inputs are conventional matrices, and so the semantics are
obvious: we apply conventional matrix multiplication. If we write
``arr(2, 3)`` to represent an arbitrary 2x3 array, then ``arr(3, 4)
@ arr(4, 5)`` returns an array with shape (3, 5).
``arr(2, 3)`` to represent an arbitrary 2x3 array, then ``arr(2, 3)
@ arr(3, 4)`` returns an array with shape (2, 4).
* 1d vector inputs are promoted to 2d by prepending or appending a '1'
to the shape, the operation is performed, and then the added
@ -705,36 +689,6 @@ The recommended semantics for ``@`` for different inputs are:
elementwise ``*`` operator. Allowing scalar @ matrix would thus
both require an unnecessary special case, and violate TOOWTDI.
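The shape rules for 1d and 2d operands can be sketched as a small pure-Python function. This is only an illustration: ``matmul_shape`` is a hypothetical helper, not an API of any library, it deliberately leaves out operands with more than two dimensions, and its handling of 1d inputs (drop the dimension that promotion added) follows the full PEP text rather than the truncated bullet visible in this diff.

```python
def matmul_shape(a_shape, b_shape):
    """Return the result shape of ``a @ b`` for 1d/2d operands (sketch only)."""
    a_shape, b_shape = tuple(a_shape), tuple(b_shape)
    if not a_shape or not b_shape:
        # 0d operands are scalars; the recommended semantics reject them.
        raise TypeError("scalar operands are not allowed for @")
    if len(a_shape) > 2 or len(b_shape) > 2:
        raise NotImplementedError(">2d operands are outside this sketch")
    # Promote 1d operands: prepend a 1 on the left, append a 1 on the right.
    a2 = (1,) + a_shape if len(a_shape) == 1 else a_shape
    b2 = b_shape + (1,) if len(b_shape) == 1 else b_shape
    if a2[1] != b2[0]:
        raise ValueError("inner dimensions do not match: %r @ %r"
                         % (a_shape, b_shape))
    result = (a2[0], b2[1])
    # Drop the dimensions that were added by promotion.
    if len(a_shape) == 1:
        result = result[1:]
    if len(b_shape) == 1:
        result = result[:-1]
    return result


print(matmul_shape((2, 3), (3, 4)))   # (2, 4): ordinary matrix product
print(matmul_shape((3,), (3, 4)))     # (4,): vector @ matrix
print(matmul_shape((3,), (3,)))       # (): vector @ vector gives a 0d result
```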
The recommended semantics for ``@@`` are::
    def __matpow__(self, n):
        if not isinstance(n, numbers.Integral):
            raise TypeError("@@ not implemented for fractional powers")
        if n == 0:
            return identity_matrix_with_shape(self.shape)
        elif n < 0:
            return inverse(self) @ (self @@ (n + 1))
        else:
            return self @ (self @@ (n - 1))
(Of course we expect that much more efficient implementations will be
used in practice.) Notice that if given an appropriate definition of
``identity_matrix_with_shape``, then this definition will
automatically handle >2d arrays appropriately. Notice also that with
this definition, ``vector @@ 2`` gives the squared Euclidean length of
the vector, a commonly used value. Also, while it is rarely useful to
explicitly compute inverses or other negative powers in standard
immediate-mode dense matrix code, these computations are natural when
doing symbolic or deferred-mode computations (as in e.g. sympy,
theano, numba, numexpr); therefore, negative powers are fully
supported. Fractional powers, though, bring in variety of
`mathematical complications`_, so we leave it to individual projects
to decide whether they want to try to define some reasonable semantics
for fractional inputs.
.. _`mathematical complications`:
https://en.wikipedia.org/wiki/Square_root_of_a_matrix
Adoption
--------
@ -743,12 +697,12 @@ We group existing Python projects which provide array- or matrix-like
types based on what API they currently use for elementwise and matrix
multiplication.
**Projects which currently use * for *elementwise* multiplication, and
function/method calls for *matrix* multiplication:**
**Projects which currently use * for elementwise multiplication, and
function/method calls for matrix multiplication:**
The developers of the following projects have expressed an intention
to implement ``@`` and ``@@`` on their array-like types using the
above semantics:
to implement ``@`` on their array-like types using the above
semantics:
* numpy
* pandas
@ -764,8 +718,8 @@ things:
* pycuda
* panda3d
**Projects which currently use * for *matrix* multiplication, and
function/method calls for *elementwise* multiplication:**
**Projects which currently use * for matrix multiplication, and
function/method calls for elementwise multiplication:**
The following projects have expressed an intention, if this PEP is
accepted, to migrate from their current API to the elementwise-``*``,
@ -784,8 +738,8 @@ eliminated if this PEP is accepted):
* cvxopt
**Projects which currently use * for *matrix* multiplication, and
which do not implement elementwise multiplication at all:**
**Projects which currently use * for matrix multiplication, and which
don't really care about elementwise multiplication of matrices:**
There are several projects which implement matrix types, but from a
very different perspective than the numerical libraries discussed
@ -796,14 +750,13 @@ numbers that need crunching. And it turns out that from the abstract
math point of view, there isn't much use for elementwise operations in
the first place; as discussed in the Background section above,
elementwise operations are motivated by the bag-of-numbers approach.
The different goals of these projects mean that they don't encounter
the basic problem that this PEP exists to address, making it mostly
irrelevant to them; while they appear superficially similar, they're
actually doing something quite different. They use ``*`` for matrix
So these projects don't encounter the basic problem that this PEP
exists to address, making it mostly irrelevant to them; while they
appear superficially similar to projects like numpy, they're actually
doing something quite different. They use ``*`` for matrix
multiplication (and for group actions, and so forth), and if this PEP
is accepted, their expressed intention is to continue doing so, while
perhaps adding ``@`` and ``@@`` on matrices as aliases for ``*`` and
``**``:
perhaps adding ``@`` as an alias. These projects include:
* sympy
* sage
@ -814,6 +767,23 @@ are not listed above, then please let the PEP author know:
njs@pobox.com
Implementation details
======================
New functions ``operator.matmul`` and ``operator.__matmul__`` are
added to the standard library, with the usual semantics.
A corresponding function ``PyObject* PyObject_MatrixMultiply(PyObject
*o1, PyObject *o2)`` is added to the C API.
A new AST node is added named ``MatMult``, along with a new token
``ATEQUAL`` and new bytecode opcodes ``BINARY_MATRIX_MULTIPLY`` and
``INPLACE_MATRIX_MULTIPLY``.
Two new type slots are added; whether this is to ``PyNumberMethods``
or a new ``PyMatrixMethods`` struct remains to be determined.
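The Python-level pieces listed here can be exercised with a short sketch, assuming an interpreter that implements this PEP (CPython 3.5 or later). The ``Tagged`` class is a hypothetical stand-in that only records its operands; the sketch does not touch the C API, the token, or the bytecode, whose details may differ between versions.

```python
import ast
import operator


class Tagged:
    """Hypothetical type whose ``@`` just records its operands."""

    def __init__(self, name):
        self.name = name

    def __matmul__(self, other):
        return "(%s @ %s)" % (self.name, other.name)


x, y = Tagged("x"), Tagged("y")

# operator.matmul(a, b) is the function form of ``a @ b``.
via_function = operator.matmul(x, y)
via_dunder = operator.__matmul__(x, y)
via_syntax = x @ y

# The parser represents the operator with an ast.MatMult node, both in
# a binary expression and in an augmented assignment.
binop = ast.parse("x @ y", mode="eval").body
augassign = ast.parse("x @= y").body[0]
print(type(binop.op).__name__, type(augassign.op).__name__)
```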
Rationale for specification details
===================================
@ -848,16 +818,17 @@ better than ``@``. Some options that have been suggested include:
What we need, though, is an operator that means "matrix
multiplication, as opposed to scalar/elementwise multiplication".
There is no conventional symbol for these in mathematics or
programming, where these operations are usually distinguished by
context. (And U+2297 CIRCLED TIMES is actually used conventionally to
mean exactly the opposite: elementwise multiplication -- the "Hadamard
product" -- as opposed to matrix multiplication). ``@`` at least has
the virtue that it *looks* like a funny non-commutative operator; a
naive user who knows maths but not programming couldn't look at ``A *
B`` versus ``A × B``, or ``A * B`` versus ``A ⋅ B``, or ``A * B``
versus ``A ° B`` and guess which one is the usual multiplication, and
which one is the special case.
There is no conventional symbol with this meaning in either
programming or mathematics, where these operations are usually
distinguished by context. (And U+2297 CIRCLED TIMES is actually used
conventionally to mean exactly the wrong things: elementwise
multiplication -- the "Hadamard product" -- or outer product, rather
than matrix/inner product like our operator). ``@`` at least has the
virtue that it *looks* like a funny non-commutative operator; a naive
user who knows maths but not programming couldn't look at ``A * B``
versus ``A × B``, or ``A * B`` versus ``A ⋅ B``, or ``A * B`` versus
``A ° B`` and guess which one is the usual multiplication, and which
one is the special case.
Finally, there is the option of using multi-character tokens. Some
options:
@ -878,9 +849,9 @@ options:
be too easy to confuse with ``*+``, which is just multiplication
combined with the unary ``+`` operator.
* PEP 211 suggested ``~*`` and ``~**``. This has the downside that it
sort of suggests that there is a unary ``*`` operator that is being
combined with unary ``~``, but it could work.
* PEP 211 suggested ``~*``. This has the downside that it sort of
suggests that there is a unary ``*`` operator that is being combined
with unary ``~``, but it could work.
* R uses ``%*%`` for matrix multiplication. In R this forms part of a
general extensible infix system in which all tokens of the form
@ -888,12 +859,11 @@ options:
token without stealing the system.
* Some other plausible candidates that have been suggested: ``><`` (=
ascii drawing of the multiplication sign ×); the footnote operators
``[*]`` and ``[**]`` or ``|*|`` and ``|**|`` (but when used in
context, the use of vertical grouping symbols tends to recreate the
nested parentheses visual clutter that was noted as one of the major
downsides of the function syntax we're trying to get away from);
``^*`` and ``^^``.
ascii drawing of the multiplication sign ×); the footnote operator
``[*]`` or ``|*|`` (but when used in context, the use of vertical
grouping symbols tends to recreate the nested parentheses visual
clutter that was noted as one of the major downsides of the function
syntax we're trying to get away from); ``^*`` and ``^^``.
So, it doesn't matter much, but ``@`` seems as good or better than any
of the alternatives:
@ -911,8 +881,12 @@ of the alternatives:
* The mATrices mnemonic is cute.
* The use of a single-character token reduces the line-noise effect,
and makes ``@@`` possible, which is a nice bonus.
* The swirly shape is reminiscent of the simultaneous sweeps over rows
and columns that define matrix multiplication.
* Its asymmetry is evocative of its non-commutative nature.
* Whatever, we have to pick something.
(Non)-Definitions for built-in types
@ -924,32 +898,41 @@ hierarchy, because these types represent scalars, and the consensus
semantics for ``@`` are that it should raise an error on scalars.
We do not -- for now -- define a ``__matmul__`` method on the standard
``memoryview`` or ``array.array`` objects, for several reasons.
First, there is currently no way to create multidimensional memoryview
objects using only the stdlib, and array objects cannot represent
multidimensional data at all, which makes ``__matmul__`` much less
useful. Second, providing a quality implementation of matrix
multiplication is highly non-trivial. Naive nested loop
implementations are very slow and providing one in CPython would just
create a trap for users. But the alternative -- providing a modern,
competitive matrix multiply -- would require that CPython link to a
BLAS library, which brings a set of new complications. In particular,
several popular BLAS libraries (including the one that ships by
default on OS X) currently break the use of ``multiprocessing``
[#blas-fork]_. And finally, we'd have to add quite a bit beyond
``__matmul__`` before ``memoryview`` or ``array.array`` would be
useful for numeric work -- like elementwise versions of the other
arithmetic operators, just to start. Put together, these
considerations mean that the cost/benefit of adding ``__matmul__`` to
these types just isn't there, so for now we'll continue to delegate
these problems to numpy and friends, and defer a more systematic
solution to a future proposal.
``memoryview`` or ``array.array`` objects, for several reasons. Of
course this could be added if someone wants it, but these types would
require quite a bit of additional work beyond ``__matmul__`` before
they could be used for numeric work -- e.g., they have no way to do
addition or scalar multiplication either! -- and adding such
functionality is beyond the scope of this PEP. In addition, providing
a quality implementation of matrix multiplication is highly
non-trivial. Naive nested loop implementations are very slow and
shipping such an implementation in CPython would just create a trap
for users. But the alternative -- providing a modern, competitive
matrix multiply -- would require that CPython link to a BLAS library,
which brings a set of new complications. In particular, several
popular BLAS libraries (including the one that ships by default on
OS X) currently break the use of ``multiprocessing`` [#blas-fork]_.
Together, these considerations mean that the cost/benefit of adding
``__matmul__`` to these types just isn't there, so for now we'll
continue to delegate these problems to numpy and friends, and defer a
more systematic solution to a future proposal.
There are also non-numeric Python builtins which define ``__mul__``
(``str``, ``list``, ...). We do not define ``__matmul__`` for these
types either, because why would we even do that.
Non-definition of matrix power
------------------------------
Earlier versions of this PEP also proposed a matrix power operator,
``@@``, analogous to ``**``. But on further consideration, it was
decided that the utility of this was sufficiently unclear that it
would be better to leave it out for now, and only revisit the issue if
-- once we have more experience with ``@`` -- it turns out that ``@@``
is truly missed. [#atat-discussion]_
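Without ``@@``, matrix power can still be written as an ordinary function built on repeated multiplication. The following is a minimal sketch for non-negative integer powers of square matrices stored as nested lists; the helper names ``matmul``, ``identity`` and ``matrix_power`` are assumptions for this example, and array libraries provide their own equivalents (numpy, for instance, has ``numpy.linalg.matrix_power``).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matrix_power(m, n):
    """Raise square matrix m to a non-negative integer power n.

    Uses repeated squaring, so only O(log n) multiplications are needed.
    """
    if not isinstance(n, int) or n < 0:
        raise ValueError("only non-negative integer powers are supported")
    result = identity(len(m))
    base = m
    while n:
        if n & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        n >>= 1
    return result


# Powers of [[1, 1], [1, 0]] contain consecutive Fibonacci numbers.
fib = matrix_power([[1, 1], [1, 0]], 10)
print(fib)   # [[89, 55], [55, 34]]
```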
Unresolved issues
-----------------
@ -1026,15 +1009,16 @@ be.)
general Python, and then overload it in numeric code:** This was the
approach taken by PEP 211, which proposed defining ``@`` to be the
equivalent of ``itertools.product``. The problem with this is that
when taken on its own terms, adding an infix operator for
``itertools.product`` is just silly. (During discussions of this PEP,
a similar suggestion was made to define ``@`` as a general purpose
function composition operator, and this suffers from the same problem;
``functools.compose`` isn't even useful enough to exist.) Matrix
multiplication has a uniquely strong rationale for inclusion as an
infix operator. There almost certainly don't exist any other binary
operations that will ever justify adding any other infix operators to
Python.
when taken on its own terms, it's pretty clear that
``itertools.product`` doesn't actually need a dedicated operator. It
hasn't even been deemed worthy of a builtin. (During discussions of
this PEP, a similar suggestion was made to define ``@`` as a general
purpose function composition operator, and this suffers from the same
problem; ``functools.compose`` isn't even useful enough to exist.)
Matrix multiplication has a uniquely strong rationale for inclusion as
an infix operator. There almost certainly don't exist any other
binary operations that will ever justify adding any other infix
operators to Python.
**Add a .dot method to array types so as to allow "pseudo-infix"
A.dot(B) syntax:** This has been in numpy for some years, and in many
@ -1101,18 +1085,60 @@ these magic incantations that they're learning, when along comes an
evil hack like this that violates that system, creates bizarre error
messages when accidentally misused, and whose underlying mechanisms
can't be understood without deep knowledge of how object oriented
systems work. We've considered promoting this as a general solution,
and perhaps if the PEP is rejected we'll revisit this option, but so
far the numeric community has mostly elected to leave this one on the
shelf.
systems work.
**Use a special "facade" type to support syntax like arr.M * arr:**
This is very similar to the previous proposal, in that the ``.M``
attribute would basically return the same object as ``arr *dot*`` would,
and thus suffers the same objections about 'magicalness'. This
approach also has some non-obvious complexities: for example, while
``arr.M * arr`` must return an array, ``arr.M * arr.M`` and ``arr *
arr.M`` must return facade objects, or else ``arr.M * arr.M * arr``
and ``arr * arr.M * arr`` will not work. But this means that facade
objects must be able to recognize both other array objects and other
facade objects (which creates additional complexity for writing
interoperating array types from different libraries, which must now
recognize both each other's array types and their facade types). It
also creates pitfalls for users who may easily type ``arr * arr.M`` or
``arr.M * arr.M`` and expect to get back an array object; instead,
they will get a mysterious object that throws errors when they attempt
to use it. Basically with this approach users must be careful to
think of ``.M*`` as an indivisible unit that acts as an infix operator
-- and as infix-operator-like token strings go, at least ``*dot*``
is prettier looking (look at its cute little ears!).
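To make the facade pitfall concrete, here is a minimal sketch of how such a scheme might be wired up. The classes ``Arr`` and ``Facade`` and the helper ``_matmul`` are hypothetical and exist only for this illustration; no library is known to implement exactly this.

```python
def _matmul(a, b):
    """Matrix-multiply two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


class Arr:
    """Hypothetical array type whose ``*`` is elementwise."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @property
    def M(self):
        return Facade(self)

    def __mul__(self, other):
        if isinstance(other, Arr):
            return Arr([[x * y for x, y in zip(r1, r2)]
                        for r1, r2 in zip(self.rows, other.rows)])
        # Unknown operand (including Facade): defer to its __rmul__.
        return NotImplemented


class Facade:
    """Wrapper returned by ``arr.M``; its ``*`` means matrix multiplication."""

    def __init__(self, arr):
        self.arr = arr

    def __mul__(self, other):
        if isinstance(other, Facade):
            # Must stay a facade so that ``arr.M * arr.M * arr`` works.
            return Facade(Arr(_matmul(self.arr.rows, other.arr.rows)))
        if isinstance(other, Arr):
            return Arr(_matmul(self.arr.rows, other.rows))
        return NotImplemented

    def __rmul__(self, other):
        if isinstance(other, Arr):
            # ``arr * arr.M`` must also stay a facade.
            return Facade(Arr(_matmul(other.rows, self.arr.rows)))
        return NotImplemented


a = Arr([[1, 2], [3, 4]])
product = a.M * a          # an Arr holding the matrix product
chained = a.M * a.M * a    # works only because a.M * a.M is a Facade
surprise = a * a.M         # a Facade, not the Arr a user might expect
```

Even in this toy version, each class has to know about the other, and a user who writes ``a * a.M`` gets back a ``Facade`` rather than an array.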
Discussions of this PEP
=======================
Collected here for reference:
* Github pull request containing much of the original discussion and
drafting: https://github.com/numpy/numpy/pull/4351
* sympy mailing list discussions of an early draft:
* https://groups.google.com/forum/#!topic/sympy/22w9ONLa7qo
* https://groups.google.com/forum/#!topic/sympy/4tGlBGTggZY
* sage-devel mailing list discussions of an early draft:
https://groups.google.com/forum/#!topic/sage-devel/YxEktGu8DeM
* 13-Mar-2014 python-ideas thread:
https://mail.python.org/pipermail/python-ideas/2014-March/027053.html
* numpy-discussion thread on whether to keep ``@@``:
http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069448.html
* numpy-discussion thread on precedence/associativity of ``@``:
http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069444.html
References
==========
.. [#preprocessor] From a comment by GvR on a G+ post by GvR; the
comment itself does not seem to be directly linkable:
https://plus.google.com/115212051037621986145/posts/hZVVtJ9bK3u
comment itself does not seem to be directly linkable: https://plus.google.com/115212051037621986145/posts/hZVVtJ9bK3u
.. [#infix-hack] http://code.activestate.com/recipes/384122-infix-operators/
http://www.sagemath.org/doc/reference/misc/sage/misc/decorators.html#sage.misc.decorators.infix_operator
.. [#scipy-conf] http://conference.scipy.org/past.html
@ -1130,8 +1156,7 @@ References
average. Compare to eq. 2.139 in
http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/xegbohtmlnode17.html
Example code is adapted from
https://github.com/rerpy/rerpy/blob/0d274f85e14c3b1625acb22aed1efa85d122ecb7/rerpy/incremental_ls.py#L202
Example code is adapted from https://github.com/rerpy/rerpy/blob/0d274f85e14c3b1625acb22aed1efa85d122ecb7/rerpy/incremental_ls.py#L202
.. [#pycon-tutorials] Out of the 36 tutorials scheduled for PyCon 2014
(https://us.pycon.org/2014/schedule/tutorials/), we guess that the
@ -1177,9 +1202,8 @@ References
from Python 3.2.3 to examine the tokens in all files ending ``.py``
underneath some directory. Only tokens which occur at least once
in the source trees are included in the table. The counting script
will be available as an auxiliary file once this PEP is submitted;
until then, it can be found here:
https://gist.github.com/njsmith/9157645
is available `in the PEP repository
<http://hg.python.org/peps/file/tip/pep-0465/scan-ops.py>`_.
Matrix multiply counts were estimated by counting how often certain
tokens which are used as matrix multiply function names occurred in
@ -1216,8 +1240,7 @@ References
.. [#broadcasting] http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
.. [#matmul-other-langs]
http://mail.scipy.org/pipermail/scipy-user/2014-February/035499.html
.. [#matmul-other-langs] http://mail.scipy.org/pipermail/scipy-user/2014-February/035499.html
.. [#github-details] Counts were produced by manually entering the
string ``"import foo"`` or ``"from foo import"`` (with quotes) into

132
scan-ops.py Normal file

@ -0,0 +1,132 @@
#!/usr/bin/env python3
# http://legacy.python.org/dev/peps/pep-0465/
# https://gist.github.com/njsmith/9157645

# usage:
#   python3 scan-ops.py stdlib_path sklearn_path nipy_path

import sys
import os
import os.path
import tokenize
from collections import OrderedDict

NON_SOURCE_TOKENS = [
    tokenize.COMMENT, tokenize.NL, tokenize.ENCODING, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT,
    ]

SKIP_OPS = list("(),.:[]{}@;") + ["->", "..."]

class TokenCounts(object):
    def __init__(self, dot_names=[]):
        self.counts = {}
        self.sloc = 0
        self.dot_names = dot_names

    def count(self, path):
        sloc_idxes = set()
        # Close the file handle once the whole file has been tokenized.
        with open(path, "rb") as f:
            for token in tokenize.tokenize(f.readline):
                if token.type == tokenize.OP:
                    self.counts.setdefault(token.string, 0)
                    self.counts[token.string] += 1
                if token.string in self.dot_names:
                    self.counts.setdefault("dot", 0)
                    self.counts["dot"] += 1
                if token.type not in NON_SOURCE_TOKENS:
                    sloc_idxes.add(token.start[0])
        self.sloc += len(sloc_idxes)

    @classmethod
    def combine(cls, objs):
        combined = cls()
        for obj in objs:
            for op, count in obj.counts.items():
                combined.counts.setdefault(op, 0)
                combined.counts[op] += count
            combined.sloc += obj.sloc
        return combined

def count_tree(root, **kwargs):
    c = TokenCounts(**kwargs)
    for dirpath, _, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".py"):
                path = os.path.join(dirpath, filename)
                try:
                    c.count(path)
                    sys.stderr.write(".")
                    sys.stderr.flush()
                except Exception as e:
                    sys.stderr.write("\nFailed to read %s: %s\n" % (path, e))
    return c

# count_objs is OrderedDict (name -> TokenCounts)
def summarize(count_objs, out):
    ops = {}
    for count_obj in count_objs.values():
        for op in count_obj.counts:
            ops[op] = []
    for count_obj in count_objs.values():
        for op, row in ops.items():
            count = count_obj.counts.get(op, 0)
            row.append(count / count_obj.sloc)
    titles = ["Op"] + list(count_objs)
    # 4 chars is enough for ops and all numbers.
    column_widths = [max(len(title), 4) for title in titles]

    rows = []
    for op, row in ops.items():
        #rows.append(["``" + op + "``"] + row)
        rows.append([op] + row)
    rows.sort(key=lambda row: row[-1])
    rows.reverse()

    def write_row(entries):
        out.write(" ".join(entries))
        out.write("\n")
    def lines():
        write_row("=" * w for w in column_widths)
    lines()
    write_row(t.rjust(w) for w, t in zip(column_widths, titles))
    lines()
    for row in rows:
        op = row[0]
        if op in SKIP_OPS:
            continue
        # numbers here are avg number of uses per sloc, which is
        # inconveniently small. convert to uses/1e4 sloc
        numbers = row[1:]
        number_strs = [str(int(round(x * 10000))) for x in numbers]
        formatted_row = [op] + number_strs
        write_row(str(e).rjust(w)
                  for w, e in zip(column_widths, formatted_row))
    lines()

def run_projects(names, dot_names, dirs, out):
    assert len(names) == len(dot_names) == len(dirs)
    count_objs = OrderedDict()
    for name, dot_name, dir in zip(names, dot_names, dirs):
        counts = count_tree(dir, dot_names=dot_name)
        count_objs[name] = counts
        out.write("%s: %s sloc\n" % (name, counts.sloc))
    count_objs["combined"] = TokenCounts.combine(count_objs.values())
    summarize(count_objs, out)

if __name__ == "__main__":
    run_projects(["stdlib", "scikit-learn", "nipy"],
                 [[],
                  # https://github.com/numpy/numpy/pull/4351#discussion_r9977913
                  # sklearn fast_dot is used to fix up some optimizations that
                  # are missing from older numpy's, but in modern days is
                  # exactly the same, so it's fair to count. safe_sparse_dot
                  # has hacks to work around some quirks in scipy.sparse
                  # matrices, but these quirks are also already fixed, so
                  # counting these calls is also fair.
                  ["dot", "fast_dot", "safe_sparse_dot"],
                  ["dot"]],
                 sys.argv[1:],
                 sys.stdout)
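The core counting technique in the script can be tried on an in-memory string instead of a source tree. This is a minimal sketch, not part of the commit: the helper ``count_ops`` and the sample source are assumptions made for illustration, and unlike ``scan-ops.py`` it neither tracks source lines nor treats ``dot``-style function names as operators.

```python
import io
import tokenize
from collections import Counter


def count_ops(source):
    """Count operator tokens in a string of Python source code."""
    counts = Counter()
    # tokenize.tokenize expects a readline callable that yields bytes.
    readline = io.BytesIO(source.encode("utf-8")).readline
    for token in tokenize.tokenize(readline):
        if token.type == tokenize.OP:
            counts[token.string] += 1
    return counts


sample = "y = a * b + c * d\nz = dot(a, b)\n"
ops = count_ops(sample)
print(ops["*"], ops["+"], ops["="])   # 2 1 2
```

Note that punctuation such as ``(``, ``,`` and ``)`` is also reported with the ``OP`` token type, which is why the script filters its output through ``SKIP_OPS``.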