PEP: 209
Title: Multi-dimensional Arrays
Version: $Revision$
Last-Modified: $Date$
Author: barrett@stsci.edu (Paul Barrett), oliphant@ee.byu.edu (Travis Oliphant)
Status: Withdrawn
Type: Standards Track
Created: 03-Jan-2001
Python-Version: 2.2
Post-History:


Abstract

This PEP proposes a redesign and re-implementation of the multi-
dimensional array module, Numeric, to make it easier to add new
features and functionality to the module. Aspects of Numeric 2
that will receive special attention are efficient access to arrays
exceeding a gigabyte in size and composed of inhomogeneous data
structures or records. The proposed design uses four Python
classes: ArrayType, UFunc, Array, and ArrayView; and a low-level
C-extension module, _ufunc, to handle the array operations
efficiently. In addition, each array type has its own C-extension
module which defines the coercion rules, operations, and methods
for that type. This design enables new types, features, and
functionality to be added in a modular fashion. The new version
will introduce some incompatibilities with the current Numeric.


Motivation

Multi-dimensional arrays are commonly used to store and manipulate
data in science, engineering, and computing. Python currently has
an extension module, named Numeric (henceforth called Numeric 1),
which provides a satisfactory set of functionality for users
manipulating homogeneous arrays of data of moderate size (of order
10 MB). For access to larger arrays (of order 100 MB or more) of
possibly inhomogeneous data, the implementation of Numeric 1 is
inefficient and cumbersome. In the future, requests by the
Numerical Python community for additional functionality are also
likely, as PEP 211 (Adding New Linear Operators to Python) and
PEP 225 (Elementwise/Objectwise Operators) illustrate.


Proposal

This proposal recommends a re-design and re-implementation of
Numeric 1, henceforth called Numeric 2, which will enable new
types, features, and functionality to be added in an easy and
modular manner. The initial design of Numeric 2 should focus on
providing a generic framework for manipulating arrays of various
types and should enable a straightforward mechanism for adding new
array types and UFuncs. Functional methods that are more specific
to various disciplines can then be layered on top of this core.
This new module will still be called Numeric and most of the
behavior found in Numeric 1 will be preserved.

The proposed design uses four Python classes: ArrayType, UFunc,
Array, and ArrayView; and a low-level C-extension module to handle
the array operations efficiently. In addition, each array type
has its own C-extension module which defines the coercion rules,
operations, and methods for that type. At a later date, when core
functionality is stable, some Python classes can be converted to
C-extension types.

Some planned features are:

1. Improved memory usage

This feature is particularly important when handling large arrays
and can produce significant improvements in performance as well as
memory usage. We have identified several areas where memory usage
can be improved:

a. Use a local coercion model

Instead of using Python's global coercion model which creates
temporary arrays, Numeric 2, like Numeric 1, will implement a
local coercion model as described in PEP 208 which defers the
responsibility of coercion to the operator. By using internal
buffers, a coercion operation can be done for each array
(including output arrays), if necessary, at the time of the
operation. Benchmarks [1] have shown that performance is at
most degraded only slightly and is improved in cases where the
internal buffers are less than the L2 cache size and the
processor is under load. To avoid array coercion altogether,
C functions having arguments of mixed type are allowed in
Numeric 2.

b. Avoid creation of temporary arrays

In complex array expressions (i.e. having more than one
operation), each operation will create a temporary array which
will be used and then deleted by the succeeding operation. A
better approach would be to identify these temporary arrays
and reuse their data buffers when possible, namely when the
array shape and type are the same as the temporary array being
created. This can be done by checking the temporary array's
reference count. If it is 1, then it will be deleted once the
operation is done and is a candidate for reuse.

c. Optional use of memory-mapped files

Numeric users sometimes need to access data from very large
files or to handle data that is greater than the available
memory. Memory-mapped arrays provide a mechanism to do this
by storing the data on disk while making it appear to be in
memory. Memory-mapped arrays should improve access to all
files by eliminating one of two copy steps during a file
access. Numeric should be able to access in-memory and
memory-mapped arrays transparently.
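
As a rough illustration of the idea, and not of any Numeric 2 interface (which this PEP does not specify), the sketch below uses Python's standard mmap module to treat a file of 32-bit integers as if it were an in-memory sequence. The scratch file, its contents, and the variable names are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

# Write ten native-endian 32-bit integers to a scratch file (made-up data).
values = list(range(10))
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("=10i", *values))

# Map the file and view it as a sequence of ints; elements are read and
# written through the mapping rather than through explicit read() calls.
with open(path, "r+b") as f:
    mapped = mmap.mmap(f.fileno(), 0)
    view = memoryview(mapped).cast("i")
    first, last = view[0], view[9]
    view[0] = 42            # this write lands in the mapped file
    view.release()          # the view must be released before closing the map
    mapped.flush()
    mapped.close()

# Re-read the file the ordinary way to confirm the write reached the disk.
with open(path, "rb") as f:
    on_disk = struct.unpack("=10i", f.read())
os.remove(path)
```

Code that only indexes the view does not need to know whether the data lives in memory or on disk, which is the transparency the paragraph above asks for.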

d. Record access

In some fields of science, data is stored in files as binary
records. For example, in astronomy, photon data is stored as a
1-dimensional list of photons in order of arrival time. These
records or C-like structures contain information about the
detected photon, such as its arrival time, its position on the
detector, and its energy. Each field may be of a different
type, such as char, int, or float. Such arrays introduce new
issues that must be dealt with, in particular byte alignment
or byte swapping may need to be performed for the numeric
values to be properly accessed (though byte swapping is also
an issue for memory mapped data). Numeric 2 is designed to
automatically handle alignment and representational issues
when data is accessed or operated on. There are two
approaches to implementing records: as either a derived array
class or a special array type, depending on your point-of-
view. We defer this discussion to the Open Issues section.
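
To make the photon example concrete, here is a minimal sketch using the standard struct module. The record layout (arrival time, detector x and y, energy) and the big-endian byte order are assumptions chosen for illustration; the PEP does not define a record format.

```python
import struct

# Hypothetical photon record: arrival time, detector x, detector y, energy.
# ">" selects big-endian with no padding, so a record is 8 + 2 + 2 + 4 bytes.
PHOTON = struct.Struct(">dhhf")

photons = [(0.5, 10, 20, 1.5), (0.75, 11, 19, 2.25)]
raw = b"".join(PHOTON.pack(*p) for p in photons)   # stands in for file data

# Decode every record. struct does the byte swapping whenever the host
# byte order differs from the order declared in the format string.
records = list(PHOTON.iter_unpack(raw))
energies = [rec[3] for rec in records]
```

Each field has its own type and byte offset inside the record, which is why alignment and byte order have to be handled per field rather than per array.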

2. Additional array types

Numeric 1 has 11 defined types: char, ubyte, sbyte, short, int,
long, float, double, cfloat, cdouble, and object. There are no
ushort, uint, or ulong types, nor are there more complex types
such as a bit type which is of use to some fields of science and
possibly for implementing masked-arrays. The design of Numeric 1
makes the addition of these and other types a difficult and
error-prone process. To enable the easy addition (and deletion)
of new array types such as a bit type described below, a re-design
of Numeric is necessary.

a. Bit type

The result of a rich comparison between arrays is an array of
boolean values. The result can be stored in an array of type
char, but this is an unnecessary waste of memory. A better
implementation would use a bit or boolean type, compressing
the array size by a factor of eight. This is currently being
implemented for Numeric 1 (by Travis Oliphant) and should be
included in Numeric 2.
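
The factor of eight comes from storing one boolean per bit instead of one per byte. The following sketch shows one possible packing in pure Python; the helper names pack_bits and get_bit and the bit order within a byte are assumptions, not part of the proposal.

```python
def pack_bits(flags):
    """Pack a sequence of booleans into a bytearray, eight per byte."""
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 1 << (i % 8)   # lowest bit holds element 0
    return packed


def get_bit(packed, i):
    """Return element i of a packed boolean array."""
    return bool(packed[i // 8] & (1 << (i % 8)))


a = [1, 5, 3, 7, 2, 9, 4, 8, 6, 0, 11, 12, 13, 14, 15, 16]
b = [2, 4, 3, 1, 2, 10, 0, 9, 6, 1, 10, 13, 12, 15, 14, 17]

# Element-wise "a < b", stored in 2 bytes instead of 16.
mask = pack_bits([x < y for x, y in zip(a, b)])
```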

3. Enhanced array indexing syntax

The extended slicing syntax was added to Python to provide greater
flexibility when manipulating Numeric arrays by allowing
step-sizes greater than 1. This syntax works well as a shorthand
for a list of regularly spaced indices. For those situations
where a list of irregularly spaced indices is needed, an enhanced
array indexing syntax would allow 1-D arrays to be arguments.
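
The difference between the two kinds of index list can be seen with plain Python lists. The take helper below is a hypothetical stand-in for what an index-array argument would do; it is not a proposed Numeric 2 function.

```python
def take(seq, indices):
    """Gather the elements of seq at an arbitrary list of indices."""
    return [seq[i] for i in indices]


data = [10, 11, 12, 13, 14, 15, 16, 17]

regular = data[1:8:3]                  # extended slice: indices 1, 4, 7
irregular = take(data, [0, 1, 5, 6])   # no single slice selects these
```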

4. Rich comparisons

The implementation of PEP 207: Rich Comparisons in Python 2.1
provides additional flexibility when manipulating arrays. We
intend to implement this feature in Numeric 2.

5. Array broadcasting rules

When an operation between a scalar and an array is done, the
implied behavior is to create a new array having the same shape as
the array operand containing the scalar value. This is called
array broadcasting. It also works with arrays of lesser rank,
such as vectors. This implicit behavior is implemented in Numeric
1 and will also be implemented in Numeric 2.
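
A minimal model of this behavior, using nested Python lists in place of arrays, is sketched below. It assumes non-empty, rectangular inputs and does no shape checking; the function names are illustrative only.

```python
def depth(x):
    """Rank of a nested list (0 for a scalar)."""
    return 1 + depth(x[0]) if isinstance(x, list) else 0


def broadcast_add(a, b):
    """Add two operands, reusing the lower-rank one across the other."""
    if depth(a) < depth(b):
        a, b = b, a                      # keep the higher-rank operand first
    if depth(a) == 0:
        return a + b                     # both operands are scalars
    if depth(b) == depth(a):
        return [broadcast_add(x, y) for x, y in zip(a, b)]
    return [broadcast_add(x, b) for x in a]   # b is repeated for each row


matrix = [[1, 2, 3], [4, 5, 6]]
plus_scalar = broadcast_add(matrix, 10)
plus_vector = broadcast_add(matrix, [100, 200, 300])
```

Note that the sketch never materializes the expanded scalar or vector; the lower-rank operand is simply reused for every row.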


Design and Implementation

The design of Numeric 2 has four primary classes:

1. ArrayType:

This is a simple class that describes the fundamental properties
of an array-type, e.g. its name, its size in bytes, its coercion
relations with respect to other types, etc. For example:

> Int32 = ArrayType('Int32', 4, 'doc-string')

Its relation to the other types is defined when the C-extension
module for that type is imported. The corresponding Python code
is:

> Int32.astype[Real64] = Real64

This says that the Real64 array-type has higher priority than the
Int32 array-type.
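
The two statements above can be modeled with a small class. This is only a sketch under the assumption that astype is a plain dictionary keyed by the other type; the result_type helper is hypothetical and does not appear in the proposal.

```python
class ArrayType:
    """Toy model of an array type: a name, a size, and coercion relations."""

    def __init__(self, name, size, doc=""):
        self.name = name
        self.size = size               # size of one element in bytes
        self.doc = doc
        self.astype = {self: self}     # coercion result, keyed by other type

    def __repr__(self):
        return self.name


Int32 = ArrayType("Int32", 4, "32-bit signed integer")
Real64 = ArrayType("Real64", 8, "64-bit floating point")

# In the proposal these relations are set when a type's extension module
# is imported; here they are simply assigned by hand.
Int32.astype[Real64] = Real64
Real64.astype[Int32] = Real64


def result_type(a, b):
    """Look up the type that an operation on types a and b coerces to."""
    return a.astype[b]
```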

The following attributes and methods are proposed for the core
implementation. Additional attributes can be added on an
individual basis, e.g. .bitsize or .bitstrides for the bit type.

Attributes:
    .name:                  e.g. "Int32", "Float64", etc.
    .typecode:              e.g. 'i', 'f', etc.
                            (for backward compatibility)
    .size (in bytes):       e.g. 4, 8, etc.
    .array_rules (mapping): rules between array types
    .pyobj_rules (mapping): rules between array and python types
    .doc:                   documentation string

Methods:
    __init__():             initialization
    __del__():              destruction
    __repr__():             representation

C-API:
    This still needs to be fleshed-out.

2. UFunc:

This class is the heart of Numeric 2. Its design is similar to
that of ArrayType in that the UFunc creates a singleton callable
object whose attributes are name, total and input number of
arguments, a documentation string, and an empty CFunc dictionary;
e.g.

> add = UFunc('add', 3, 2, 'doc-string')

When defined, the add instance has no C functions associated with
it and therefore can do no work. The CFunc dictionary is
populated or registered later when the C-extension module for an
array-type is imported. The arguments of the register method are:
function name, function descriptor, and the CUFunc object. The
corresponding Python code is:

> add.register('add', (Int32, Int32, Int32), cfunc-add)

In the initialization function of an array type module, e.g.
Int32, there are two C API functions: one to initialize the
coercion rules and the other to register the CFunc objects.

When an operation is applied to some arrays, the __call__ method
is invoked. It gets the type of each array (if the output array
is not given, it is created from the coercion rules) and checks
the CFunc dictionary for a key that matches the argument types.
If it exists the operation is performed immediately, otherwise the
coercion rules are used to search for a related operation and set
of conversion functions. The __call__ method then invokes a
compute method written in C to iterate over slices of each array,
namely:

> _ufunc.compute(slice, data, func, swap, conv)

The 'func' argument is a CFuncObject, while the 'swap' and 'conv'
arguments are lists of CFuncObjects for those arrays needing pre-
or post-processing, otherwise None is used. The data argument is
a list of buffer objects, and the slice argument gives the number
of iterations for each dimension along with the buffer offset and
step size for each array and each dimension.
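
The register-then-dispatch flow described above can be sketched in pure Python. Everything here is a simplification made for illustration: type names are plain strings, Python lists stand in for array buffers, the CAST table of conversion functions is hypothetical, and the loop that _ufunc.compute would run in C is an ordinary list comprehension.

```python
class UFunc:
    """Toy dispatcher: maps tuples of input types to a per-element function."""

    def __init__(self, name, nargs, iargs, doc=""):
        self.name, self.nargs, self.iargs, self.doc = name, nargs, iargs, doc
        self.cfuncs = {}     # empty until type modules register functions
        self.rules = {}      # (type, type) -> common type

    def register(self, name, descriptor, func):
        # descriptor lists the input types followed by the output type;
        # name mirrors the signature in the text and is unused here.
        inputs, output = descriptor[:self.iargs], descriptor[self.iargs]
        self.cfuncs[inputs] = (func, output)

    def initrule(self, a, b, common):
        self.rules[(a, b)] = self.rules[(b, a)] = common

    def __call__(self, x, xtype, y, ytype):
        key = (xtype, ytype)
        if key not in self.cfuncs:
            # No exact match: convert both inputs to the common type.
            common = self.rules[key]
            x, y = CAST[(xtype, common)](x), CAST[(ytype, common)](y)
            key = (common, common)
        func, output = self.cfuncs[key]
        return [func(p, q) for p, q in zip(x, y)], output


# Hypothetical conversion functions, keyed by (from type, to type).
CAST = {
    ("Int32", "Real64"): lambda seq: [float(v) for v in seq],
    ("Real64", "Real64"): lambda seq: seq,
}

add = UFunc("add", 3, 2, "element-wise addition")
add.register("add", ("Int32", "Int32", "Int32"), lambda p, q: p + q)
add.register("add", ("Real64", "Real64", "Real64"), lambda p, q: p + q)
add.initrule("Int32", "Real64", "Real64")

ints, int_type = add([1, 2], "Int32", [3, 4], "Int32")
mixed, mixed_type = add([1, 2], "Int32", [0.5, 1.5], "Real64")
```

The first call finds an exact match in the dictionary; the second falls back to the coercion rule and converts the integer operand before dispatching.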

We have predefined several UFuncs for use by the __call__ method:
cast, swap, getobj, and setobj. The cast and swap functions do
coercion and byte-swapping, respectively, and the getobj and setobj
functions do coercion between Numeric arrays and Python sequences.

The following attributes and methods are proposed for the core
implementation.

Attributes:
    .name:                  e.g. "add", "subtract", etc.
    .nargs:                 number of total arguments
    .iargs:                 number of input arguments
    .cfuncs (mapping):      the set of C functions
    .doc:                   documentation string

Methods:
    __init__():             initialization
    __del__():              destruction
    __repr__():             representation
    __call__():             look-up and dispatch method
    initrule():             initialize coercion rule
    uninitrule():           uninitialize coercion rule
    register():             register a CUFunc
    unregister():           unregister a CUFunc

C-API:
    This still needs to be fleshed-out.

3. Array:

This class contains information about the array, such as shape,
type, endian-ness of the data, etc. Its operators, '+', '-',
etc. just invoke the corresponding UFunc function, e.g.

> def __add__(self, other):
>     return ufunc.add(self, other)

The following attributes, methods, and functions are proposed for
the core implementation.

Attributes:
    .shape:                 shape of the array
    .format:                type of the array
    .real (only complex):   real part of a complex array
    .imag (only complex):   imaginary part of a complex array

Methods:
    __init__():             initialization
    __del__():              destruction
    __repr__():             representation
    __str__():              pretty representation
    __cmp__():              rich comparison
    __len__():
    __getitem__():
    __setitem__():
    __getslice__():
    __setslice__():
    numeric methods:
    copy():                 copy of array
    aslist():               create list from array
    asstring():             create string from array

Functions:
    fromlist():             create array from sequence
    fromstring():           create array from string
    array():                create array with shape and value
    concat():               concatenate two arrays
    resize():               resize array

C-API:
    This still needs to be fleshed-out.

4. ArrayView

This class is similar to the Array class except that the reshape
and flat methods will raise exceptions, since non-contiguous
arrays cannot be reshaped or flattened using just pointer and
step-size information.

C-API:
    This still needs to be fleshed-out.

5. C-extension modules:

Numeric 2 will have several C-extension modules.

a. _ufunc:

The primary module of this set is the _ufuncmodule.c. The
intention of this module is to do the bare minimum,
i.e. iterate over arrays using a specified C function. The
interface of these functions is the same as Numeric 1, i.e.

int (*CFunc)(char *data, int *steps, int repeat, void *func);

and their functionality is expected to be the same, i.e. they
iterate over the inner-most dimension.

The following attributes and methods are proposed for the core
implementation.

Attributes:

Methods:
    compute():

C-API:
    This still needs to be fleshed-out.

b. _int32, _real64, etc.:

There will also be C-extension modules for each array type,
e.g. _int32module.c, _real64module.c, etc. As mentioned
previously, when these modules are imported by the UFunc
module, they will automatically register their functions and
coercion rules. New or improved versions of these modules can
be easily implemented and used without affecting the rest of
Numeric 2.


Open Issues

1. Does slicing syntax default to copy or view behavior?

The default behavior of Python is to return a copy of a sub-list
or tuple when slicing syntax is used, whereas Numeric 1 returns a
view into the array. The choice made for Numeric 1 is apparently
for reasons of performance: the developers wish to avoid the
penalty of allocating and copying the data buffer during each
array operation and feel that the need for a deep copy of an array
is rare. Yet, some have argued that Numeric's slice notation
should also have copy behavior to be consistent with Python lists.
In this case the performance penalty associated with copy behavior
can be minimized by implementing copy-on-write. This scheme has
both arrays sharing one data buffer (as in view behavior) until
either array is assigned new data at which point a copy of the
data buffer is made. View behavior would then be implemented by
an ArrayView class, whose behavior would be similar to Numeric 1
arrays, i.e. .shape is not settable for non-contiguous arrays. The
use of an ArrayView class also makes explicit what type of data the
array contains.
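
One way to picture the copy-on-write scheme is the sketch below. It is an assumption-laden toy, not the proposed implementation: it handles only 1-D arrays, unit-step slices, and non-negative integer indices, and it tracks sharing with a simple user counter that is never decremented when an array is garbage collected.

```python
class Buffer:
    """Data buffer plus a count of the arrays currently sharing it."""

    def __init__(self, items):
        self.items = list(items)
        self.users = 1


class CowArray:
    """Toy 1-D array whose slices share data until one side is written."""

    def __init__(self, items=(), _buf=None, _start=0, _stop=None):
        self._buf = _buf if _buf is not None else Buffer(items)
        self._start = _start
        self._stop = len(self._buf.items) if _stop is None else _stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                raise NotImplementedError("sketch handles unit steps only")
            self._buf.users += 1          # the slice shares the buffer for now
            return CowArray(_buf=self._buf, _start=self._start + start,
                            _stop=self._start + max(start, stop))
        return self._buf.items[self._start + index]

    def __setitem__(self, index, value):
        if self._buf.users > 1:           # shared: copy before writing
            self._buf.users -= 1
            self._buf = Buffer(self._buf.items[self._start:self._stop])
            self._start, self._stop = 0, len(self._buf.items)
        self._buf.items[self._start + index] = value

    def tolist(self):
        return self._buf.items[self._start:self._stop]


a = CowArray([0, 1, 2, 3, 4])
b = a[1:4]
shared_before_write = b._buf is a._buf
b[0] = 99                                 # triggers the copy; a is untouched
```

Slicing costs no data copy here; the copy is deferred to the first assignment on either array, which is the point of the scheme.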

2. Does item syntax default to copy or view behavior?

A similar question arises with the item syntax. For example, if
a = [[0,1,2], [3,4,5]] and b = a[0], then changing b[0] also changes
a[0][0], because a[0] is a reference or view of the first row of
a. Therefore, if c is a 2-d array, it would appear that c[i]
should return a 1-d array which is a view into, instead of a copy
of, c for consistency. Yet, c[i] can be considered just a
shorthand for c[i,:] which would imply copy behavior assuming
slicing syntax returns a copy. Should Numeric 2 behave the same
way as lists and return a view or should it return a copy?

3. How is scalar coercion implemented?

Python has fewer numeric types than Numeric which can cause
coercion problems. For example, when multiplying a Python scalar
of type float and a Numeric array of type float, the Numeric array
is converted to a double, since the Python float type is actually
a double. This is often not the desired behavior, since the
Numeric array will be doubled in size which is likely to be
annoying, particularly for very large arrays. We prefer that the
array type trumps the Python type for the same type class, namely
integer, float, and complex. Therefore, an operation between a
Python integer and an Int16 (short) array will return an Int16
array, whereas an operation between a Python float and an Int16
array would return a Float64 (double) array. Operations between
two arrays use normal coercion rules.
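
The preference stated above can be written down as a small lookup. This is a sketch only: the type names are strings, the table of default types for Python scalars is an assumption (in particular the name used for the complex default), and the case of a lower-class scalar meeting a higher-class array is assumed to keep the array type.

```python
# Type class of each array type (names are illustrative).
KIND = {"Int16": "int", "Int32": "int", "Float32": "float", "Float64": "float"}

# Type classes ordered from lowest to highest.
RANK = ["int", "float", "complex"]

# Array type assumed for a Python scalar that forces an upcast.
DEFAULT = {"int": "Int32", "float": "Float64", "complex": "Complex128"}


def scalar_result_type(array_type, scalar):
    """Result type of `array op python_scalar` under the preferred rule."""
    array_kind = KIND[array_type]
    scalar_kind = type(scalar).__name__      # 'int', 'float' or 'complex'
    if RANK.index(scalar_kind) <= RANK.index(array_kind):
        return array_type        # the array type wins within its own class
    return DEFAULT[scalar_kind]  # a higher-class scalar forces an upcast
```

For example, scalar_result_type("Int16", 3) keeps "Int16", while scalar_result_type("Int16", 2.0) gives "Float64", matching the two cases in the text.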

4. How is integer division handled?

In a future version of Python, the behavior of integer division
will change. The operands will be converted to floats, so the
result will be a float. If we implement the proposed scalar
coercion rules where arrays have precedence over Python scalars,
then dividing an array by an integer will return an integer array
and will not be consistent with a future version of Python which
would return an array of type double. Scientific programmers are
familiar with the distinction between integer and floating-point
division, so should Numeric 2 continue with this behavior?

5. How should records be implemented?

There are two approaches to implementing records depending on your
point-of-view. The first is to divide arrays into separate
classes depending on the behavior of their types. For example,
numeric arrays are one class, strings a second, and records a
third, because the range and type of operations of each class
differ. As such, a record array is not a new type, but a
mechanism for a more flexible form of array. To easily access and
manipulate such complex data, the class is comprised of numeric
arrays having different byte offsets into the data buffer. For
example, one might have a table consisting of an array of Int16,
Real32 values. Two numeric arrays, one with an offset of 0 bytes
and a stride of 6 bytes to be interpreted as Int16, and one with an
offset of 2 bytes and a stride of 6 bytes to be interpreted as
Real32 would represent the record array. Both numeric arrays
would refer to the same data buffer, but have different offset and
stride attributes, and a different numeric type.
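
The Int16/Real32 example can be reproduced with the standard struct module. The strided_view helper is hypothetical, and little-endian packed data is assumed so that each record is exactly 6 bytes.

```python
import struct

# Three packed (Int16, Real32) records, little-endian, 6 bytes each.
rows = [(1, 0.5), (2, 1.5), (3, 2.5)]
buf = b"".join(struct.pack("<hf", i, v) for i, v in rows)


def strided_view(buffer, fmt, offset, stride):
    """Read one field from every record: start at offset, step by stride."""
    count = len(buffer) // stride        # buffer holds whole records only
    return [struct.unpack_from(fmt, buffer, offset + k * stride)[0]
            for k in range(count)]


# Same buffer, two interpretations that differ in offset and type.
int_field = strided_view(buf, "<h", 0, 6)    # Int16 at offset 0
real_field = strided_view(buf, "<f", 2, 6)   # Real32 at offset 2
```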

The second approach is to consider a record as one of many array
types, albeit with fewer, and possibly different, array operations
than for numeric arrays. This approach considers an array type to
be a mapping of a fixed-length string. The mapping can either be
simple, like integer and floating-point numbers, or complex, like
a complex number, a byte string, and a C-structure. The record
type effectively merges the struct and Numeric modules into a
multi-dimensional struct array. This approach implies certain
changes to the array interface. For example, the 'typecode'
keyword argument should probably be changed to the more
descriptive 'format' keyword.

a. How are record semantics defined and implemented?

Whichever implementation approach is taken for records, the
syntax and semantics of how they are to be accessed and
manipulated must be decided, if one wishes to have access to
sub-fields of records. In this case, the record type can
essentially be considered an inhomogeneous list, like a tuple
returned by the unpack method of the struct module; and a 1-d
array of records may be interpreted as a 2-d array with the
second dimension being the index into the list of fields.
This enhanced array semantics makes access to an array of one
or more of the fields easy and straightforward. It also
allows a user to do array operations on a field in a natural
and intuitive way. If we assume that records are implemented
as an array type, then the last dimension defaults to 0 and can
therefore be neglected for arrays comprised of simple types,
like numeric.

6. How are masked-arrays implemented?

Masked-arrays in Numeric 1 are implemented as a separate array
class. With the ability to add new array types to Numeric 2, it
is possible that masked-arrays in Numeric 2 could be implemented
as a new array type instead of an array class.

7. How are numerical errors handled (IEEE floating-point errors in
particular)?

It is not clear to the proposers (Paul Barrett and Travis
Oliphant) what is the best or preferred way of handling errors.
Most of the C functions that do the operation iterate over
the inner-most (last) dimension of the array. This dimension
could contain a thousand or more items having one or more errors
of differing type, such as divide-by-zero, underflow, and
overflow. Additionally, keeping track of these errors may come at
the expense of performance. Therefore, we suggest several
options:

a. Print a message of the most severe error, leaving it to
the user to locate the errors.

b. Print a message of all errors that occurred and the number
of occurrences, leaving it to the user to locate the errors.

c. Print a message of all errors that occurred and a list of
where they occurred.

d. Or use a hybrid approach, printing only the most severe
error, yet keeping track of what and where the errors
occurred. This would allow the user to locate the errors
while keeping the error message brief.
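
Option d might look roughly like the following. The severity ranking, the function names, and the choice to store NaN at a divide-by-zero position are all assumptions made for this sketch; the PEP leaves the policy open.

```python
import math

# Assumed ranking of error kinds, least to most severe.
SEVERITY = {"underflow": 1, "overflow": 2, "divide-by-zero": 3}


def safe_divide(num, den):
    """Element-wise division that records errors instead of raising."""
    result, errors = [], []
    for i, (n, d) in enumerate(zip(num, den)):
        if d == 0:
            errors.append((i, "divide-by-zero"))
            result.append(float("nan"))
            continue
        q = n / d
        if math.isinf(q) and not (math.isinf(n) or math.isinf(d)):
            errors.append((i, "overflow"))   # finite inputs, infinite result
        result.append(q)
    return result, errors


def report(errors):
    """Brief message: only the most severe error kind, plus a count."""
    if not errors:
        return ""
    worst = max((kind for _, kind in errors), key=SEVERITY.get)
    return "%s (%d error(s) recorded)" % (worst, len(errors))


result, errors = safe_divide([1.0, 1e308, 4.0, 2.0], [0.0, 1e-308, 2.0, 0.0])
message = report(errors)
```

The message stays short, while the errors list keeps both the kind and the position of every problem for a user who wants to look further.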

8. What features are needed to ease the integration of FORTRAN
libraries and code?

It would be a good idea at this stage to consider how to ease the
integration of FORTRAN libraries and user code in Numeric 2.


Implementation Steps

1. Implement basic UFunc capability

a. Minimal Array class:

Necessary class attributes and methods, e.g. .shape, .data,
.type, etc.

b. Minimal ArrayType class:

Int32, Real64, Complex64, Char, Object

c. Minimal UFunc class:

UFunc instantiation, CFunction registration, UFunc call for
1-D arrays including the rules for doing alignment,
byte-swapping, and coercion.

d. Minimal C-extension module:

_ufunc, which does the innermost array loop in C.

This step implements whatever is needed to do: 'c = add(a, b)'
where a, b, and c are 1-D arrays. It teaches us how to add
new UFuncs, to coerce the arrays, to pass the necessary
information to a C iterator method and to do the actual
computation.
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2001-02-15 18:01:53 -05:00
|
|
|
|
2. Continue enhancing the UFunc iterator and Array class
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2001-02-15 18:01:53 -05:00
|
|
|
|
a. Implement some access methods for the Array class:
|
|
|
|
|
print, repr, getitem, setitem, etc.
|
|
|
|
|
|
|
|
|
|
b. Implement multidimensional arrays
|
|
|
|
|
|
|
|
|
|
c. Implement some of basic Array methods using UFuncs:
|
|
|
|
|
+, -, *, /, etc.
|
|
|
|
|
|
|
|
|
|
d. Enable UFuncs to use Python sequences.
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2001-02-15 18:01:53 -05:00
|
|
|
|
3. Complete the standard UFunc and Array class behavior
|
2017-03-24 17:11:33 -04:00
|
|
|
|
|
2001-02-15 18:01:53 -05:00
|
|
|
|
a. Implement getslice and setslice behavior
|
|
|
|
|
|
|
|
|
|
b. Work on Array broadcasting rules
|
|
|
|
|
|
|
|
|
|
c. Implement Record type
|
|
|
|
|
|
|
|
|
|
4. Add additional functionality
|
|
|
|
|
|
|
|
|
|
a. Add more UFuncs
|
|
|
|
|
|
|
|
|
|
b. Implement buffer or mmap access
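
As an illustration of step 1 above, the following is a minimal
pure-Python sketch of what 'c = add(a, b)' for 1-D arrays might
involve. It is not the proposed implementation: the constructor
signatures, the register() method, the rank-based coercion order,
and the use of the standard array module as a data buffer are all
assumptions made for this example, and alignment and byte-swapping
are left out entirely.

```python
from array import array as _buffer  # standard-library typed buffer


class ArrayType:
    """Hypothetical stand-in for an ArrayType instance such as Int32."""

    def __init__(self, name, code, rank):
        self.name = name   # e.g. "Int32"
        self.code = code   # typecode of the backing array.array buffer
        self.rank = rank   # assumed position in the coercion order

    def __repr__(self):
        return self.name


Int32 = ArrayType("Int32", "i", 0)
Real64 = ArrayType("Real64", "d", 1)


class Array:
    """Minimal 1-D array exposing .shape, .data and .type."""

    def __init__(self, values, atype):
        self.type = atype
        self.data = _buffer(atype.code, values)
        self.shape = (len(self.data),)


class UFunc:
    """Dispatches to an inner loop registered for the coerced type."""

    def __init__(self, name):
        self.name = name
        self._loops = {}   # maps an ArrayType to its inner loop

    def register(self, atype, loop):
        self._loops[atype] = loop

    def __call__(self, a, b):
        if a.shape != b.shape:
            raise ValueError("shape mismatch")
        # Coerce both operands to the higher-ranked of the two types.
        common = max(a.type, b.type, key=lambda t: t.rank)
        a, b = (x if x.type is common else Array(list(x.data), common)
                for x in (a, b))
        out = Array([0] * a.shape[0], common)
        self._loops[common](a.data, b.data, out.data)
        return out


def _add_loop(x, y, out):
    # In the proposed design this innermost loop would be written in C.
    for i in range(len(out)):
        out[i] = x[i] + y[i]


add = UFunc("add")
add.register(Int32, _add_loop)
add.register(Real64, _add_loop)

a = Array([1, 2, 3], Int32)
b = Array([0.5, 1.5, 2.5], Real64)
c = add(a, b)   # a is coerced to Real64 before the loop runs
```

Keeping the loop behind a per-type registry is one way to let each
array type supply its own operations, as the design intends.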
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Incompatibilities
|
|
|
|
|
|
|
|
|
|
The following is a list of incompatibilities in behavior between
|
|
|
|
|
Numeric 1 and Numeric 2.
|
|
|
|
|
|
|
|
|
|
1. Scalar coercion rules
|
|
|
|
|
|
|
|
|
|
Numeric 1 has a single set of coercion rules for array and Python
|
|
|
|
|
numeric types. This can cause unexpected and annoying problems
|
|
|
|
|
during the calculation of an array expression. Numeric 2 intends
|
|
|
|
|
to overcome these problems by having two sets of coercion rules:
|
|
|
|
|
one for arrays and Python numeric types, and another just for
|
|
|
|
|
arrays.
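
The kind of problem a single rule set causes can be sketched as
follows. The concrete rules below are assumptions chosen for
illustration and are not specified in this section: the type names,
the RANK and KIND tables, and the policy that a Python scalar of the
same kind as the array does not up-cast it are all hypothetical.

```python
# Hypothetical type tables; only the names Int32 and Real64 appear
# in this PEP, Real32 is added here for the sake of the example.
RANK = {"Int32": 0, "Real32": 1, "Real64": 2}
KIND = {"Int32": "int", "Real32": "real", "Real64": "real"}

# Assumed mapping from Python numeric types to array types.
SCALAR_TYPE = {int: "Int32", float: "Real64"}


def coerce_arrays(t1, t2):
    """Array-with-array rule: the higher-ranked type wins."""
    return t1 if RANK[t1] >= RANK[t2] else t2


def coerce_single_rule(array_type, scalar):
    """One rule set for everything: the scalar is treated like an array."""
    return coerce_arrays(array_type, SCALAR_TYPE[type(scalar)])


def coerce_array_scalar(array_type, scalar):
    """Separate array-with-scalar rule (an assumed policy): a scalar of
    the same kind as the array leaves the array type unchanged."""
    scalar_type = SCALAR_TYPE[type(scalar)]
    if KIND[scalar_type] == KIND[array_type]:
        return array_type
    return coerce_arrays(array_type, scalar_type)


# Multiplying a Real32 array by the Python float 2.0:
single = coerce_single_rule("Real32", 2.0)     # up-cast to Real64
separate = coerce_array_scalar("Real32", 2.0)  # stays Real32
```

Under the single rule set, a Real32 array silently doubles in size
whenever it meets a Python float, which is the sort of surprise a
second rule set is meant to avoid.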
|
|
|
|
|
|
|
|
|
|
2. No savespace attribute
|
|
|
|
|
|
|
|
|
|
The savespace attribute in Numeric 1 makes arrays with this
|
|
|
|
|
attribute set take precedence over those that do not have it set.
|
|
|
|
|
Numeric 2 will not have such an attribute and therefore normal
|
|
|
|
|
array coercion rules will be in effect.
|
|
|
|
|
|
|
|
|
|
3. Slicing syntax returns a copy
|
|
|
|
|
|
|
|
|
|
The slicing syntax in Numeric 1 returns a view into the original
|
|
|
|
|
array. In Numeric 2, slicing will return a copy. You
|
|
|
|
|
should use the ArrayView class to get a view into an array.
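
The difference between the two behaviors can be sketched in pure
Python. This is only an illustration: the classes below are toy
1-D stand-ins, and the ArrayView(base, start, stop) constructor
signature is an assumption, not the proposed interface.

```python
class Array:
    """Toy 1-D array whose slices are copies."""

    def __init__(self, values):
        self.data = list(values)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # The slice gets a new, independent buffer.
            return Array(self.data[index])
        return self.data[index]

    def __setitem__(self, index, value):
        self.data[index] = value


class ArrayView:
    """Toy view that shares the buffer of its base array."""

    def __init__(self, base, start, stop):
        self.base = base
        self.start = start
        self.stop = stop

    def _offset(self, i):
        # Translate a view index into an index of the base buffer.
        if not 0 <= i < self.stop - self.start:
            raise IndexError(i)
        return self.start + i

    def __getitem__(self, i):
        return self.base.data[self._offset(i)]

    def __setitem__(self, i, value):
        self.base.data[self._offset(i)] = value


a = Array([0, 1, 2, 3, 4])

s = a[1:3]        # copy: writing to s leaves a untouched
s[0] = 99
after_copy_write = list(a.data)

v = ArrayView(a, 1, 3)   # view: writing to v changes a
v[0] = 99
after_view_write = list(a.data)
```

Code written for Numeric 1 that relies on writing through a slice
would need to switch to an explicit view under this change.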
|
|
|
|
|
|
|
|
|
|
4. Boolean comparisons return a boolean array
|
|
|
|
|
|
|
|
|
|
A comparison between arrays in Numeric 1 results in a Boolean
|
|
|
|
|
scalar, because of current limitations in Python. The advent of
|
|
|
|
|
Rich Comparisons in Python 2.1 will allow an array of Booleans to
|
|
|
|
|
be returned.
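
With rich comparisons (PEP 207), a class can define __lt__, __eq__,
and the other comparison methods and return any object from them.
A minimal sketch of an element-wise comparison, using a toy 1-D
Array class that is an assumption of this example:

```python
import operator


class Array:
    """Toy 1-D array with element-wise rich comparisons."""

    def __init__(self, values):
        self.data = list(values)

    def _compare(self, other, op):
        # Compare against another Array, or repeat a scalar operand.
        if isinstance(other, Array):
            other_data = other.data
        else:
            other_data = [other] * len(self.data)
        if len(other_data) != len(self.data):
            raise ValueError("shape mismatch")
        return Array([op(x, y) for x, y in zip(self.data, other_data)])

    def __lt__(self, other):
        return self._compare(other, operator.lt)

    def __eq__(self, other):
        return self._compare(other, operator.eq)


a = Array([1, 5, 3])
b = Array([2, 4, 6])

less = a < b      # an Array of Booleans, not a single Boolean
equal = a == 5    # comparison against a Python scalar
```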
|
|
|
|
|
|
|
|
|
|
5. Type characters are deprecated
|
|
|
|
|
|
|
|
|
|
Numeric 2 will have an ArrayType class composed of Type instances,
|
|
|
|
|
for example Int8, Int16, Int32, and Int for signed integers. The
|
|
|
|
|
typecode scheme in Numeric 1 will be available for backward
|
|
|
|
|
compatibility, but will be deprecated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Appendices
|
|
|
|
|
|
|
|
|
|
A. Implicit sub-array iteration
|
|
|
|
|
|
|
|
|
|
A computer animation is composed of a number of 2-D images or
|
|
|
|
|
frames of identical shape. By stacking these images into a single
|
|
|
|
|
block of memory, a 3-D array is created. Yet the operations to be
|
|
|
|
|
performed are not meant for the entire 3-D array, but for the set
|
|
|
|
|
of 2-D sub-arrays. In most array languages, each frame has to be
|
|
|
|
|
extracted, operated on, and then reinserted into the output array
|
|
|
|
|
using a for-like loop. The J language allows the programmer to
|
|
|
|
|
perform such operations implicitly by having a rank for the frame
|
|
|
|
|
and array. By default these ranks will be the same during the
|
|
|
|
|
creation of the array. It was the intention of the Numeric 1
|
|
|
|
|
developers to implement this feature, since it is based on the
|
|
|
|
|
language J. The Numeric 1 code has the required variables for
|
|
|
|
|
implementing this behavior, but the behavior was never implemented. We intend
|
|
|
|
|
to implement implicit sub-array iteration in Numeric 2, if the
|
|
|
|
|
array broadcasting rules found in Numeric 1 do not fully support
|
|
|
|
|
this behavior.
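
The idea can be sketched with nested Python lists standing in for
arrays. The function names and the explicit rank argument below are
assumptions made for this illustration; they are not part of
Numeric 1 or of the proposed design.

```python
def depth(a):
    """Rank of a regular, non-empty nested list."""
    d = 0
    while isinstance(a, list):
        a = a[0]
        d += 1
    return d


def apply_at_rank(func, a, rank):
    """Apply func to every sub-array of the given rank inside a."""
    if depth(a) <= rank:
        return func(a)
    # Descend one axis; the loop over frames is implicit to the caller.
    return [apply_at_rank(func, sub, rank) for sub in a]


def transpose(frame):
    """Transpose one 2-D frame."""
    return [list(row) for row in zip(*frame)]


# Two 2x2 frames stacked into a 3-D array.
animation = [[[1, 2], [3, 4]],
             [[5, 6], [7, 8]]]

# Transpose each 2-D frame without writing the loop over frames.
result = apply_at_rank(transpose, animation, 2)
```

Here the caller states the rank on which the operation works, and
the iteration over the remaining leading axes happens implicitly.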
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
|
|
|
|
|
This document is placed in the public domain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Related PEPs
|
|
|
|
|
|
|
|
|
|
PEP 207: Rich Comparisons
|
|
|
|
|
by Guido van Rossum and David Ascher
|
|
|
|
|
|
|
|
|
|
PEP 208: Reworking the Coercion Model
|
|
|
|
|
by Neil Schemenauer and Marc-André Lemburg
|
|
|
|
|
|
|
|
|
|
PEP 211: Adding New Linear Algebra Operators to Python
|
|
|
|
|
by Greg Wilson
|
|
|
|
|
|
|
|
|
|
PEP 225: Elementwise/Objectwise Operators
|
|
|
|
|
by Huaiyu Zhu
|
|
|
|
|
|
|
|
|
|
PEP 228: Reworking Python's Numeric Model
|
|
|
|
|
by Moshe Zadka
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
|
|
|
|
|
[1] P. Greenfield 2000. private communication.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
End:
|