Checking in Paul Barrett's latest draft, after spell checking only.
This commit is contained in:
parent
248b499ee6
commit
9501aa7464
676
pep-0209.txt
676
pep-0209.txt
|
@ -1,14 +1,682 @@
|
|||
PEP: 209
|
||||
Title: Adding Multidimensional Arrays
|
||||
Title: Multi-dimensional Arrays
|
||||
Version: $Revision$
|
||||
Author: davida@activestate.com (David Ascher)
|
||||
Author: barrett@stsci.edu (Paul Barrett), oliphant@ee.byu.edu (Travis Oliphant)
|
||||
Python-Version: 2.2
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Python-Version: 2.1
|
||||
Created: 15-Jul-2000
|
||||
Created: 03-Jan-2001
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This PEP proposes a redesign and re-implementation of the multi-
|
||||
dimensional array module, Numeric, to make it easier to add new
|
||||
features and functionality to the module. Aspects of Numeric 2
|
||||
that will receive special attention are efficient access to arrays
|
||||
exceeding a gigabyte in size and composed of inhomogeneous data
|
||||
structures or records. The proposed design uses four Python
|
||||
classes: ArrayType, UFunc, Array, and ArrayView; and a low-level
|
||||
C-extension module, _ufunc, to handle the array operations
|
||||
efficiently. In addition, each array type has its own C-extension
|
||||
module which defines the coercion rules, operations, and methods
|
||||
for that type. This design enables new types, features, and
|
||||
functionality to be added in a modular fashion. The new version
|
||||
will introduce some incompatibilities with the current Numeric.
|
||||
|
||||
|
||||
Motivation
|
||||
|
||||
Multi-dimensional arrays are commonly used to store and manipulate
|
||||
data in science, engineering, and computing. Python currently has
|
||||
an extension module, named Numeric (henceforth called Numeric 1),
|
||||
which provides a satisfactory set of functionality for users
|
||||
manipulating homogeneous arrays of data of moderate size (of order
|
||||
10 MB). For access to larger arrays (of order 100 MB or more) of
|
||||
possibly inhomogeneous data, the implementation of Numeric 1 is
|
||||
inefficient and cumbersome. In the future, requests by the
|
||||
Numerical Python community for additional functionality is also
|
||||
likely as PEPs 211: Adding New Linear Operators to Python, and
|
||||
225: Elementwise/Objectwise Operators illustrate.
|
||||
|
||||
|
||||
Proposal
|
||||
|
||||
This proposal recommends a re-design and re-implementation of
|
||||
Numeric 1, henceforth called Numeric 2, which will enable new
|
||||
types, features, and functionality to be added in an easy and
|
||||
modular manner. The initial design of Numeric 2 should focus on
|
||||
providing a generic framework for manipulating arrays of various
|
||||
types and should enable a straightforward mechanism for adding new
|
||||
array types and UFuncs. Functional methods that are more specific
|
||||
to various disciplines can then be layered on top of this core.
|
||||
This new module will still be called Numeric and most of the
|
||||
behavior found in Numeric 1 will be preserved.
|
||||
|
||||
The proposed design uses four Python classes: ArrayType, UFunc,
|
||||
Array, and ArrayView; and a low-level C-extension module to handle
|
||||
the array operations efficiently. In addition, each array type
|
||||
has its own C-extension module which defines the coercion rules,
|
||||
operations, and methods for that type. At a later date, when core
|
||||
functionality is stable, some Python classes can be converted to
|
||||
C-extension types.
|
||||
|
||||
Some planned features are:
|
||||
|
||||
1. Improved memory usage
|
||||
|
||||
This feature is particularly important when handling large arrays
|
||||
and can produce significant improvements in performance as well as
|
||||
memory usage. We have identified several areas where memory usage
|
||||
can be improved:
|
||||
|
||||
a. Use a local coercion model
|
||||
|
||||
Instead of using Python's global coercion model which creates
|
||||
temporary arrays, Numeric 2, like Numeric 1, will implement a
|
||||
local coercion model as described in PEP 208 which defers the
|
||||
responsibility of coercion to the operator. By using internal
|
||||
buffers, a coercion operation can be done for each array
|
||||
(including output arrays), if necessary, at the time of the
|
||||
operation. Benchmarks [1] have shown that performance is at
|
||||
most degraded only slightly and is improved in cases where the
|
||||
internal buffers are less than the L2 cache size and the
|
||||
processor is under load. To avoid array coercion altogether,
|
||||
C functions having arguments of mixed type are allowed in
|
||||
Numeric 2.
|
||||
|
||||
b. Avoid creation of temporary arrays
|
||||
|
||||
In complex array expressions (i.e. having more than one
|
||||
operation), each operation will create a temporary array which
|
||||
will be used and then deleted by the succeeding operation. A
|
||||
better approach would be to identify these temporary arrays
|
||||
and reuse their data buffers when possible, namely when the
|
||||
array shape and type are the same as the temporary array being
|
||||
created. This can be done by checking the temporary array's
|
||||
reference count. If it is 1, then it will be deleted once the
|
||||
operation is done and is a candidate for reuse.
|
||||
|
||||
c. Optional use of memory-mapped files
|
||||
|
||||
Numeric users sometimes need to access data from very large
|
||||
files or to handle data that is greater than the available
|
||||
memory. Memory-mapped arrays provide a mechanism to do this
|
||||
by storing the data on disk while making it appear to be in
|
||||
memory. Memory- mapped arrays should improve access to all
|
||||
files by eliminating one of two copy steps during a file
|
||||
access. Numeric should be able to access in-memory and
|
||||
memory-mapped arrays transparently.
|
||||
|
||||
d. Record access
|
||||
|
||||
In some fields of science, data is stored in files as binary
|
||||
records. For example in astronomy, photon data is stored as a
|
||||
1 dimensional list of photons in order of arrival time. These
|
||||
records or C-like structures contain information about the
|
||||
detected photon, such as its arrival time, its position on the
|
||||
detector, and its energy. Each field may be of a different
|
||||
type, such as char, int, or float. Such arrays introduce new
|
||||
issues that must be dealt with, in particular byte alignment
|
||||
or byte swapping may need to be performed for the numeric
|
||||
values to be properly accessed (though byte swapping is also
|
||||
an issue for memory mapped data). Numeric 2 is designed to
|
||||
automatically handle alignment and representational issues
|
||||
when data is accessed or operated on. There are two
|
||||
approaches to implementing records; as either a derived array
|
||||
class or a special array type, depending on your point-of-
|
||||
view. We defer this discussion to the Open Issues section.
|
||||
|
||||
|
||||
2. Additional array types
|
||||
|
||||
Numeric 1 has 11 defined types: char, ubyte, sbyte, short, int,
|
||||
long, float, double, cfloat, cdouble, and object. There are no
|
||||
ushort, uint, or ulong types, nor are there more complex types
|
||||
such as a bit type which is of use to some fields of science and
|
||||
possibly for implementing masked-arrays. The design of Numeric 1
|
||||
makes the addition of these and other types a difficult and
|
||||
error-prone process. To enable the easy addition (and deletion)
|
||||
of new array types such as a bit type described below, a re-design
|
||||
of Numeric is necessary.
|
||||
|
||||
a. Bit type
|
||||
|
||||
The result of a rich comparison between arrays is an array of
|
||||
boolean values. The result can be stored in an array of type
|
||||
char, but this is an unnecessary waste of memory. A better
|
||||
implementation would use a bit or boolean type, compressing
|
||||
the array size by a factor of eight. This is currently being
|
||||
implemented for Numeric 1 (by Travis Oliphant) and should be
|
||||
included in Numeric 2.
|
||||
|
||||
3. Enhanced array indexing syntax
|
||||
|
||||
The extended slicing syntax was added to Python to provide greater
|
||||
flexibility when manipulating Numeric arrays by allowing
|
||||
step-sizes greater than 1. This syntax works well as a shorthand
|
||||
for a list of regularly spaced indices. For those situations
|
||||
where a list of irregularly spaced indices are needed, an enhanced
|
||||
array indexing syntax would allow 1-D arrays to be arguments.
|
||||
|
||||
4. Rich comparisons
|
||||
|
||||
The implementation of PEP 207: Rich Comparisons in Python 2.1
|
||||
provides additional flexibility when manipulating arrays. We
|
||||
intend to implement this feature in Numeric 2.
|
||||
|
||||
5. Array broadcasting rules
|
||||
|
||||
When an operation between a scalar and an array is done, the
|
||||
implied behavior is to create a new array having the same shape as
|
||||
the array operand containing the scalar value. This is called
|
||||
array broadcasting. It also works with arrays of lesser rank,
|
||||
such as vectors. This implicit behavior is implemented in Numeric
|
||||
1 and will also be implemented in Numeric 2.
|
||||
|
||||
|
||||
Design and Implementation
|
||||
|
||||
The design of Numeric 2 has four primary classes:
|
||||
|
||||
1. ArrayType:
|
||||
|
||||
This is a simple class that describes the fundamental properties
|
||||
of an array-type, e.g. its name, its size in bytes, its coercion
|
||||
relations with respect to other types, etc., e.g.
|
||||
|
||||
> Int32 = ArrayType('Int32', 4, 'doc-string')
|
||||
|
||||
Its relation to the other types is defined when the C-extension
|
||||
module for that type is imported. The corresponding Python code
|
||||
is:
|
||||
|
||||
> Int32.astype[Real64] = Real64
|
||||
|
||||
This says that the Real64 array-type has higher priority than the
|
||||
Int32 array-type.
|
||||
|
||||
The following attributes and methods are proposed for the core
|
||||
implementation. Additional attributes can be added on an
|
||||
individual basis, e.g. .bitsize or .bitstrides for the bit type.
|
||||
|
||||
Attributes:
|
||||
.name: e.g. "Int32", "Float64", etc.
|
||||
.typecode: e.g. 'i', 'f', etc.
|
||||
(for backward compatibility)
|
||||
.size (in bytes): e.g. 4, 8, etc.
|
||||
.array_rules (mapping): rules between array types
|
||||
.pyobj_rules (mapping): rules between array and python types
|
||||
.doc: documentation string
|
||||
Methods:
|
||||
__init__(): initialization
|
||||
__del__(): destruction
|
||||
__repr__(): representation
|
||||
|
||||
C-API:
|
||||
This still needs to be fleshed-out.
|
||||
|
||||
|
||||
2. UFunc:
|
||||
|
||||
This class is the heart of Numeric 2. Its design is similar to
|
||||
that of ArrayType in that the UFunc creates a singleton callable
|
||||
object whose attributes are name, total and input number of
|
||||
arguments, a document string, and an empty CFunc dictionary; e.g.
|
||||
|
||||
> add = UFunc('add', 3, 2, 'doc-string')
|
||||
|
||||
When defined the add instance has no C functions associated with
|
||||
it and therefore can do no work. The CFunc dictionary is
|
||||
populated or registered later when the C-extension module for an
|
||||
array-type is imported. The arguments of the register method are:
|
||||
function name, function descriptor, and the CUFunc object. The
|
||||
corresponding Python code is
|
||||
|
||||
> add.register('add', (Int32, Int32, Int32), cfunc-add)
|
||||
|
||||
In the initialization function of an array type module, e.g.
|
||||
Int32, there are two C API functions: one to initialize the
|
||||
coercion rules and the other to register the CFunc objects.
|
||||
|
||||
When an operation is applied to some arrays, the __call__ method
|
||||
is invoked. It gets the type of each array (if the output array
|
||||
is not given, it is created from the coercion rules) and checks
|
||||
the CFunc dictionary for a key that matches the argument types.
|
||||
If it exists the operation is performed immediately, otherwise the
|
||||
coercion rules are used to search for a related operation and set
|
||||
of conversion functions. The __call__ method then invokes a
|
||||
compute method written in C to iterate over slices of each array,
|
||||
namely:
|
||||
|
||||
> _ufunc.compute(slice, data, func, swap, conv)
|
||||
|
||||
The 'func' argument is a CFuncObject, while the 'swap' and 'conv'
|
||||
arguments are lists of CFuncObjects for those arrays needing pre-
|
||||
or post-processing, otherwise None is used. The data argument is
|
||||
a list of buffer objects, and the slice argument gives the number
|
||||
of iterations for each dimension along with the buffer offset and
|
||||
step size for each array and each dimension.
|
||||
|
||||
We have predefined several UFuncs for use by the __call__ method:
|
||||
cast, swap, getobj, and setobj. The cast and swap functions do
|
||||
coercion and byte-swapping, respectively and the getobj and setobj
|
||||
functions do coercion between Numeric arrays and Python sequences.
|
||||
|
||||
The following attributes and methods are proposed for the core
|
||||
implementation.
|
||||
|
||||
Attributes:
|
||||
.name: e.g. "add", "subtract", etc.
|
||||
.nargs: number of total arguments
|
||||
.iargs: number of input arguments
|
||||
.cfuncs (mapping): the set C functions
|
||||
.doc: documentation string
|
||||
Methods:
|
||||
__init__(): initialization
|
||||
__del__(): destruction
|
||||
__repr__(): representation
|
||||
__call__(): look-up and dispatch method
|
||||
initrule(): initialize coercion rule
|
||||
uninitrule(): uninitialize coercion rule
|
||||
register(): register a CUFunc
|
||||
unregister(): unregister a CUFunc
|
||||
|
||||
C-API:
|
||||
This still needs to be fleshed-out.
|
||||
|
||||
3. Array:
|
||||
|
||||
This class contains information about the array, such as shape,
|
||||
type, endian-ness of the data, etc.. Its operators, '+', '-',
|
||||
etc. just invoke the corresponding UFunc function, e.g.
|
||||
|
||||
> def __add__(self, other):
|
||||
> return ufunc.add(self, other)
|
||||
|
||||
The following attributes, methods, and functions are proposed for
|
||||
the core implementation.
|
||||
|
||||
Attributes:
|
||||
.shape: shape of the array
|
||||
.format: type of the array
|
||||
.real (only complex): real part of a complex array
|
||||
.imag (only complex): imaginary part of a complex array
|
||||
Methods:
|
||||
__init__(): initialization
|
||||
__del__(): destruction
|
||||
__repr_(): representation
|
||||
__str__(): pretty representation
|
||||
__cmp__(): rich comparison
|
||||
__len__():
|
||||
__getitem__():
|
||||
__setitem__():
|
||||
__getslice__():
|
||||
__setslice__():
|
||||
numeric methods:
|
||||
copy(): copy of array
|
||||
aslist(): create list from array
|
||||
asstring(): create string from array
|
||||
|
||||
Functions:
|
||||
fromlist(): create array from sequence
|
||||
fromstring(): create array from string
|
||||
array(): create array with shape and value
|
||||
concat(): concatenate two arrays
|
||||
resize(): resize array
|
||||
|
||||
C-API:
|
||||
This still needs to be fleshed-out.
|
||||
|
||||
4. ArrayView
|
||||
|
||||
This class is similar to the Array class except that the reshape
|
||||
and flat methods will raise exceptions, since non-contiguous
|
||||
arrays cannot be reshaped or flattened using just pointer and
|
||||
step-size information.
|
||||
|
||||
C-API:
|
||||
This still needs to be fleshed-out.
|
||||
|
||||
5. C-extension modules:
|
||||
|
||||
Numeric2 will have several C-extension modules.
|
||||
|
||||
a. _ufunc:
|
||||
|
||||
The primary module of this set is the _ufuncmodule.c. The
|
||||
intention of this module is to do the bare minimum,
|
||||
i.e. iterate over arrays using a specified C function. The
|
||||
interface of these functions is the same as Numeric 1, i.e.
|
||||
|
||||
int (*CFunc)(char *data, int *steps, int repeat, void *func);
|
||||
|
||||
and their functionality is expected to be the same, i.e. they
|
||||
iterate over the inner-most dimension.
|
||||
|
||||
The following attributes and methods are proposed for the core
|
||||
implementation.
|
||||
|
||||
Attributes:
|
||||
|
||||
Methods:
|
||||
compute():
|
||||
|
||||
C-API:
|
||||
This still needs to be fleshed-out.
|
||||
|
||||
b. _int32, _real64, etc.:
|
||||
|
||||
There will also be C-extension modules for each array type,
|
||||
e.g. _int32module.c, _real64module.c, etc. As mentioned
|
||||
previously, when these modules are imported by the UFunc
|
||||
module, they will automatically register their functions and
|
||||
coercion rules. New or improved versions of these modules can
|
||||
be easily implemented and used without affecting the rest of
|
||||
Numeric 2.
|
||||
|
||||
|
||||
Open Issues
|
||||
|
||||
1. Does slicing syntax default to copy or view behavior?
|
||||
|
||||
The default behavior of Python is to return a copy of a sub-list
|
||||
or tuple when slicing syntax is used, whereas Numeric 1 returns a
|
||||
view into the array. The choice made for Numeric 1 is apparently
|
||||
for reasons of performance: the developers wish to avoid the
|
||||
penalty of allocating and copying the data buffer during each
|
||||
array operation and feel that the need for a deep copy of an array
|
||||
to be rare. Yet, some have argued that Numeric's slice notation
|
||||
should also have copy behavior to be consistent with Python lists.
|
||||
In this case the performance penalty associated with copy behavior
|
||||
can be minimized by implementing copy-on-write. This scheme has
|
||||
both arrays sharing one data buffer (as in view behavior) until
|
||||
either array is assigned new data at which point a copy of the
|
||||
data buffer is made. View behavior would then be implemented by
|
||||
an ArrayView class, whose behavior be similar to Numeric 1 arrays,
|
||||
i.e. .shape is not settable for non-contiguous arrays. The use of
|
||||
an ArrayView class also makes explicit what type of data the array
|
||||
contains.
|
||||
|
||||
2. Does item syntax default to copy or view behavior?
|
||||
|
||||
A similar question arises with the item syntax. For example, if a
|
||||
= [[0,1,2], [3,4,5]] and b = a[0], then changing b[0] also changes
|
||||
a[0][0], because a[0] is a reference or view of the first row of
|
||||
a. Therefore, if c is a 2-d array, it would appear that c[i]
|
||||
should return a 1-d array which is a view into, instead of a copy
|
||||
of, c for consistency. Yet, c[i] can be considered just a
|
||||
shorthand for c[i,:] which would imply copy behavior assuming
|
||||
slicing syntax returns a copy. Should Numeric 2 behave the same
|
||||
way as lists and return a view or should it return a copy.
|
||||
|
||||
3. How is scalar coercion implemented?
|
||||
|
||||
Python has fewer numeric types than Numeric which can cause
|
||||
coercion problems. For example when multiplying a Python scalar
|
||||
of type float and a Numeric array of type float, the Numeric array
|
||||
is converted to a double, since the Python float type is actually
|
||||
a double. This is often not the desired behavior, since the
|
||||
Numeric array will be doubled in size which is likely to be
|
||||
annoying, particularly for very large arrays. We prefer that the
|
||||
array type trumps the python type for the same type class, namely
|
||||
integer, float, and complex. Therefore an operation between a
|
||||
Python integer and an Int16 (short) array will return an Int16
|
||||
array. Whereas an operation between a Python float and an Int16
|
||||
array would return a Float64 (double) array. Operations between
|
||||
two arrays use normal coercion rules.
|
||||
|
||||
4. How is integer division handled?
|
||||
|
||||
In a future version of Python, the behavior of integer division
|
||||
will change. The operands will be converted to floats, so the
|
||||
result will be a float. If we implement the proposed scalar
|
||||
coercion rules where arrays have precedence over Python scalars,
|
||||
then dividing an array by an integer will return an integer array
|
||||
and will not be consistent with a future version of Python which
|
||||
would return an array of type double. Scientific programmers are
|
||||
familiar with the distinction between integer and float-point
|
||||
division, so should Numeric 2 continue with this behavior?
|
||||
|
||||
5. How should records be implemented?
|
||||
|
||||
There are two approaches to implementing records depending on your
|
||||
point-of-view. The first is two divide arrays into separate
|
||||
classes depending on the behavior of their types. For example
|
||||
numeric arrays are one class, strings a second, and records a
|
||||
third, because the range and type of operations of each class
|
||||
differ. As such, a record array is not a new type, but a
|
||||
mechanism for a more flexible form of array. To easily access and
|
||||
manipulate such complex data, the class is comprised of numeric
|
||||
arrays having different byte offsets into the data buffer. For
|
||||
example, one might have a table consisting of an array of Int16,
|
||||
Real32 values. Two numeric arrays, one with an offset of 0 bytes
|
||||
and a stride of 6 bytes to be interpreted as Int16, and one with an
|
||||
offset of 2 bytes and a stride of 6 bytes to be interpreted as
|
||||
Real32 would represent the record array. Both numeric arrays
|
||||
would refer to the same data buffer, but have different offset and
|
||||
stride attributes, and a different numeric type.
|
||||
|
||||
The second approach is to consider a record as one of many array
|
||||
types, albeit with fewer, and possibly different, array operations
|
||||
than for numeric arrays. This approach considers an array type to
|
||||
be a mapping of a fixed-length string. The mapping can either be
|
||||
simple, like integer and floating-point numbers, or complex, like
|
||||
a complex number, a byte string, and a C-structure. The record
|
||||
type effectively merges the struct and Numeric modules into a
|
||||
multi-dimensional struct array. This approach implies certain
|
||||
changes to the array interface. For example, the 'typecode'
|
||||
keyword argument should probably be changed to the more
|
||||
descriptive 'format' keyword.
|
||||
|
||||
a. How are record semantics defined and implemented?
|
||||
|
||||
Which ever implementation approach is taken for records, the
|
||||
syntax and semantics of how they are to be accessed and
|
||||
manipulated must be decided, if one wishes to have access to
|
||||
sub-fields of records. In this case, the record type can
|
||||
essentially be considered an inhomogeneous list, like a tuple
|
||||
returned by the unpack method of the struct module; and a 1-d
|
||||
array of records may be interpreted as a 2-d array with the
|
||||
second dimension being the index into the list of fields.
|
||||
This enhanced array semantics makes access to an array of one
|
||||
or more of the fields easy and straightforward. It also
|
||||
allows a user to do array operations on a field in a natural
|
||||
and intuitive way. If we assume that records are implemented
|
||||
as an array type, then last dimension defaults to 0 and can
|
||||
therefore be neglected for arrays comprised of simple types,
|
||||
like numeric.
|
||||
|
||||
6. How are masked-arrays implemented?
|
||||
|
||||
Masked-arrays in Numeric 1 are implemented as a separate array
|
||||
class. With the ability to add new array types to Numeric 2, it
|
||||
is possible that masked-arrays in Numeric 2 could be implemented
|
||||
as a new array type instead of an array class.
|
||||
|
||||
7. How are numerical errors handled (IEEE floating-point errors in
|
||||
particular)?
|
||||
|
||||
It is not clear to the proposers (Paul Barrett and Travis
|
||||
Oliphant) what is the best or preferred way of handling errors.
|
||||
Since most of the C functions that do the operation, iterate over
|
||||
the inner-most (last) dimension of the array. This dimension
|
||||
could contain a thousand or more items having one or more errors
|
||||
of differing type, such as divide-by-zero, underflow, and
|
||||
overflow. Additionally, keeping track of these errors may come at
|
||||
the expense of performance. Therefore, we suggest several
|
||||
options:
|
||||
|
||||
a. Print a message of the most severe error, leaving it to
|
||||
the user to locate the errors.
|
||||
|
||||
b. Print a message of all errors that occurred and the number
|
||||
of occurrences, leaving it to the user to locate the errors.
|
||||
|
||||
c. Print a message of all errors that occurred and a list of
|
||||
where they occurred.
|
||||
|
||||
d. Or use a hybrid approach, printing only the most severe
|
||||
error, yet keeping track of what and where the errors
|
||||
occurred. This would allow the user to locate the errors
|
||||
while keeping the error message brief.
|
||||
|
||||
8. What features are needed to ease the integration of FORTRAN
|
||||
libraries and code?
|
||||
|
||||
It would be a good idea at this stage to consider how to ease the
|
||||
integration of FORTRAN libraries and user code in Numeric 2.
|
||||
|
||||
|
||||
Implementation Steps
|
||||
|
||||
1. Implement basic UFunc capability
|
||||
|
||||
a. Minimal Array class:
|
||||
|
||||
Necessary class attributes and methods, e.g. .shape, .data,
|
||||
.type, etc.
|
||||
|
||||
b. Minimal ArrayType class:
|
||||
|
||||
Int32, Real64, Complex64, Char, Object
|
||||
|
||||
c. Minimal UFunc class:
|
||||
|
||||
UFunc instantiation, CFunction registration, UFunc call for
|
||||
1-D arrays including the rules for doing alignment,
|
||||
byte-swapping, and coercion.
|
||||
|
||||
d. Minimal C-extension module:
|
||||
|
||||
_UFunc, which does the innermost array loop in C.
|
||||
|
||||
This step implements whatever is needed to do: 'c = add(a, b)'
|
||||
where a, b, and c are 1-D arrays. It teaches us how to add
|
||||
new UFuncs, to coerce the arrays, to pass the necessary
|
||||
information to a C iterator method and to do the actually
|
||||
computation.
|
||||
|
||||
2. Continue enhancing the UFunc iterator and Array class
|
||||
|
||||
a. Implement some access methods for the Array class:
|
||||
print, repr, getitem, setitem, etc.
|
||||
|
||||
b. Implement multidimensional arrays
|
||||
|
||||
c. Implement some of basic Array methods using UFuncs:
|
||||
+, -, *, /, etc.
|
||||
|
||||
d. Enable UFuncs to use Python sequences.
|
||||
|
||||
3. Complete the standard UFunc and Array class behavior
|
||||
|
||||
a. Implement getslice and setslice behavior
|
||||
|
||||
b. Work on Array broadcasting rules
|
||||
|
||||
c. Implement Record type
|
||||
|
||||
4. Add additional functionality
|
||||
|
||||
a. Add more UFuncs
|
||||
|
||||
b. Implement buffer or mmap access
|
||||
|
||||
|
||||
Incompatibilities
|
||||
|
||||
The following is a list of incompatibilities in behavior between
|
||||
Numeric 1 and Numeric 2.
|
||||
|
||||
1. Scalar coercion rules
|
||||
|
||||
Numeric 1 has single set of coercion rules for array and Python
|
||||
numeric types. This can cause unexpected and annoying problems
|
||||
during the calculation of an array expression. Numeric 2 intends
|
||||
to overcome these problems by having two sets of coercion rules:
|
||||
one for arrays and Python numeric types, and another just for
|
||||
arrays.
|
||||
|
||||
2. No savespace attribute
|
||||
|
||||
The savespace attribute in Numeric 1 makes arrays with this
|
||||
attribute set take precedence over those that do not have it set.
|
||||
Numeric 2 will not have such an attribute and therefore normal
|
||||
array coercion rules will be in effect.
|
||||
|
||||
3. Slicing syntax returns a copy
|
||||
|
||||
The slicing syntax in Numeric 1 returns a view into the original
|
||||
array. The slicing behavior for Numeric 2 will be a copy. You
|
||||
should use the ArrayView class to get a view into an array.
|
||||
|
||||
4. Boolean comparisons return a boolean array
|
||||
|
||||
A comparison between arrays in Numeric 1 results in a Boolean
|
||||
scalar, because of current limitations in Python. The advent of
|
||||
Rich Comparisons in Python 2.1 will allow an array of Booleans to
|
||||
be returned.
|
||||
|
||||
5. Type characters are deprecated
|
||||
|
||||
Numeric 2 will have an ArrayType class composed of Type instances,
|
||||
for example Int8, Int16, Int32, and Int for signed integers. The
|
||||
typecode scheme in Numeric 1 will be available for backward
|
||||
compatibility, but will be deprecated.
|
||||
|
||||
|
||||
Appendices
|
||||
|
||||
A. Implicit sub-arrays iteration
|
||||
|
||||
A computer animation is composed of a number of 2-D images or
|
||||
frames of identical shape. By stacking these images into a single
|
||||
block of memory, a 3-D array is created. Yet the operations to be
|
||||
performed are not meant for the entire 3-D array, but on the set
|
||||
of 2-D sub-arrays. In most array languages, each frame has to be
|
||||
extracted, operated on, and then reinserted into the output array
|
||||
using a for-like loop. The J language allows the programmer to
|
||||
perform such operations implicitly by having a rank for the frame
|
||||
and array. By default these ranks will be the same during the
|
||||
creation of the array. It was the intention of the Numeric 1
|
||||
developers to implement this feature, since it is based on the
|
||||
language J. The Numeric 1 code has the required variables for
|
||||
implementing this behavior, but was never implemented. We intend
|
||||
to implement implicit sub-array iteration in Numeric 2, if the
|
||||
array broadcasting rules found in Numeric 1 do not fully support
|
||||
this behavior.
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document is placed in the public domain.
|
||||
|
||||
|
||||
Related PEPs
|
||||
|
||||
PEP 207: Rich Comparisons
|
||||
by Guido van Rossum and David Ascher
|
||||
|
||||
PEP 208: Reworking the Coercion Model
|
||||
by Neil Schemenauer and Marc-Andre' Lemburg
|
||||
|
||||
PEP 211: Adding New Linear Algebra Operators to Python
|
||||
by Greg Wilson
|
||||
|
||||
PEP 225: Elementwise/Objectwise Operators
|
||||
by Huaiyu Zhu
|
||||
|
||||
PEP 228: Reworking Python's Numeric Model
|
||||
by Moshe Zadka
|
||||
|
||||
|
||||
References
|
||||
|
||||
[1] P. Greenfield 2000. private communication.
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
|
|
Loading…
Reference in New Issue