From ca41d18043e7a64b8a733f8eef33df7e4c6fe4bc Mon Sep 17 00:00:00 2001 From: Huang Huang Date: Wed, 13 Sep 2017 07:22:41 +0800 Subject: [PATCH] reSTify PEP 209 (#406) --- pep-0209.txt | 358 +++++++++++++++++++++++++++------------------------ 1 file changed, 193 insertions(+), 165 deletions(-) diff --git a/pep-0209.txt b/pep-0209.txt index 9063d7f3d..9358d6af9 100644 --- a/pep-0209.txt +++ b/pep-0209.txt @@ -5,74 +5,78 @@ Last-Modified: $Date$ Author: barrett@stsci.edu (Paul Barrett), oliphant@ee.byu.edu (Travis Oliphant) Status: Withdrawn Type: Standards Track +Content-Type: text/x-rst Created: 03-Jan-2001 Python-Version: 2.2 Post-History: Abstract +======== - This PEP proposes a redesign and re-implementation of the multi- - dimensional array module, Numeric, to make it easier to add new - features and functionality to the module. Aspects of Numeric 2 - that will receive special attention are efficient access to arrays - exceeding a gigabyte in size and composed of inhomogeneous data - structures or records. The proposed design uses four Python - classes: ArrayType, UFunc, Array, and ArrayView; and a low-level - C-extension module, _ufunc, to handle the array operations - efficiently. In addition, each array type has its own C-extension - module which defines the coercion rules, operations, and methods - for that type. This design enables new types, features, and - functionality to be added in a modular fashion. The new version - will introduce some incompatibilities with the current Numeric. +This PEP proposes a redesign and re-implementation of the multi- +dimensional array module, Numeric, to make it easier to add new +features and functionality to the module. Aspects of Numeric 2 +that will receive special attention are efficient access to arrays +exceeding a gigabyte in size and composed of inhomogeneous data +structures or records. 
The proposed design uses four Python +classes: ArrayType, UFunc, Array, and ArrayView; and a low-level +C-extension module, _ufunc, to handle the array operations +efficiently. In addition, each array type has its own C-extension +module which defines the coercion rules, operations, and methods +for that type. This design enables new types, features, and +functionality to be added in a modular fashion. The new version +will introduce some incompatibilities with the current Numeric. Motivation +========== - Multi-dimensional arrays are commonly used to store and manipulate - data in science, engineering, and computing. Python currently has - an extension module, named Numeric (henceforth called Numeric 1), - which provides a satisfactory set of functionality for users - manipulating homogeneous arrays of data of moderate size (of order - 10 MB). For access to larger arrays (of order 100 MB or more) of - possibly inhomogeneous data, the implementation of Numeric 1 is - inefficient and cumbersome. In the future, requests by the - Numerical Python community for additional functionality are also - likely, as PEPs 211: Adding New Linear Algebra Operators to Python, and - 225: Elementwise/Objectwise Operators illustrate. +Multi-dimensional arrays are commonly used to store and manipulate +data in science, engineering, and computing. Python currently has +an extension module, named Numeric (henceforth called Numeric 1), +which provides a satisfactory set of functionality for users +manipulating homogeneous arrays of data of moderate size (of order +10 MB). For access to larger arrays (of order 100 MB or more) of +possibly inhomogeneous data, the implementation of Numeric 1 is +inefficient and cumbersome. In the future, requests by the +Numerical Python community for additional functionality are also +likely, as PEPs 211: Adding New Linear Algebra Operators to Python, and +225: Elementwise/Objectwise Operators illustrate.
Proposal +======== - This proposal recommends a re-design and re-implementation of - Numeric 1, henceforth called Numeric 2, which will enable new - types, features, and functionality to be added in an easy and - modular manner. The initial design of Numeric 2 should focus on - providing a generic framework for manipulating arrays of various - types and should enable a straightforward mechanism for adding new - array types and UFuncs. Functional methods that are more specific - to various disciplines can then be layered on top of this core. - This new module will still be called Numeric and most of the - behavior found in Numeric 1 will be preserved. +This proposal recommends a re-design and re-implementation of +Numeric 1, henceforth called Numeric 2, which will enable new +types, features, and functionality to be added in an easy and +modular manner. The initial design of Numeric 2 should focus on +providing a generic framework for manipulating arrays of various +types and should enable a straightforward mechanism for adding new +array types and UFuncs. Functional methods that are more specific +to various disciplines can then be layered on top of this core. +This new module will still be called Numeric and most of the +behavior found in Numeric 1 will be preserved. - The proposed design uses four Python classes: ArrayType, UFunc, - Array, and ArrayView; and a low-level C-extension module to handle - the array operations efficiently. In addition, each array type - has its own C-extension module which defines the coercion rules, - operations, and methods for that type. At a later date, when core - functionality is stable, some Python classes can be converted to - C-extension types. +The proposed design uses four Python classes: ArrayType, UFunc, +Array, and ArrayView; and a low-level C-extension module to handle +the array operations efficiently. 
In addition, each array type +has its own C-extension module which defines the coercion rules, +operations, and methods for that type. At a later date, when core +functionality is stable, some Python classes can be converted to +C-extension types. - Some planned features are: +Some planned features are: - 1. Improved memory usage +1. Improved memory usage This feature is particularly important when handling large arrays and can produce significant improvements in performance as well as memory usage. We have identified several areas where memory usage can be improved: - a. Use a local coercion model + a. Use a local coercion model Instead of using Python's global coercion model which creates temporary arrays, Numeric 2, like Numeric 1, will implement a @@ -80,14 +84,14 @@ Proposal responsibility of coercion to the operator. By using internal buffers, a coercion operation can be done for each array (including output arrays), if necessary, at the time of the - operation. Benchmarks [1] have shown that performance is at + operation. Benchmarks [1]_ have shown that performance is at most degraded only slightly and is improved in cases where the internal buffers are less than the L2 cache size and the processor is under load. To avoid array coercion altogether, C functions having arguments of mixed type are allowed in Numeric 2. - b. Avoid creation of temporary arrays + b. Avoid creation of temporary arrays In complex array expressions (i.e. having more than one operation), each operation will create a temporary array which @@ -99,7 +103,7 @@ Proposal reference count. If it is 1, then it will be deleted once the operation is done and is a candidate for reuse. - c. Optional use of memory-mapped files + c. Optional use of memory-mapped files Numeric users sometimes need to access data from very large files or to handle data that is greater than the available @@ -110,7 +114,7 @@ Proposal access. 
Numeric should be able to access in-memory and memory-mapped arrays transparently. - d. Record access + d. Record access In some fields of science, data is stored in files as binary records. For example, in astronomy, photon data is stored as a @@ -130,7 +134,7 @@ Proposal view. We defer this discussion to the Open Issues section. - 2. Additional array types +2. Additional array types Numeric 1 has 11 defined types: char, ubyte, sbyte, short, int, long, float, double, cfloat, cdouble, and object. There are no @@ -142,7 +146,7 @@ Proposal of new array types such as a bit type described below, a re-design of Numeric is necessary. - a. Bit type + a. Bit type The result of a rich comparison between arrays is an array of boolean values. The result can be stored in an array of type @@ -152,7 +156,7 @@ Proposal implemented for Numeric 1 (by Travis Oliphant) and should be included in Numeric 2. - 3. Enhanced array indexing syntax +3. Enhanced array indexing syntax The extended slicing syntax was added to Python to provide greater flexibility when manipulating Numeric arrays by allowing @@ -161,39 +165,42 @@ Proposal where a list of irregularly spaced indices are needed, an enhanced array indexing syntax would allow 1-D arrays to be arguments. - 4. Rich comparisons +4. Rich comparisons The implementation of PEP 207: Rich Comparisons in Python 2.1 provides additional flexibility when manipulating arrays. We intend to implement this feature in Numeric 2. - 5. Array broadcasting rules +5. Array broadcasting rules - When an operation between a scalar and an array is done, the - implied behavior is to create a new array having the same shape as - the array operand containing the scalar value. This is called - array broadcasting. It also works with arrays of lesser rank, - such as vectors. This implicit behavior is implemented in Numeric - 1 and will also be implemented in Numeric 2. 
+ When an operation between a scalar and an array is done, the + implied behavior is to create a new array having the same shape as + the array operand containing the scalar value. This is called + array broadcasting. It also works with arrays of lesser rank, + such as vectors. This implicit behavior is implemented in Numeric + 1 and will also be implemented in Numeric 2. Design and Implementation +========================= - The design of Numeric 2 has four primary classes: +The design of Numeric 2 has four primary classes: - 1. ArrayType: +1. ArrayType: This is a simple class that describes the fundamental properties of an array-type, e.g. its name, its size in bytes, its coercion relations with respect to other types, etc., e.g. - > Int32 = ArrayType('Int32', 4, 'doc-string') + :: + + Int32 = ArrayType('Int32', 4, 'doc-string') Its relation to the other types is defined when the C-extension module for that type is imported. The corresponding Python code - is: + is:: - > Int32.astype[Real64] = Real64 + Int32.astype[Real64] = Real64 This says that the Real64 array-type has higher priority than the Int32 array-type. @@ -202,7 +209,8 @@ Design and Implementation implementation. Additional attributes can be added on an individual basis, e.g. .bitsize or .bitstrides for the bit type. - Attributes: + Attributes:: + .name: e.g. "Int32", "Float64", etc. .typecode: e.g. 'i', 'f', etc. (for backward compatibility) @@ -210,23 +218,26 @@ Design and Implementation .array_rules (mapping): rules between array types .pyobj_rules (mapping): rules between array and python types .doc: documentation string - Methods: + + Methods:: + __init__(): initialization __del__(): destruction __repr__(): representation - C-API: - This still needs to be fleshed-out. + C-API: This still needs to be fleshed-out. - 2. UFunc: +2. UFunc: This class is the heart of Numeric 2. 
Its design is similar to that of ArrayType in that the UFunc creates a singleton callable object whose attributes are name, total and input number of arguments, a document string, and an empty CFunc dictionary; e.g. - > add = UFunc('add', 3, 2, 'doc-string') + :: + + add = UFunc('add', 3, 2, 'doc-string') When defined the add instance has no C functions associated with it and therefore can do no work. The CFunc dictionary is @@ -235,23 +246,25 @@ Design and Implementation function name, function descriptor, and the CUFunc object. The corresponding Python code is - > add.register('add', (Int32, Int32, Int32), cfunc-add) + :: + + add.register('add', (Int32, Int32, Int32), cfunc-add) In the initialization function of an array type module, e.g. Int32, there are two C API functions: one to initialize the coercion rules and the other to register the CFunc objects. - When an operation is applied to some arrays, the __call__ method + When an operation is applied to some arrays, the ``__call__`` method is invoked. It gets the type of each array (if the output array is not given, it is created from the coercion rules) and checks the CFunc dictionary for a key that matches the argument types. If it exists the operation is performed immediately, otherwise the coercion rules are used to search for a related operation and set - of conversion functions. The __call__ method then invokes a + of conversion functions. The ``__call__`` method then invokes a compute method written in C to iterate over slices of each array, - namely: + namely:: - > _ufunc.compute(slice, data, func, swap, conv) + _ufunc.compute(slice, data, func, swap, conv) The 'func' argument is a CFuncObject, while the 'swap' and 'conv' arguments are lists of CFuncObjects for those arrays needing pre- @@ -260,7 +273,7 @@ Design and Implementation of iterations for each dimension along with the buffer offset and step size for each array and each dimension. 
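As an illustration only, the registration and dispatch steps described above might look like the following pure-Python sketch. It is not part of the PEP or of this patch. The ``convert`` attribute, the ``common_type`` helper, and the use of ``(ArrayType, list)`` pairs in place of real arrays are hypothetical stand-ins, and the C-level ``_ufunc.compute`` loop is replaced by a plain Python loop over elements.

```python
class ArrayType:
    """Describes an element type: name, size in bytes, coercion relations."""

    def __init__(self, name, size, doc="", convert=float):
        self.name = name
        self.size = size
        self.doc = doc
        self.convert = convert  # hypothetical stand-in for a C 'cast' function
        self.astype = {}        # maps another ArrayType to the resulting type

    def __repr__(self):
        return self.name


def common_type(a, b):
    """Hypothetical helper: pick the result type from the coercion relations."""
    if a is b:
        return a
    if b in a.astype:
        return a.astype[b]
    if a in b.astype:
        return b.astype[a]
    raise TypeError("no coercion rule between %r and %r" % (a, b))


class UFunc:
    """Callable whose work is done by functions registered per type signature."""

    def __init__(self, name, nargs, iargs, doc=""):
        self.name = name
        self.nargs = nargs    # total number of arguments (inputs + outputs)
        self.iargs = iargs    # number of input arguments
        self.doc = doc
        self.cfuncs = {}      # type signature -> (name, function)

    def register(self, name, signature, cfunc):
        self.cfuncs[signature] = (name, cfunc)

    def __call__(self, *operands):
        # Each operand is an (ArrayType, list) pair in this sketch.
        if len(operands) != self.iargs:
            raise TypeError("%s expects %d inputs" % (self.name, self.iargs))
        target = operands[0][0]
        for array_type, _ in operands[1:]:
            target = common_type(target, array_type)
        key = (target,) * self.nargs
        if key not in self.cfuncs:
            raise TypeError("no function registered for %r" % (key,))
        _, cfunc = self.cfuncs[key]
        # Convert only the operands whose type differs from the target type.
        columns = [
            data if array_type is target else [target.convert(x) for x in data]
            for array_type, data in operands
        ]
        return target, [cfunc(*items) for items in zip(*columns)]


Int32 = ArrayType('Int32', 4, 'doc-string', convert=int)
Real64 = ArrayType('Real64', 8, 'doc-string', convert=float)
Int32.astype[Real64] = Real64   # Real64 has higher priority than Int32

add = UFunc('add', 3, 2, 'doc-string')
add.register('add', (Int32, Int32, Int32), lambda x, y: x + y)
add.register('add', (Real64, Real64, Real64), lambda x, y: x + y)

result_type, result = add((Int32, [1, 2, 3]), (Real64, [0.5, 0.5, 0.5]))
```

Here the mixed Int32/Real64 call finds no exact Int32 signature for both operands, so the Int32 operand is converted and the Real64 function is used. A call with two Int32 operands uses the Int32 function directly, with no conversion.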
- We have predefined several UFuncs for use by the __call__ method: + We have predefined several UFuncs for use by the ``__call__`` method: cast, swap, getobj, and setobj. The cast and swap functions do coercion and byte-swapping, respectively, and the getobj and setobj functions do coercion between Numeric arrays and Python sequences. @@ -268,13 +281,16 @@ Design and Implementation The following attributes and methods are proposed for the core implementation. - Attributes: + Attributes:: + .name: e.g. "add", "subtract", etc. .nargs: number of total arguments .iargs: number of input arguments .cfuncs (mapping): the set of C functions .doc: documentation string - Methods: + + Methods:: + __init__(): initialization __del__(): destruction __repr__(): representation @@ -284,27 +300,31 @@ Design and Implementation register(): register a CUFunc unregister(): unregister a CUFunc - C-API: - This still needs to be fleshed-out. + C-API: This still needs to be fleshed-out. - 3. Array: +3. Array: This class contains information about the array, such as shape, type, endian-ness of the data, etc. Its operators, '+', '-', etc. just invoke the corresponding UFunc function, e.g. - > def __add__(self, other): - > return ufunc.add(self, other) + :: + + def __add__(self, other): + return ufunc.add(self, other) The following attributes, methods, and functions are proposed for the core implementation.
- Attributes: + Attributes:: + .shape: shape of the array .format: type of the array .real (only complex): real part of a complex array .imag (only complex): imaginary part of a complex array - Methods: + + Methods:: + __init__(): initialization __del__(): destruction __repr__(): representation @@ -320,38 +340,39 @@ Design and Implementation aslist(): create list from array asstring(): create string from array - Functions: + Functions:: + fromlist(): create array from sequence fromstring(): create array from string array(): create array with shape and value concat(): concatenate two arrays resize(): resize array - C-API: - This still needs to be fleshed-out. + C-API: This still needs to be fleshed-out. - 4. ArrayView +4. ArrayView This class is similar to the Array class except that the reshape and flat methods will raise exceptions, since non-contiguous arrays cannot be reshaped or flattened using just pointer and step-size information. - C-API: - This still needs to be fleshed-out. + C-API: This still needs to be fleshed-out. - 5. C-extension modules: +5. C-extension modules: Numeric 2 will have several C-extension modules. - a. _ufunc: + a. _ufunc: The primary module of this set is _ufuncmodule.c. The intention of this module is to do the bare minimum, i.e. iterate over arrays using a specified C function. The interface of these functions is the same as Numeric 1, i.e. - int (*CFunc)(char *data, int *steps, int repeat, void *func); + :: + + int (*CFunc)(char *data, int *steps, int repeat, void *func); and their functionality is expected to be the same, i.e. they iterate over the inner-most dimension. @@ -361,13 +382,13 @@ Design and Implementation Attributes: - Methods: + Methods:: + compute(): - C-API: - This still needs to be fleshed-out. + C-API: This still needs to be fleshed-out. - b. _int32, _real64, etc.: + b. _int32, _real64, etc.: There will also be C-extension modules for each array type, e.g. _int32module.c, _real64module.c, etc.
As mentioned @@ -379,8 +400,9 @@ Design and Implementation Open Issues +=========== - 1. Does slicing syntax default to copy or view behavior? +1. Does slicing syntax default to copy or view behavior? The default behavior of Python is to return a copy of a sub-list or tuple when slicing syntax is used, whereas Numeric 1 returns a @@ -400,19 +422,19 @@ Open Issues an ArrayView class also makes explicit what type of data the array contains. - 2. Does item syntax default to copy or view behavior? +2. Does item syntax default to copy or view behavior? - A similar question arises with the item syntax. For example, if a - = [[0,1,2], [3,4,5]] and b = a[0], then changing b[0] also changes - a[0][0], because a[0] is a reference or view of the first row of - a. Therefore, if c is a 2-d array, it would appear that c[i] + A similar question arises with the item syntax. For example, if + ``a = [[0,1,2], [3,4,5]]`` and ``b = a[0]``, then changing ``b[0]`` also changes + ``a[0][0]``, because ``a[0]`` is a reference or view of the first row of a. + Therefore, if c is a 2-d array, it would appear that ``c[i]`` should return a 1-d array which is a view into, instead of a copy - of, c for consistency. Yet, c[i] can be considered just a - shorthand for c[i,:] which would imply copy behavior assuming + of, c for consistency. Yet, ``c[i]`` can be considered just a + shorthand for ``c[i,:]`` which would imply copy behavior assuming slicing syntax returns a copy. Should Numeric 2 behave the same way as lists and return a view or should it return a copy. - 3. How is scalar coercion implemented? +3. How is scalar coercion implemented? Python has fewer numeric types than Numeric which can cause coercion problems. For example, when multiplying a Python scalar @@ -428,7 +450,7 @@ Open Issues array would return a Float64 (double) array. Operations between two arrays use normal coercion rules. - 4. How is integer division handled? +4. How is integer division handled? 
In a future version of Python, the behavior of integer division will change. The operands will be converted to floats, so the @@ -440,7 +462,7 @@ Open Issues familiar with the distinction between integer and floating-point division, so should Numeric 2 continue with this behavior? - 5. How should records be implemented? +5. How should records be implemented? There are two approaches to implementing records depending on your point-of-view. The first is to divide arrays into separate @@ -471,7 +493,7 @@ Open Issues keyword argument should probably be changed to the more descriptive 'format' keyword. - a. How are record semantics defined and implemented? + a. How are record semantics defined and implemented? Whichever implementation approach is taken for records, the syntax and semantics of how they are to be accessed and @@ -489,15 +511,15 @@ Open Issues therefore be neglected for arrays comprised of simple types, like numeric. - 6. How are masked-arrays implemented? +6. How are masked-arrays implemented? Masked-arrays in Numeric 1 are implemented as a separate array class. With the ability to add new array types to Numeric 2, it is possible that masked-arrays in Numeric 2 could be implemented as a new array type instead of an array class. - 7. How are numerical errors handled (IEEE floating-point errors in - particular)? +7. How are numerical errors handled (IEEE floating-point errors in + particular)? It is not clear to the proposers (Paul Barrett and Travis Oliphant) what is the best or preferred way of handling errors. @@ -509,47 +531,48 @@ Open Issues the expense of performance. Therefore, we suggest several options: - a. Print a message of the most severe error, leaving it to + a. Print a message of the most severe error, leaving it to the user to locate the errors. - b. Print a message of all errors that occurred and the number + b. Print a message of all errors that occurred and the number of occurrences, leaving it to the user to locate the errors. - c.
Print a message of all errors that occurred and a list of + c. Print a message of all errors that occurred and a list of where they occurred. - d. Or use a hybrid approach, printing only the most severe + d. Or use a hybrid approach, printing only the most severe error, yet keeping track of what and where the errors occurred. This would allow the user to locate the errors while keeping the error message brief. - 8. What features are needed to ease the integration of FORTRAN - libraries and code? +8. What features are needed to ease the integration of FORTRAN + libraries and code? - It would be a good idea at this stage to consider how to ease the - integration of FORTRAN libraries and user code in Numeric 2. +It would be a good idea at this stage to consider how to ease the +integration of FORTRAN libraries and user code in Numeric 2. Implementation Steps +==================== - 1. Implement basic UFunc capability +1. Implement basic UFunc capability - a. Minimal Array class: + a. Minimal Array class: Necessary class attributes and methods, e.g. .shape, .data, .type, etc. - b. Minimal ArrayType class: + b. Minimal ArrayType class: Int32, Real64, Complex64, Char, Object - c. Minimal UFunc class: + c. Minimal UFunc class: UFunc instantiation, CFunction registration, UFunc call for 1-D arrays including the rules for doing alignment, byte-swapping, and coercion. - d. Minimal C-extension module: + d. Minimal C-extension module: _UFunc, which does the innermost array loop in C. @@ -559,39 +582,40 @@ Implementation Steps information to a C iterator method and to do the actual computation. - 2. Continue enhancing the UFunc iterator and Array class +2. Continue enhancing the UFunc iterator and Array class - a. Implement some access methods for the Array class: - print, repr, getitem, setitem, etc. + a. Implement some access methods for the Array class: + print, repr, getitem, setitem, etc. - b. Implement multidimensional arrays + b. Implement multidimensional arrays - c.
Implement some of the basic Array methods using UFuncs: - +, -, *, /, etc. + c. Implement some of the basic Array methods using UFuncs: + +, -, \*, /, etc. - d. Enable UFuncs to use Python sequences. + d. Enable UFuncs to use Python sequences. - 3. Complete the standard UFunc and Array class behavior +3. Complete the standard UFunc and Array class behavior - a. Implement getslice and setslice behavior + a. Implement getslice and setslice behavior - b. Work on Array broadcasting rules + b. Work on Array broadcasting rules - c. Implement Record type + c. Implement Record type - 4. Add additional functionality +4. Add additional functionality - a. Add more UFuncs + a. Add more UFuncs - b. Implement buffer or mmap access + b. Implement buffer or mmap access Incompatibilities +================= - The following is a list of incompatibilities in behavior between - Numeric 1 and Numeric 2. +The following is a list of incompatibilities in behavior between +Numeric 1 and Numeric 2. - 1. Scalar coercion rules +1. Scalar coercion rules Numeric 1 has a single set of coercion rules for array and Python numeric types. This can cause unexpected and annoying problems @@ -600,27 +624,27 @@ Incompatibilities one for arrays and Python numeric types, and another just for arrays. - 2. No savespace attribute +2. No savespace attribute The savespace attribute in Numeric 1 makes arrays with this attribute set take precedence over those that do not have it set. Numeric 2 will not have such an attribute and therefore normal array coercion rules will be in effect. - 3. Slicing syntax returns a copy +3. Slicing syntax returns a copy The slicing syntax in Numeric 1 returns a view into the original array. The slicing behavior for Numeric 2 will be a copy. You should use the ArrayView class to get a view into an array. - 4. +4.
Boolean comparisons return a boolean array A comparison between arrays in Numeric 1 results in a Boolean scalar, because of current limitations in Python. The advent of Rich Comparisons in Python 2.1 will allow an array of Booleans to be returned. - 5. Type characters are deprecated +5. Type characters are deprecated Numeric 2 will have an ArrayType class composed of Type instances, for example Int8, Int16, Int32, and Int for signed integers. The @@ -629,8 +653,9 @@ Incompatibilities Appendices +========== - A. Implicit sub-arrays iteration +A. Implicit sub-arrays iteration A computer animation is composed of a number of 2-D images or frames of identical shape. By stacking these images into a single @@ -651,35 +676,38 @@ Appendices Copyright +========= - This document is placed in the public domain. +This document is placed in the public domain. Related PEPs +============ - PEP 207: Rich Comparisons - by Guido van Rossum and David Ascher +* PEP 207: Rich Comparisons + by Guido van Rossum and David Ascher - PEP 208: Reworking the Coercion Model - by Neil Schemenauer and Marc-Andre' Lemburg +* PEP 208: Reworking the Coercion Model + by Neil Schemenauer and Marc-Andre' Lemburg - PEP 211: Adding New Linear Algebra Operators to Python - by Greg Wilson +* PEP 211: Adding New Linear Algebra Operators to Python + by Greg Wilson - PEP 225: Elementwise/Objectwise Operators - by Huaiyu Zhu +* PEP 225: Elementwise/Objectwise Operators + by Huaiyu Zhu - PEP 228: Reworking Python's Numeric Model - by Moshe Zadka +* PEP 228: Reworking Python's Numeric Model + by Moshe Zadka References +========== - [1] P. Greenfield 2000. private communication. +.. [1] P. Greenfield 2000. private communication. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: