PEP: 3123
Title: Making PyObject_HEAD conform to standard C
Version: $Revision$
Last-Modified: $Date$
Author: Martin von Löwis <martin@v.loewis.de>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Apr-2007
Python-Version: 3.0
Post-History:

Abstract
========

Python currently relies on undefined C behavior, with its
usage of ``PyObject_HEAD``. This PEP proposes to change that
into standard C.

Rationale
=========

Standard C defines that an object must be accessed only through a
pointer of its type, and that all other accesses are undefined
behavior, with a few exceptions. In particular, the following
code has undefined behavior::

  struct FooObject{
    PyObject_HEAD
    int data;
  };

  PyObject *foo(struct FooObject*f){
   return (PyObject*)f;
  }

  int bar(){
   struct FooObject *f = malloc(sizeof(struct FooObject));
   struct PyObject *o = foo(f);
   f->ob_refcnt = 0;
   o->ob_refcnt = 1;
   return f->ob_refcnt;
  }

The problem here is that the storage is both accessed as
if it where struct ``PyObject``, and as struct ``FooObject``.

Historically, compilers did not have any problems with this
code. However, modern compilers use that clause as an 
optimization opportunity, finding that ``f->ob_refcnt`` and
``o->ob_refcnt`` cannot possibly refer to the same memory, and
that therefore the function should return 0, without having
to fetch the value of ob_refcnt at all in the return
statement. For GCC, Python now uses ``-fno-strict-aliasing``
to work around that problem; with other compilers, it
may just see undefined behavior. Even with GCC, using
``-fno-strict-aliasing`` may pessimize the generated code
unnecessarily.

Specification
=============

Standard C has one specific exception to its aliasing rules precisely
designed to support the case of Python: a value of a struct type may
also be accessed through a pointer to the first field. E.g. if a
struct starts with an ``int``, the ``struct *`` may also be cast to
an ``int *``, allowing to write int values into the first field.

For Python, ``PyObject_HEAD`` and ``PyObject_VAR_HEAD`` will be changed
to not list all fields anymore, but list a single field of type
``PyObject``/``PyVarObject``::

  typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
  } PyObject;

  typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;
  } PyVarObject;

  #define PyObject_HEAD        PyObject ob_base;
  #define PyObject_VAR_HEAD    PyVarObject ob_base;

Types defined as fixed-size structure will then include PyObject
as its first field, PyVarObject for variable-sized objects. E.g.::

  typedef struct {
    PyObject ob_base;
    PyObject *start, *stop, *step;
  } PySliceObject;

  typedef struct {
    PyVarObject ob_base;
    PyObject **ob_item;
    Py_ssize_t allocated;
  } PyListObject;

The above definitions of ``PyObject_HEAD`` are normative, so extension
authors MAY either use the macro, or put the ``ob_base`` field explicitly
into their structs.

As a convention, the base field SHOULD be called ob_base. However, all
accesses to ob_refcnt and ob_type MUST cast the object pointer to
PyObject* (unless the pointer is already known to have that type), and
SHOULD use the respective accessor macros. To simplify access to
ob_type, ob_refcnt, and ob_size, macros::

  #define Py_TYPE(o)    (((PyObject*)(o))->ob_type)
  #define Py_REFCNT(o)  (((PyObject*)(o))->ob_refcnt)
  #define Py_SIZE(o)    (((PyVarObject*)(o))->ob_size)

are added. E.g. the code blocks ::

  #define PyList_CheckExact(op) ((op)->ob_type == &PyList_Type)

  return func->ob_type->tp_name;

needs to be changed to::

  #define PyList_CheckExact(op) (Py_TYPE(op) == &PyList_Type)

  return Py_TYPE(func)->tp_name;

For initialization of type objects, the current sequence ::

  PyObject_HEAD_INIT(NULL)
  0, /* ob_size */

becomes incorrect, and must be replaced with ::

  PyVarObject_HEAD_INIT(NULL, 0)

Compatibility with Python 2.6
=============================

To support modules that compile with both Python 2.6 and Python 3.0,
the ``Py_*`` macros are added to Python 2.6. The macros ``Py_INCREF``
and ``Py_DECREF`` will be changed to cast their argument to ``PyObject *``,
so that module authors can also explicitly declare the ``ob_base``
field in modules designed for Python 2.6. 

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: