Added optional two-phase commit (TPC) API extension as proposed
by James Henstridge and discussed on the DB-SIG.

Lots of small text edits (XXX->*, prepend all methods with a dot,
more indents, etc.).

Updated note on the Python datetime module objects.
Marc-André Lemburg 2008-03-03 12:37:19 +00:00
parent 989daa17e3
commit 6cc202c7bf
1 changed files with 232 additions and 96 deletions


@ -27,11 +27,12 @@ Introduction
* Implementation Hints for Module Authors * Implementation Hints for Module Authors
* Optional DB API Extensions * Optional DB API Extensions
* Optional Error Handling Extensions * Optional Error Handling Extensions
* Optional Two-Phase Commit Extensions
* Frequently Asked Questions * Frequently Asked Questions
* Major Changes from Version 1.0 to Version 2.0 * Major Changes from Version 1.0 to Version 2.0
* Open Issues * Open Issues
* Footnotes * Footnotes
* Acknowledgements * Acknowledgments
Comments and questions about this specification may be directed Comments and questions about this specification may be directed
to the SIG for Database Interfacing with Python to the SIG for Database Interfacing with Python
@ -242,14 +243,14 @@ Connection Objects
Cursor Objects Cursor Objects
These objects represent a database cursor, which is used to These objects represent a database cursor, which is used to manage
manage the context of a fetch operation. Cursors created from the context of a fetch operation. Cursors created from the same
the same connection are not isolated, i.e., any changes connection are not isolated, i.e., any changes done to the
done to the database by a cursor are immediately visible by the database by a cursor are immediately visible by the other
other cursors. Cursors created from different connections can cursors. Cursors created from different connections can or can not
or can not be isolated, depending on how the transaction support be isolated, depending on how the transaction support is
is implemented (see also the connection's rollback() and commit() implemented (see also the connection's .rollback() and .commit()
methods.) methods).
Cursor Objects should respond to the following methods and Cursor Objects should respond to the following methods and
attributes: attributes:
@ -257,16 +258,26 @@ Cursor Objects
.description .description
This read-only attribute is a sequence of 7-item This read-only attribute is a sequence of 7-item
sequences. Each of these sequences contains information sequences.
describing one result column: (name, type_code,
display_size, internal_size, precision, scale, Each of these sequences contains information describing
null_ok). The first two items (name and type_code) are one result column:
mandatory, the other five are optional and must be set to
None if meaningfull values are not provided. (name,
type_code,
display_size,
internal_size,
precision,
scale,
null_ok)
The first two items (name and type_code) are mandatory,
the other five are optional and are set to None if no
meaningful values can be provided.
This attribute will be None for operations that This attribute will be None for operations that
do not return rows or if the cursor has not had an do not return rows or if the cursor has not had an
operation invoked via the executeXXX() method yet. operation invoked via the .execute*() method yet.
The type_code can be interpreted by comparing it to the The type_code can be interpreted by comparing it to the
Type Objects specified in the section below. Type Objects specified in the section below.
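As an illustration of the .description layout described above, the following minimal sketch uses the standard library sqlite3 module (chosen only because it is readily available; any DB-API module should behave similarly) to turn rows into dictionaries keyed by column name. The helper name rows_as_dicts and the table are hypothetical.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Yield each remaining row as a dict keyed by column name."""
    # The first item of every 7-item description entry is the column name.
    names = [column[0] for column in cursor.description]
    for row in cursor.fetchall():
        yield dict(zip(names, row))


connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE person (name TEXT, age INTEGER)")
cursor.execute("INSERT INTO person VALUES ('guido', 52)")
cursor.execute("SELECT name, age FROM person")
people = list(rows_as_dicts(cursor))
```

Only name and type_code are mandatory, so portable code should not rely on the other five items being anything but None.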
@ -274,13 +285,13 @@ Cursor Objects
.rowcount .rowcount
This read-only attribute specifies the number of rows that This read-only attribute specifies the number of rows that
the last executeXXX() produced (for DQL statements like the last .execute*() produced (for DQL statements like
'select') or affected (for DML statements like 'update' or 'select') or affected (for DML statements like 'update' or
'insert'). 'insert').
The attribute is -1 in case no executeXXX() has been The attribute is -1 in case no .execute*() has been
performed on the cursor or the rowcount of the last performed on the cursor or the rowcount of the last
operation is not determinable by the interface. [7] operation cannot be determined by the interface. [7]
Note: Future versions of the DB API specification could Note: Future versions of the DB API specification could
redefine the latter case to have the object return None redefine the latter case to have the object return None
@ -300,7 +311,7 @@ Cursor Objects
The procedure may also provide a result set as The procedure may also provide a result set as
output. This must then be made available through the output. This must then be made available through the
standard fetchXXX() methods. standard .fetch*() methods.
.close() .close()
@ -324,7 +335,7 @@ Cursor Objects
but different parameters are bound to it (many times). but different parameters are bound to it (many times).
For maximum efficiency when reusing an operation, it is For maximum efficiency when reusing an operation, it is
best to use the setinputsizes() method to specify the best to use the .setinputsizes() method to specify the
parameter types and sizes ahead of time. It is legal for parameter types and sizes ahead of time. It is legal for
a parameter to not match the predefined information; the a parameter to not match the predefined information; the
implementation should compensate, possibly with a loss of implementation should compensate, possibly with a loss of
@ -332,7 +343,7 @@ Cursor Objects
The parameters may also be specified as list of tuples to The parameters may also be specified as list of tuples to
e.g. insert multiple rows in a single operation, but this e.g. insert multiple rows in a single operation, but this
kind of usage is deprecated: executemany() should be used kind of usage is deprecated: .executemany() should be used
instead. instead.
Return values are not defined. Return values are not defined.
@ -344,7 +355,7 @@ Cursor Objects
found in the sequence seq_of_parameters. found in the sequence seq_of_parameters.
Modules are free to implement this method using multiple Modules are free to implement this method using multiple
calls to the execute() method or by using array operations calls to the .execute() method or by using array operations
to have the database process the sequence as a whole in to have the database process the sequence as a whole in
one call. one call.
@ -354,7 +365,7 @@ Cursor Objects
an exception when it detects that a result set has been an exception when it detects that a result set has been
created by an invocation of the operation. created by an invocation of the operation.
The same comments as for execute() also apply accordingly The same comments as for .execute() also apply accordingly
to this method. to this method.
Return values are not defined. Return values are not defined.
@ -366,10 +377,10 @@ Cursor Objects
available. [6] available. [6]
An Error (or subclass) exception is raised if the previous An Error (or subclass) exception is raised if the previous
call to executeXXX() did not produce any result set or no call to .execute*() did not produce any result set or no
call was issued yet. call was issued yet.
fetchmany([size=cursor.arraysize]) .fetchmany([size=cursor.arraysize])
Fetch the next set of rows of a query result, returning a Fetch the next set of rows of a query result, returning a
sequence of sequences (e.g. a list of tuples). An empty sequence of sequences (e.g. a list of tuples). An empty
@ -384,14 +395,14 @@ Cursor Objects
returned. returned.
An Error (or subclass) exception is raised if the previous An Error (or subclass) exception is raised if the previous
call to executeXXX() did not produce any result set or no call to .execute*() did not produce any result set or no
call was issued yet. call was issued yet.
Note there are performance considerations involved with Note there are performance considerations involved with
the size parameter. For optimal performance, it is the size parameter. For optimal performance, it is
usually best to use the arraysize attribute. If the size usually best to use the arraysize attribute. If the size
parameter is used, then it is best for it to retain the parameter is used, then it is best for it to retain the
same value from one fetchmany() call to the next. same value from one .fetchmany() call to the next.
.fetchall() .fetchall()
@ -401,7 +412,7 @@ Cursor Objects
performance of this operation. performance of this operation.
An Error (or subclass) exception is raised if the previous An Error (or subclass) exception is raised if the previous
call to executeXXX() did not produce any result set or no call to .execute*() did not produce any result set or no
call was issued yet. call was issued yet.
.nextset() .nextset()
@ -419,23 +430,23 @@ Cursor Objects
result set. result set.
An Error (or subclass) exception is raised if the previous An Error (or subclass) exception is raised if the previous
call to executeXXX() did not produce any result set or no call to .execute*() did not produce any result set or no
call was issued yet. call was issued yet.
.arraysize .arraysize
This read/write attribute specifies the number of rows to This read/write attribute specifies the number of rows to
fetch at a time with fetchmany(). It defaults to 1 meaning fetch at a time with .fetchmany(). It defaults to 1
to fetch a single row at a time. meaning to fetch a single row at a time.
Implementations must observe this value with respect to Implementations must observe this value with respect to
the fetchmany() method, but are free to interact with the the .fetchmany() method, but are free to interact with the
database a single row at a time. It may also be used in database a single row at a time. It may also be used in
the implementation of executemany(). the implementation of .executemany().
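The interaction between .arraysize and .fetchmany() can be seen in a short sketch, again assuming the standard library sqlite3 module as the database module; the table name is made up for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE numbers (n INTEGER)")
cursor.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(5)])

cursor.execute("SELECT n FROM numbers ORDER BY n")
default_batch = cursor.fetchmany()  # uses cursor.arraysize, which defaults to 1
cursor.arraysize = 3
next_batch = cursor.fetchmany()     # now returns up to 3 rows per call
last_batch = cursor.fetchmany()     # fewer rows are returned when fewer remain
```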
.setinputsizes(sizes) .setinputsizes(sizes)
This can be used before a call to executeXXX() to This can be used before a call to .execute*() to
predefine memory areas for the operation's parameters. predefine memory areas for the operation's parameters.
sizes is specified as a sequence -- one item for each sizes is specified as a sequence -- one item for each
@ -446,7 +457,7 @@ Cursor Objects
area will be reserved for that column (this is useful to area will be reserved for that column (this is useful to
avoid predefined areas for large inputs). avoid predefined areas for large inputs).
This method would be used before the executeXXX() method This method would be used before the .execute*() method
is invoked. is invoked.
Implementations are free to have this method do nothing Implementations are free to have this method do nothing
@ -460,7 +471,7 @@ Cursor Objects
will set the default size for all large columns in the will set the default size for all large columns in the
cursor. cursor.
This method would be used before the executeXXX() method This method would be used before the .execute*() method
is invoked. is invoked.
Implementations are free to have this method do nothing Implementations are free to have this method do nothing
@ -475,7 +486,7 @@ Type Objects and Constructors
database in a particular string format. Similar problems exist database in a particular string format. Similar problems exist
for "Row ID" columns or large binary items (e.g. blobs or RAW for "Row ID" columns or large binary items (e.g. blobs or RAW
columns). This presents problems for Python since the parameters columns). This presents problems for Python since the parameters
to the executeXXX() method are untyped. When the database module to the .execute*() method are untyped. When the database module
sees a Python string object, it doesn't know if it should be bound sees a Python string object, it doesn't know if it should be bound
as a simple CHAR column, as a raw BINARY item, or as a DATE. as a simple CHAR column, as a raw BINARY item, or as a DATE.
@ -567,23 +578,11 @@ Type Objects and Constructors
Implementation Hints for Module Authors Implementation Hints for Module Authors
* The preferred object types for the date/time objects are those * Date/time objects can be implemented as Python datetime module
defined in the mxDateTime package. It provides all necessary objects (available since Python 2.3, with a C API since 2.4) or
constructors and methods both at Python and C level. using the mxDateTime package (available for all Python versions
since 1.5.2). They both provide all necessary constructors and
* The preferred object type for Binary objects are the methods at Python and C level.
buffer types available in standard Python starting with
version 1.5.2. Please see the Python documentation for
details. For information about the C interface have a
look at Include/bufferobject.h and
Objects/bufferobject.c in the Python source
distribution.
* Starting with Python 2.3, module authors can also use the object
types defined in the standard datetime module for date/time
processing. However, it should be noted that this does not
expose a C API like mxDateTime does which means that integration
with C based database modules is more difficult.
* Here is a sample implementation of the Unix ticks based * Here is a sample implementation of the Unix ticks based
constructors for date/time delegating work to the generic constructors for date/time delegating work to the generic
@ -592,13 +591,21 @@ Implementation Hints for Module Authors
import time import time
def DateFromTicks(ticks): def DateFromTicks(ticks):
return apply(Date,time.localtime(ticks)[:3]) return Date(*time.localtime(ticks)[:3])
def TimeFromTicks(ticks): def TimeFromTicks(ticks):
return apply(Time,time.localtime(ticks)[3:6]) return Time(*time.localtime(ticks)[3:6])
def TimestampFromTicks(ticks): def TimestampFromTicks(ticks):
return apply(Timestamp,time.localtime(ticks)[:6]) return Timestamp(*time.localtime(ticks)[:6])
* The preferred object type for Binary objects are the
buffer types available in standard Python starting with
version 1.5.2. Please see the Python documentation for
details. For information about the C interface have a
look at Include/bufferobject.h and
Objects/bufferobject.c in the Python source
distribution.
* This Python class allows implementing the above type * This Python class allows implementing the above type
objects even though the description type code field yields objects even though the description type code field yields
@ -675,17 +682,18 @@ Optional DB API Extensions
It has been proposed to make usage of these extensions optionally It has been proposed to make usage of these extensions optionally
visible to the programmer by issuing Python warnings through the visible to the programmer by issuing Python warnings through the
Python warning framework. To make this feature useful, the warning Python warning framework. To make this feature useful, the warning
messages must be standardized in order to be able to mask them. These messages must be standardized in order to be able to mask
standard messages are referred to below as "Warning Message". them. These standard messages are referred to below as "Warning
Message".
Cursor Attribute .rownumber Cursor Attribute .rownumber
This read-only attribute should provide the current 0-based This read-only attribute should provide the current 0-based
index of the cursor in the result set or None if the index cannot index of the cursor in the result set or None if the index
be determined. cannot be determined.
The index can be seen as index of the cursor in a sequence (the The index can be seen as index of the cursor in a sequence
result set). The next fetch operation will fetch the row (the result set). The next fetch operation will fetch the row
indexed by .rownumber in that sequence. indexed by .rownumber in that sequence.
Warning Message: "DB-API extension cursor.rownumber used" Warning Message: "DB-API extension cursor.rownumber used"
@ -740,7 +748,7 @@ Optional DB API Extensions
this cursor. this cursor.
The list is cleared by all standard cursor methods calls (prior The list is cleared by all standard cursor methods calls (prior
to executing the call) except for the .fetchXXX() calls to executing the call) except for the .fetch*() calls
automatically to avoid excessive memory usage and can also be automatically to avoid excessive memory usage and can also be
cleared by executing "del cursor.messages[:]". cleared by executing "del cursor.messages[:]".
@ -779,7 +787,8 @@ Optional DB API Extensions
Cursor Method .__iter__() Cursor Method .__iter__()
Return self to make cursors compatible to the iteration protocol. Return self to make cursors compatible to the iteration
protocol [8].
Warning Message: "DB-API extension cursor.__iter__() used" Warning Message: "DB-API extension cursor.__iter__() used"
@ -812,31 +821,151 @@ Optional Error Handling Extensions
Cursor/Connection Attribute .errorhandler Cursor/Connection Attribute .errorhandler
Read/write attribute which references an error handler to call Read/write attribute which references an error handler to call
in case an error condition is met. in case an error condition is met.
The handler must be a Python callable taking the following The handler must be a Python callable taking the following
arguments: errorhandler(connection, cursor, errorclass, arguments:
errorvalue) where connection is a reference to the connection
on which the cursor operates, cursor a reference to the cursor
(or None in case the error does not apply to a cursor),
errorclass is an error class which to instantiate using
errorvalue as construction argument.
The standard error handler should add the error information to errorhandler(connection, cursor, errorclass, errorvalue)
the appropriate .messages attribute (connection.messages or
cursor.messages) and raise the exception defined by the given
errorclass and errorvalue parameters.
If no errorhandler is set (the attribute is None), the standard where connection is a reference to the connection on which the
error handling scheme as outlined above, should be applied. cursor operates, cursor a reference to the cursor (or None in
case the error does not apply to a cursor), errorclass is an
error class which to instantiate using errorvalue as
construction argument.
Warning Message: "DB-API extension .errorhandler used" The standard error handler should add the error information to
the appropriate .messages attribute (connection.messages or
cursor.messages) and raise the exception defined by the given
errorclass and errorvalue parameters.
If no errorhandler is set (the attribute is None), the
standard error handling scheme as outlined above, should be
applied.
Warning Message: "DB-API extension .errorhandler used"
Cursors should inherit the .errorhandler setting from their Cursors should inherit the .errorhandler setting from their
connection objects at cursor creation time. connection objects at cursor creation time.
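A minimal sketch of how a module might dispatch to .errorhandler is shown below. DemoConnection, its _report() helper and collecting_handler are hypothetical names used only for illustration, and the sketch assumes that .messages holds (errorclass, errorvalue) tuples.

```python
class Error(Exception):
    """Stand-in for the module-level DB-API Error class."""


def standard_errorhandler(connection, cursor, errorclass, errorvalue):
    # Record the error on the cursor if one is involved, else on the connection.
    target = cursor if cursor is not None else connection
    target.messages.append((errorclass, errorvalue))
    raise errorclass(errorvalue)


class DemoConnection:
    """Hypothetical connection; only the error handling parts are shown."""

    def __init__(self):
        self.messages = []
        self.errorhandler = None

    def _report(self, cursor, errorclass, errorvalue):
        # Fall back to the standard scheme when no handler is set.
        handler = self.errorhandler or standard_errorhandler
        handler(self, cursor, errorclass, errorvalue)


def collecting_handler(connection, cursor, errorclass, errorvalue):
    # Custom handler: record the error instead of raising it.
    connection.messages.append((errorclass, errorvalue))


connection = DemoConnection()
connection.errorhandler = collecting_handler
connection._report(None, Error, "table not found")
```

With the custom handler installed the error is only recorded; resetting connection.errorhandler to None restores the raising behaviour.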
Optional Two-Phase Commit Extensions
Many databases have support for two-phase commit (TPC) which
allows managing transactions across multiple database connections
and other resources.
If a database backend provides support for two-phase commit and
the database module author wishes to expose this support, the
following API should be implemented. NotSupportedError should be
raised if the database backend's support for two-phase commit
can only be checked at run-time.
TPC Transaction IDs
As many databases follow the XA specification, transaction IDs
are formed from three components:
* a format ID
* a global transaction ID
* a branch qualifier
For a particular global transaction, the first two components
should be the same for all resources. Each resource in the
global transaction should be assigned a different branch
qualifier.
The various components must satisfy the following criteria:
* format ID: a non-negative 32-bit integer.
* global transaction ID and branch qualifier: byte strings no
longer than 64 characters.
Transaction IDs are created with the .xid() connection method:
.xid(format_id, global_transaction_id, branch_qualifier)
Returns a transaction ID object suitable for passing to the
.tpc_*() methods of this connection.
If the database connection does not support TPC, a
NotSupportedError is raised.
The type of the object returned by .xid() is not defined, but
it must provide sequence behaviour, allowing access to the
three components. A conforming database module could choose
to represent transaction IDs with tuples rather than a custom
object.
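One possible way to satisfy these requirements is a named tuple with a validating factory, sketched below. This is not a prescribed implementation: the use of ValueError, the reading of "32-bit" as a signed range, and the module-level function (in a real module .xid() would be a connection method) are all assumptions made for the example.

```python
from collections import namedtuple

# Sequence behaviour comes for free from the tuple base class.
Xid = namedtuple("Xid", "format_id global_transaction_id branch_qualifier")


def xid(format_id, global_transaction_id, branch_qualifier):
    """Build a transaction ID after checking the criteria listed above."""
    # Assumption: "non-negative 32-bit integer" is read as a signed 32-bit range.
    if not isinstance(format_id, int) or not 0 <= format_id <= 0x7FFFFFFF:
        raise ValueError("format ID must be a non-negative 32-bit integer")
    for part in (global_transaction_id, branch_qualifier):
        if not isinstance(part, bytes) or len(part) > 64:
            raise ValueError("ID components must be byte strings of at most 64 bytes")
    return Xid(format_id, global_transaction_id, branch_qualifier)


example = xid(1, b"order-42", b"db-a")
format_id, global_id, branch = example  # unpacks like any sequence
```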
TPC Connection Methods
.tpc_begin(xid)
Begins a TPC transaction with the given transaction ID xid.
This method should be called outside of a transaction
(i.e. nothing may have executed since the last .commit() or
.rollback()).
Furthermore, it is an error to call .commit() or .rollback()
within the TPC transaction. A ProgrammingError is raised, if
the application calls .commit() or .rollback() during an
active TPC transaction.
If the database connection does not support TPC, a
NotSupportedError is raised.
.tpc_prepare()
Performs the first phase of a transaction started with
.tpc_begin(). A ProgrammingError should be raised if this
method is called outside of a TPC transaction.
After calling .tpc_prepare(), no statements can be executed
until .tpc_commit() or .tpc_rollback() has been called.
.tpc_commit([xid])
When called with no arguments, .tpc_commit() commits a TPC
transaction previously prepared with .tpc_prepare().
If .tpc_commit() is called prior to .tpc_prepare(), a single
phase commit is performed. A transaction manager may choose
to do this if only a single resource is participating in the
global transaction.
When called with a transaction ID xid, the database commits
the given transaction. If an invalid transaction ID is
provided, a ProgrammingError will be raised. This form should
be called outside of a transaction, and is intended for use in
recovery.
On return, the TPC transaction is ended.
.tpc_rollback([xid])
When called with no arguments, .tpc_rollback() rolls back a
TPC transaction. It may be called before or after
.tpc_prepare().
When called with a transaction ID xid, it rolls back the given
transaction. If an invalid transaction ID is provided, a
ProgrammingError is raised. This form should be called
outside of a transaction, and is intended for use in recovery.
On return, the TPC transaction is ended.
.tpc_recover()
Returns a list of pending transaction IDs suitable for use
with .tpc_commit(xid) or .tpc_rollback(xid).
If the database does not support transaction recovery, it may
return an empty list or raise NotSupportedError.
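The methods above might be combined by a simple transaction coordinator along the following lines. This is a sketch only: two_phase_commit and RecordingConnection are hypothetical names, and the fake connection merely records calls so that the sequence can be followed without a real database.

```python
def two_phase_commit(connections, format_id, global_transaction_id, work):
    """Run work(connection) on every connection inside one global transaction."""
    started = []
    try:
        for index, connection in enumerate(connections):
            # Same format ID and global transaction ID for all resources,
            # but a different branch qualifier for each one.
            xid = connection.xid(format_id, global_transaction_id,
                                 b"branch-%d" % index)
            connection.tpc_begin(xid)
            started.append(connection)
            work(connection)
        for connection in started:
            connection.tpc_prepare()  # first phase
    except Exception:
        # .tpc_rollback() may be called before or after .tpc_prepare().
        for connection in started:
            connection.tpc_rollback()
        raise
    for connection in started:
        connection.tpc_commit()  # second phase


class RecordingConnection:
    """Hypothetical stand-in that only records which TPC methods were called."""

    def __init__(self, fail_on_prepare=False):
        self.calls = []
        self.fail_on_prepare = fail_on_prepare

    def xid(self, format_id, global_transaction_id, branch_qualifier):
        return (format_id, global_transaction_id, branch_qualifier)

    def tpc_begin(self, xid):
        self.calls.append("tpc_begin")

    def tpc_prepare(self):
        if self.fail_on_prepare:
            raise RuntimeError("prepare failed")
        self.calls.append("tpc_prepare")

    def tpc_commit(self, xid=None):
        self.calls.append("tpc_commit")

    def tpc_rollback(self, xid=None):
        self.calls.append("tpc_rollback")


first, second = RecordingConnection(), RecordingConnection()
two_phase_commit([first, second], 1, b"global-tx-1", lambda connection: None)
```

A failure during the second phase is not handled here; a real coordinator would typically use .tpc_recover() together with .tpc_commit(xid) or .tpc_rollback(xid) to resolve transactions left pending.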
Frequently Asked Questions Frequently Asked Questions
The database SIG often sees reoccurring questions about the DB API The database SIG often sees reoccurring questions about the DB API
@ -846,7 +975,7 @@ Frequently Asked Questions
Question: Question:
How can I construct a dictionary out of the tuples returned by How can I construct a dictionary out of the tuples returned by
.fetchxxx(): .fetch*():
Answer: Answer:
@ -856,7 +985,7 @@ Frequently Asked Questions
as basis for the keys in the row dictionary. as basis for the keys in the row dictionary.
Note that the reason for not extending the DB API specification Note that the reason for not extending the DB API specification
to also support dictionary return values for the .fetchxxx() to also support dictionary return values for the .fetch*()
methods is that this approach has several drawbacks: methods is that this approach has several drawbacks:
* Some databases don't support case-sensitive column names or * Some databases don't support case-sensitive column names or
@ -891,8 +1020,8 @@ Major Changes from Version 1.0 to Version 2.0
found in modern SQL databases. found in modern SQL databases.
* New constants (apilevel, threadlevel, paramstyle) and * New constants (apilevel, threadlevel, paramstyle) and
methods (executemany, nextset) were added to provide better methods (.executemany(), .nextset()) were added to provide
database bindings. better database bindings.
* The semantics of .callproc() needed to call stored * The semantics of .callproc() needed to call stored
procedures are now clearly defined. procedures are now clearly defined.
@ -927,8 +1056,8 @@ Open Issues
* Define a useful return value for .nextset() for the case where * Define a useful return value for .nextset() for the case where
a new result set is available. a new result set is available.
* Create a fixed point numeric type for use as loss-less * Integrate the decimal module Decimal object for use as
monetary and decimal interchange format. loss-less monetary and decimal interchange format.
Footnotes Footnotes
@ -937,15 +1066,15 @@ Footnotes
implemented as keyword parameters for more intuitive use and implemented as keyword parameters for more intuitive use and
follow this order of parameters: follow this order of parameters:
dsn Data source name as string dsn Data source name as string
user User name as string (optional) user User name as string (optional)
password Password as string (optional) password Password as string (optional)
host Hostname (optional) host Hostname (optional)
database Database name (optional) database Database name (optional)
E.g. a connect could look like this: E.g. a connect could look like this:
connect(dsn='myhost:MYDB',user='guido',password='234$') connect(dsn='myhost:MYDB',user='guido',password='234$')
[2] Module implementors should prefer 'numeric', 'named' or [2] Module implementors should prefer 'numeric', 'named' or
'pyformat' over the other formats because these offer more 'pyformat' over the other formats because these offer more
@ -970,7 +1099,7 @@ Footnotes
[4] a database interface may choose to support named cursors by [4] a database interface may choose to support named cursors by
allowing a string argument to the method. This feature is allowing a string argument to the method. This feature is
not part of the specification, since it complicates not part of the specification, since it complicates
semantics of the .fetchXXX() methods. semantics of the .fetch*() methods.
[5] The module will use the __getitem__ method of the parameters [5] The module will use the __getitem__ method of the parameters
object to map either positions (integers) or names (strings) object to map either positions (integers) or names (strings)
@ -992,7 +1121,11 @@ Footnotes
[7] The rowcount attribute may be coded in a way that updates [7] The rowcount attribute may be coded in a way that updates
its value dynamically. This can be useful for databases that its value dynamically. This can be useful for databases that
return usable rowcount values only after the first call to return usable rowcount values only after the first call to
a .fetchXXX() method. a .fetch*() method.
[8] Implementation Note: Python C extensions will have to
implement the tp_iter slot on the cursor object instead of the
.__iter__() method.
Acknowledgements Acknowledgements
@ -1000,6 +1133,9 @@ Acknowledgements
Database API Specification 2.0 from the original HTML format into Database API Specification 2.0 from the original HTML format into
the PEP format. the PEP format.
Many thanks to James Henstridge for leading the discussion which
led to the standardization of the two-phase commit API extensions.
Copyright Copyright
This document has been placed in the Public Domain. This document has been placed in the Public Domain.