PEP: 247
Title: API for Cryptographic Hash Functions
Version: $Revision$
Last-Modified: $Date$
Author: A.M. Kuchling <amk@amk.ca>
Status: Final
Type: Informational
Content-Type: text/x-rst
Created: 23-Mar-2001
Post-History: 20-Sep-2001

Abstract
========

There are several different modules available that implement cryptographic
hashing algorithms such as MD5 or SHA.  This document specifies a standard API
for such algorithms, to make it easier to switch between different
implementations.


Specification
=============

All hashing modules should present the same interface.  Additional methods or
variables can be added, but those described in this document should always be
present.

Hash function modules define one function:

| ``new([string]) (unkeyed hashes)``
| ``new([key], [string]) (keyed hashes)``

  Create a new hashing object and return it.  The first form is for hashes
  that are unkeyed, such as MD5 or SHA.  For keyed hashes such as HMAC, *key*
  is a required parameter containing a string giving the key to use.  In both
  cases, the optional *string* parameter, if supplied, will be immediately
  hashed into the object's starting state, as if ``obj.update(string)``
  had been called.

  After creating a hashing object, arbitrary strings can be fed into the
  object using its ``update()`` method, and the hash value can be obtained at
  any time by calling the object's ``digest()`` method.

  Arbitrary additional keyword arguments can be added to this function, but if
  they're not supplied, sensible default values should be used.  For example,
  ``rounds`` and ``digest_size`` keywords could be added for a hash function
  which supports a variable number of rounds and several different output
  sizes, and they should default to values believed to be secure.

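As a rough illustration of the ``new()`` contract, here is a minimal sketch of a module-level function built on ``hashlib``'s BLAKE2b.  The 32-byte default for ``digest_size`` is an assumption made for this sketch, not something this document prescribes, and on Python 3 the data must be ``bytes`` rather than ``str``.

```python
import hashlib

# Hypothetical stand-in for a PEP 247 style module-level function.
# The optional first argument follows the PEP; the 32-byte default for
# `digest_size` is an assumption made for this sketch.
def new(string=None, digest_size=32):
    obj = hashlib.blake2b(digest_size=digest_size)
    if string is not None:
        # Same effect as calling obj.update(string) right after creation.
        obj.update(string)
    return obj

# Passing the data to new() ...
one_step = new(b"abc")

# ... should match creating an empty object and updating it afterwards.
two_step = new()
two_step.update(b"abc")
```

Callers who don't care about the output size can ignore the keyword entirely and still get a usable object.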

Hash function modules define one variable:

| ``digest_size``

  An integer value; the size of the digest produced by the hashing objects
  created by this module, measured in bytes.  You could also obtain this value
  by creating a sample object and accessing its ``digest_size`` attribute, but
  it can be convenient to have this value available from the module.  Hashes
  with a variable output size will set this variable to ``None``.

Hashing objects require a single attribute:

| ``digest_size``

  This attribute has the same meaning as the module-level ``digest_size``
  variable: the size, in bytes, of the digest produced by the hashing object.
  If the hash has a variable output size, this output size must be
  chosen when the hashing object is created, and this attribute must contain
  the selected size.  Therefore, ``None`` is *not* a legal value for this
  attribute.

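The object-level attribute can be checked against objects from the standard ``hashlib`` module, which follow this interface.  The sketch below uses SHA-256 as a fixed-size hash and BLAKE2b as a variable-size one; both choices are just examples.

```python
import hashlib

fixed = hashlib.sha256()                    # fixed-size hash: 32-byte digest
variable = hashlib.blake2b(digest_size=20)  # size chosen at creation time

for obj in (fixed, variable):
    # The attribute is always an integer and matches the digest length.
    assert isinstance(obj.digest_size, int)
    assert len(obj.digest()) == obj.digest_size
```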

Hashing objects require the following methods:

| ``copy()``

  Return a separate copy of this hashing object.  An update to this copy won't
  affect the original object.

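One common use of ``copy()`` is hashing several strings that share a prefix.  A minimal sketch with ``hashlib.sha256`` (the algorithm and the sample data are arbitrary):

```python
import hashlib

prefix = hashlib.sha256()
prefix.update(b"common header, ")

# Each copy starts from the shared prefix state and then diverges.
first = prefix.copy()
first.update(b"first body")

second = prefix.copy()
second.update(b"second body")
```

The original ``prefix`` object still holds only the header, so it can be copied again for further inputs.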

| ``digest()``

  Return the hash value of this hashing object as a string containing 8-bit
  data.  The object is not altered in any way by this function; you can
  continue updating the object after calling this function.

| ``hexdigest()``

  Return the hash value of this hashing object as a string containing
  hexadecimal digits.  Lowercase letters should be used for the digits ``a``
  through ``f``.  Like the ``.digest()`` method, this method mustn't alter the
  object.

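The relationship between the two methods, and the fact that neither disturbs the object, can be sketched as follows.  This uses ``hashlib.sha256`` on Python 3, where ``digest()`` returns ``bytes``.

```python
import hashlib

h = hashlib.sha256()
h.update(b"abc")

raw = h.digest()        # binary value, h.digest_size bytes long
text = h.hexdigest()    # the same value as lowercase hexadecimal text

# Reading the value leaves the state untouched, so hashing can continue.
h.update(b"def")
```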

| ``update(string)``

  Hash *string* into the current state of the hashing object.  ``update()`` can
  be called any number of times during a hashing object's lifetime.

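Because ``update()`` can be called repeatedly, large inputs can be hashed in pieces.  The helper below is a hypothetical sketch (its name and chunk size are not part of this specification) that reads a binary stream chunk by chunk.

```python
import hashlib
import io

def hash_stream(stream, chunk_size=4096):
    """Hash a binary stream piece by piece (helper name is hypothetical)."""
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()

data = b"x" * 10000
chunked = hash_stream(io.BytesIO(data), chunk_size=1024)
```

Feeding the data in chunks should give the same result as hashing it in a single call.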

Hashing modules can define additional module-level functions or object methods
and still be compliant with this specification.

Here's an example, using a module named ``MD5``::

    >>> from Crypto.Hash import MD5
    >>> m = MD5.new()
    >>> m.digest_size
    16
    >>> m.update('abc')
    >>> m.digest()
    '\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
    >>> m.hexdigest()
    '900150983cd24fb0d6963f7d28e17f72'
    >>> MD5.new('abc').digest()
    '\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'


Rationale
=========

The digest size is measured in bytes, not bits, even though hash algorithm
sizes are usually quoted in bits; MD5 is a 128-bit algorithm and not a 16-byte
one, for example.  This is because, in the sample code I looked at, the length
in bytes is often needed (to seek ahead or behind in a file; to compute the
length of an output string) while the length in bits is rarely used.  Therefore,
the burden will fall on the few people actually needing the size in bits, who
will have to multiply ``digest_size`` by 8.

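For the few cases that do need bits, the conversion is a one-liner.  The sketch uses ``hashlib.sha256`` purely as an example algorithm.

```python
import hashlib

h = hashlib.sha256()

# digest_size is in bytes; multiply by 8 to get the size in bits.
size_in_bits = h.digest_size * 8
```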

It's been suggested that the ``update()`` method would be better named
``append()``.  However, the method really does update the current state of the
hashing object, and ``update()`` is already used by the md5 and
sha modules included with Python, so it seems simplest to leave the name
``update()`` alone.

The order of the constructor's arguments for keyed hashes was a sticky issue.
It wasn't clear whether the *key* should come first or second.  It's a required
parameter, and the usual convention is to place required parameters first, but
that also means that the *string* parameter moves from the first position to
the second.  It would be possible to get confused and pass a single argument to
a keyed hash, thinking that you're passing an initial string to an unkeyed
hash, but it doesn't seem worth making the interface for keyed hashes more
obscure to avoid this potential error.

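The standard ``hmac`` module uses the key-first order, so it can illustrate both the convention and the potential mix-up.  In this sketch the ``digestmod`` keyword is specific to ``hmac`` (an example of the additional keyword arguments a module may accept), and the key and data values are made up.

```python
import hashlib
import hmac

key = b"secret key"

# Key first, then the optional initial data.
mac = hmac.new(key, b"abc", digestmod=hashlib.sha256)

# Equivalent two-step form: create with the key, then feed the data.
stepwise = hmac.new(key, digestmod=hashlib.sha256)
stepwise.update(b"abc")

# A single positional argument is taken as the key, not as data to hash.
key_only = hmac.new(b"abc", digestmod=hashlib.sha256)
```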

Changes
=======

2001-09-17: Renamed ``clear()`` to ``reset()``; added ``digest_size`` attribute
to objects; added ``.hexdigest()`` method.

2001-09-20: Removed ``reset()`` method completely.

2001-09-28: Set ``digest_size`` to ``None`` for variable-size hashes.


Acknowledgements
================

Thanks to Aahz, Andrew Archibald, Rich Salz, Itamar Shtull-Trauring, and the
readers of the python-crypto list for their comments on this PEP.


Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   End: