diff --git a/pep-0452.txt b/pep-0452.txt index 61d91954f..221613e2c 100644 --- a/pep-0452.txt +++ b/pep-0452.txt @@ -5,245 +5,258 @@ Last-Modified: $Date$ Author: A.M. Kuchling , Christian Heimes Status: Draft Type: Informational +Content-Type: text/x-rst Created: 15-Aug-2013 Post-History: Replaces: 247 Abstract +======== - There are several different modules available that implement - cryptographic hashing algorithms such as MD5 or SHA. This - document specifies a standard API for such algorithms, to make it - easier to switch between different implementations. +There are several different modules available that implement +cryptographic hashing algorithms such as MD5 or SHA. This +document specifies a standard API for such algorithms, to make it +easier to switch between different implementations. Specification +============= - All hashing modules should present the same interface. Additional - methods or variables can be added, but those described in this - document should always be present. +All hashing modules should present the same interface. Additional +methods or variables can be added, but those described in this +document should always be present. - Hash function modules define one function: +Hash function modules define one function: - new([string]) (unkeyed hashes) - new(key, [string], [digestmod]) (keyed hashes) +``new([string]) (unkeyed hashes)`` - Create a new hashing object and return it. The first form is - for hashes that are unkeyed, such as MD5 or SHA. For keyed - hashes such as HMAC, 'key' is a required parameter containing - a string giving the key to use. In both cases, the optional - 'string' parameter, if supplied, will be immediately hashed - into the object's starting state, as if obj.update(string) was - called. +``new(key, [string], [digestmod]) (keyed hashes)`` + Create a new hashing object and return it. The first form is + for hashes that are unkeyed, such as MD5 or SHA. 
For keyed + hashes such as HMAC, 'key' is a required parameter containing + a string giving the key to use. In both cases, the optional + 'string' parameter, if supplied, will be immediately hashed + into the object's starting state, as if ``obj.update(string)`` was + called. - After creating a hashing object, arbitrary bytes can be fed - into the object using its update() method, and the hash value - can be obtained at any time by calling the object's digest() - method. + After creating a hashing object, arbitrary bytes can be fed + into the object using its ``update()`` method, and the hash value + can be obtained at any time by calling the object's ``digest()`` + method. - Although the parameter is called 'string', hashing objects operate - on 8-bit data only. Both 'key' and 'string' must be a bytes-like - object (bytes, bytearray...). A hashing object may support - one-dimensional, contiguous buffers as argument, too. Text - (unicode) is no longer supported in Python 3.x. Python 2.x - implementations may take ASCII-only unicode as argument, but - portable code should not rely on the feature. + Although the parameter is called 'string', hashing objects operate + on 8-bit data only. Both 'key' and 'string' must be a bytes-like + object (bytes, bytearray...). A hashing object may support + one-dimensional, contiguous buffers as argument, too. Text + (unicode) is no longer supported in Python 3.x. Python 2.x + implementations may take ASCII-only unicode as argument, but + portable code should not rely on the feature. - Arbitrary additional keyword arguments can be added to this - function, but if they're not supplied, sensible default values - should be used. For example, 'rounds' and 'digest_size' - keywords could be added for a hash function which supports a - variable number of rounds and several different output sizes, - and they should default to values believed to be secure. 
+ Arbitrary additional keyword arguments can be added to this + function, but if they're not supplied, sensible default values + should be used. For example, 'rounds' and 'digest_size' + keywords could be added for a hash function which supports a + variable number of rounds and several different output sizes, + and they should default to values believed to be secure. - Hash function modules define one variable: +Hash function modules define one variable: - digest_size +``digest_size`` + An integer value; the size of the digest produced by the + hashing objects created by this module, measured in bytes. + You could also obtain this value by creating a sample object + and accessing its 'digest_size' attribute, but it can be + convenient to have this value available from the module. + Hashes with a variable output size will set this variable to + None. - An integer value; the size of the digest produced by the - hashing objects created by this module, measured in bytes. - You could also obtain this value by creating a sample object - and accessing its 'digest_size' attribute, but it can be - convenient to have this value available from the module. - Hashes with a variable output size will set this variable to - None. +Hashing objects require the following attribute: - Hashing objects require the following attribute: +``digest_size`` + This attribute is identical to the module-level digest_size + variable, measuring the size of the digest produced by the + hashing object, measured in bytes. If the hash has a variable + output size, this output size must be chosen when the hashing + object is created, and this attribute must contain the + selected size. Therefore, ``None`` is **not** a legal value for this + attribute. - digest_size +``block_size`` + An integer value or ``NotImplemented``; the internal block size + of the hash algorithm in bytes. 
The block size is used by the + HMAC module to pad the secret key to ``block_size`` or to hash the + secret key if it is longer than ``block_size``. If no HMAC + algorithm is standardized for the hash algorithm, return + ``NotImplemented`` instead. - This attribute is identical to the module-level digest_size - variable, measuring the size of the digest produced by the - hashing object, measured in bytes. If the hash has a variable - output size, this output size must be chosen when the hashing - object is created, and this attribute must contain the - selected size. Therefore, None is *not* a legal value for this - attribute. +``name`` + A text string value; the canonical, lowercase name of the hashing + algorithm. The name should be a suitable parameter for + ``hashlib.new``. - block_size +Hashing objects require the following methods: - An integer value or ``NotImplemented``; the internal block size - of the hash algorithm in bytes. The block size is used by the - HMAC module to pad the secret key to digest_size or to hash the - secret key if it is longer than digest_size. If no HMAC - algorithm is standardized for the hash algorithm, return - ``NotImplemented`` instead. +``copy()`` + Return a separate copy of this hashing object. An update to + this copy won't affect the original object. - name +``digest()`` + Return the hash value of this hashing object as a bytes object + containing 8-bit data. The object is not altered in any way + by this function; you can continue updating the object after + calling this function. - A text string value; the canonical, lowercase name of the hashing - algorithm. The name should be a suitable parameter for - :func:`hashlib.new`. +``hexdigest()`` + Return the hash value of this hashing object as a string + containing hexadecimal digits. Lowercase letters should be used + for the digits 'a' through 'f'. Like the ``.digest()`` method, this + method mustn't alter the object.
- Hashing objects require the following methods: +``update(string)`` + Hash bytes-like 'string' into the current state of the hashing + object. ``update()`` can be called any number of times during a + hashing object's lifetime. - copy() +Hashing modules can define additional module-level functions or +object methods and still be compliant with this specification. - Return a separate copy of this hashing object. An update to - this copy won't affect the original object. +Here's an example, using a module named 'MD5':: - digest() - - Return the hash value of this hashing object as a bytes - containing 8-bit data. The object is not altered in any way - by this function; you can continue updating the object after - calling this function. - - hexdigest() - - Return the hash value of this hashing object as a string - containing hexadecimal digits. Lowercase letters should be used - for the digits 'a' through 'f'. Like the .digest() method, this - method mustn't alter the object. - - update(string) - - Hash bytes-like 'string' into the current state of the hashing - object. update() can be called any number of times during a - hashing object's lifetime. - - Hashing modules can define additional module-level functions or - object methods and still be compliant with this specification. 
- - Here's an example, using a module named 'MD5': - - >>> import hashlib - >>> from Crypto.Hash import MD5 - >>> m = MD5.new() - >>> isinstance(m, hashlib.CryptoHash) - True - >>> m.name - 'md5' - >>> m.digest_size - 16 - >>> m.block_size - 64 - >>> m.update(b'abc') - >>> m.digest() - b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr' - >>> m.hexdigest() - '900150983cd24fb0d6963f7d28e17f72' - >>> MD5.new(b'abc').digest() - b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr' + >>> import hashlib + >>> from Crypto.Hash import MD5 + >>> m = MD5.new() + >>> isinstance(m, hashlib.CryptoHash) + True + >>> m.name + 'md5' + >>> m.digest_size + 16 + >>> m.block_size + 64 + >>> m.update(b'abc') + >>> m.digest() + b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr' + >>> m.hexdigest() + '900150983cd24fb0d6963f7d28e17f72' + >>> MD5.new(b'abc').digest() + b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr' Rationale +========= - The digest size is measured in bytes, not bits, even though hash - algorithm sizes are usually quoted in bits; MD5 is a 128-bit - algorithm and not a 16-byte one, for example. This is because, in - the sample code I looked at, the length in bytes is often needed - (to seek ahead or behind in a file; to compute the length of an - output string) while the length in bits is rarely used. - Therefore, the burden will fall on the few people actually needing - the size in bits, who will have to multiply digest_size by 8. +The digest size is measured in bytes, not bits, even though hash +algorithm sizes are usually quoted in bits; MD5 is a 128-bit +algorithm and not a 16-byte one, for example. This is because, in +the sample code I looked at, the length in bytes is often needed +(to seek ahead or behind in a file; to compute the length of an +output string) while the length in bits is rarely used. +Therefore, the burden will fall on the few people actually needing +the size in bits, who will have to multiply digest_size by 8. 
- It's been suggested that the update() method would be better named - append(). However, that method is really causing the current - state of the hashing object to be updated, and update() is already - used by the md5 and sha modules included with Python, so it seems - simplest to leave the name update() alone. +It's been suggested that the ``update()`` method would be better named +``append()``. However, that method is really causing the current +state of the hashing object to be updated, and ``update()`` is already +used by the md5 and sha modules included with Python, so it seems +simplest to leave the name ``update()`` alone. - The order of the constructor's arguments for keyed hashes was a - sticky issue. It wasn't clear whether the key should come first - or second. It's a required parameter, and the usual convention is - to place required parameters first, but that also means that the - 'string' parameter moves from the first position to the second. - It would be possible to get confused and pass a single argument to - a keyed hash, thinking that you're passing an initial string to an - unkeyed hash, but it doesn't seem worth making the interface - for keyed hashes more obscure to avoid this potential error. +The order of the constructor's arguments for keyed hashes was a +sticky issue. It wasn't clear whether the key should come first +or second. It's a required parameter, and the usual convention is +to place required parameters first, but that also means that the +'string' parameter moves from the first position to the second. +It would be possible to get confused and pass a single argument to +a keyed hash, thinking that you're passing an initial string to an +unkeyed hash, but it doesn't seem worth making the interface +for keyed hashes more obscure to avoid this potential error. 
Changes from Version 1.0 to Version 2.0 +======================================= - Version 2.0 of API for Cryptographic Hash Functions clarifies some - aspects of the API and brings it up-to-date. It also formalized aspects - that were already de facto standards and provided by most - implementations. +Version 2.0 of the API for Cryptographic Hash Functions clarifies some +aspects of the API and brings it up to date. It also formalizes aspects +that were already de facto standards and provided by most +implementations. - Version 2.0 introduces the following new attributes: +Version 2.0 introduces the following new attributes: - name +``name`` + The name property was made mandatory by `issue 18532`_. - The name property was made mandatory by :issue:`18532`. +``block_size`` + The new version also specifies that the return value + ``NotImplemented`` prevents HMAC support. - block_size - - The new version also specifies that the return value - ``NotImplemented`` prevents HMAC support. - - Version 2.0 takes the separation of binary and text data in Python - 3.0 into account. The 'string' argument to new() and update() as - well as the 'key' argument must be bytes-like objects. On Python - 2.x a hashing object may also support ASCII-only unicode. The actual - name of argument is not changed as it is part of the public API. - Code may depend on the fact that the argument is called 'string'. +Version 2.0 takes the separation of binary and text data in Python +3.0 into account. The 'string' argument to ``new()`` and ``update()`` as +well as the 'key' argument must be bytes-like objects. On Python +2.x a hashing object may also support ASCII-only unicode. The actual +name of the argument is not changed as it is part of the public API. +Code may depend on the fact that the argument is called 'string'.
Recommended names for common hashing algorithms +=============================================== - algorithm variant recommended name - ---------- --------- ---------------- - MD5 md5 - RIPEMD-160 ripemd160 - SHA-1 sha1 - SHA-2 SHA-224 sha224 - SHA-256 sha256 - SHA-384 sha384 - SHA-512 sha512 - SHA-3 SHA-3-224 sha3_224 - SHA-3-256 sha3_256 - SHA-3-384 sha3_384 - SHA-3-512 sha3_512 - WHIRLPOOL whirlpool ++------------+------------+-------------------+ +| algorithm | variant | recommended name | ++============+============+===================+ +| MD5 | | md5 | ++------------+------------+-------------------+ +| RIPEMD-160 | | ripemd160 | ++------------+------------+-------------------+ +| SHA-1 | | sha1 | ++------------+------------+-------------------+ +| SHA-2 | SHA-224 | sha224 | ++ +------------+-------------------+ +| | SHA-256 | sha256 | ++ +------------+-------------------+ +| | SHA-384 | sha384 | ++ +------------+-------------------+ +| | SHA-512 | sha512 | ++------------+------------+-------------------+ +| SHA-3 | SHA-3-224 | sha3_224 | ++ +------------+-------------------+ +| | SHA-3-256 | sha3_256 | ++ +------------+-------------------+ +| | SHA-3-384 | sha3_384 | ++ +------------+-------------------+ +| | SHA-3-512 | sha3_512 | ++------------+------------+-------------------+ +| WHIRLPOOL | | whirlpool | ++------------+------------+-------------------+ Changes +======= - 2001-09-17: Renamed clear() to reset(); added digest_size attribute - to objects; added .hexdigest() method. - 2001-09-20: Removed reset() method completely. - 2001-09-28: Set digest_size to None for variable-size hashes. - 2013-08-15: Added block_size and name attributes; clarified that - 'string' actually referes to bytes-like objects. +* 2001-09-17: Renamed ``clear()`` to ``reset()``; added ``digest_size`` attribute + to objects; added ``.hexdigest()`` method. +* 2001-09-20: Removed ``reset()`` method completely. +* 2001-09-28: Set ``digest_size`` to ``None`` for variable-size hashes. 
+* 2013-08-15: Added ``block_size`` and ``name`` attributes; clarified that + 'string' actually refers to bytes-like objects. Acknowledgements +================ - Thanks to Aahz, Andrew Archibald, Rich Salz, Itamar - Shtull-Trauring, and the readers of the python-crypto list for - their comments on this PEP. +Thanks to Aahz, Andrew Archibald, Rich Salz, Itamar +Shtull-Trauring, and the readers of the python-crypto list for +their comments on this PEP. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. _issue 18532: http://bugs.python.org/issue18532 +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End:
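The interface described in this patch can be tried against the standard library. The sketch below is illustrative only and is not part of the patch. It assumes that ``hashlib`` and ``hmac`` follow the specified API closely enough to stand in for a conforming module, and ``summarize`` is a hypothetical helper name introduced here.

```python
import hashlib
import hmac


def summarize(obj):
    """Collect the attributes every hashing object is required to expose."""
    return {
        "name": obj.name,
        "digest_size": obj.digest_size,
        "block_size": obj.block_size,
    }


# Unkeyed hash. hashlib.new() takes the algorithm name as its first argument;
# that extra argument is a hashlib convenience, not the per-module new([string]).
h = hashlib.new("sha256", b"abc")
info = summarize(h)

# update() may be called any number of times; feeding the data in pieces
# produces the same digest as hashing it in one call.
pieces = hashlib.new("sha256")
pieces.update(b"a")
pieces.update(b"bc")
same_digest = pieces.digest() == h.digest()

# hexdigest() must not alter the object, and copy() must be independent of it.
before = h.hexdigest()
c = h.copy()
c.update(b"def")
unchanged = h.hexdigest() == before

# Keyed hash: the key comes first, then the optional data, then digestmod.
k = hmac.new(b"secret-key", b"abc", "sha256")

print(info, same_digest, unchanged, k.digest_size)
```

SHA-256 is used here instead of the MD5 module from the example in the text only because MD5 can be disabled in some Python builds; the calls are the same for any conforming hash.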