PEP: 460
Title: Adding % and {} formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-13
Resolution:


Abstract
========

This PEP proposes adding the % and {} formatting operations from str to
bytes.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.

Example::

    >>> b'%4x' % 10
    b'   a'

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1.

Example::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

%s, because it is the most general, has the most convoluted resolution:

  - input type is bytes?
    pass it straight through

  - input type is numeric?
    use its __xxx__ [1] [2] method and ascii-encode it (strictly)

  - input type is something else?
    use its __bytes__ method; if there isn't one, raise an exception [3]

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    b'3.14'

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have a __bytes__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')

format
------

The format mini-language will be used as-is, with the behaviors as
listed for %-interpolation.


Open Questions
==============

For %s there has been some discussion of trying to use the buffer
protocol (Py_buffer) before trying __bytes__.  This question should be
answered before the PEP is implemented.

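
To make the %s resolution order and the open buffer-protocol question
concrete, the following is a minimal pure-Python sketch of the rules as
proposed.  It is a model only, not an implementation of the PEP: the
helper name ``resolve_percent_s`` and its ``buffer_first`` flag are
hypothetical, the use of ``str()`` for numeric values is an assumption
(the PEP leaves __str__ versus __repr__ open), and ``TypeError`` is an
assumed choice for the exception the PEP has not yet settled on.

```python
import numbers


def resolve_percent_s(value, buffer_first=False):
    """Return the bytes that %s would insert under the proposed rules.

    Hypothetical helper for illustration; not part of any real API.
    """
    # Rule 1: bytes pass straight through.
    if isinstance(value, bytes):
        return value

    # Rule 2: numeric values are rendered as text and strictly
    # ASCII-encoded.  Assumption: str() is used; the PEP leaves the
    # choice between __str__ and __repr__ open.
    if isinstance(value, numbers.Number):
        return str(value).encode('ascii', 'strict')

    # Open question: optionally try the buffer protocol before __bytes__.
    if buffer_first:
        try:
            return memoryview(value).tobytes()
        except TypeError:
            pass  # no buffer support; fall through to __bytes__

    # Rule 3: anything else must supply __bytes__.
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        # Assumption: TypeError; the PEP leaves the exception type open.
        raise TypeError(
            "%r has no __bytes__ method, perhaps you need to encode it?"
            % (value,))
    return to_bytes(value)
```

Under these assumptions, ``resolve_percent_s(3.14)`` yields ``b'3.14'``,
a str argument raises ``TypeError``, and passing ``buffer_first=True``
lets buffer-exporting objects such as bytearray be accepted before
__bytes__ is consulted.
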

Proposed variations
===================

It has been suggested to use %b for bytes instead of %s.

  - Rejected as %b does not exist in Python 2.x %-interpolation, which
    is why we are using %s.

It has been proposed to automatically use .encode('ascii','strict') for
str arguments to %s.

  - Rejected as this would lead to intermittent failures (only when the
    str happens to contain non-ASCII characters).  Better to have the
    operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have %s return the ascii-encoded repr when the
value is a str (b'%s' % 'abc' --> b"'abc'").

  - Rejected as this would lead to hard-to-debug failures far from the
    problem site.  Better to have the operation always fail so the
    trouble-spot can be easily fixed.


Footnotes
=========

.. [1] Not sure if this should be the numeric __str__ or the numeric
       __repr__, or if there's any difference.
.. [2] Any proper numeric class would then have to provide an ascii
       representation of its value, either via __repr__ or __str__
       (whichever we choose in [1]).
.. [3] TypeError, ValueError, or UnicodeEncodeError?


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: