WSGI is now Python 3-friendly. This does not cover the other planned
addenda/errata, and it may need more work even on these bits, but it
is now begun. (Many thanks to Graham and Ian.)
Phillip J. Eby 2010-09-25 19:44:55 +00:00
parent c9e8207628
commit 71b23a6320
1 changed file with 101 additions and 42 deletions

@@ -142,6 +142,51 @@ callable was provided to it. Callables are only to be called, not
introspected upon.
+A Note On String Types
+----------------------
+
+In general, HTTP deals with bytes, which means that this specification
+is mostly about handling bytes.
+
+However, the content of those bytes often has some kind of textual
+interpretation, and in Python, strings are the most convenient way
+to handle text.
+
+But in many Python versions and implementations, strings are Unicode,
+rather than bytes. This requires a careful balance between a usable
+API and correct translations between bytes and text in the context of
+HTTP... especially to support porting code between Python
+implementations with different ``str`` types.
+
+WSGI therefore defines two kinds of "string":
+
+* "Native" strings (which are always implemented using the type
+  named ``str``) that are used for request/response headers and
+  metadata
+
+* "Bytestrings" (which are implemented using the ``bytes`` type
+  in Python 3, and ``str`` elsewhere), that are used for the bodies
+  of requests and responses (e.g. POST/PUT input data and HTML page
+  outputs).
+
+Do not be confused however: even if Python's ``str`` type is actually
+Unicode "under the hood", the *content* of native strings must
+still be translatable to bytes via the Latin-1 encoding! (See
+the section on `Unicode Issues`_ later in this document for more
+details.)
+
+In short: where you see the word "string" in this document, it refers
+to a "native" string, i.e., an object of type ``str``, whether it is
+internally implemented as bytes or unicode. Where you see references
+to "bytestring", this should be read as "an object of type ``bytes``
+under Python 3, or type ``str`` under Python 2".
+
+And so, even though HTTP is in some sense "really just bytes", there
+are many API conveniences to be had by using whatever Python's
+default ``str`` type is.
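
On Python 3, the Latin-1 rule is what lets native strings carry header and metadata bytes. Decoding any byte sequence as Latin-1 and re-encoding it returns the same bytes. Below is a minimal sketch, assuming Python 3. `bytes_to_wsgi` and `wsgi_to_bytes` are hypothetical helper names, not APIs defined here.

```python
def bytes_to_wsgi(raw: bytes) -> str:
    """Hypothetical helper: carry raw bytes in a native (Unicode) str.

    Latin-1 maps bytes 0x00-0xFF one-to-one onto code points
    U+0000-U+00FF, so decoding can never fail or lose information.
    """
    return raw.decode("latin-1")


def wsgi_to_bytes(native: str) -> bytes:
    """Hypothetical helper: recover the bytes behind a native string.

    Raises UnicodeEncodeError for any code point above U+00FF, i.e. for
    text that is not valid content for a native string.
    """
    return native.encode("latin-1")


path = bytes_to_wsgi(b"/caf\xe9")   # one character per byte
assert path == "/caf\u00e9"
assert wsgi_to_bytes(path) == b"/caf\xe9"

try:
    wsgi_to_bytes("/\u20ac")         # EURO SIGN has no Latin-1 byte
except UnicodeEncodeError:
    print("rejected: not a valid native string")
```
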
The Application/Framework Side
------------------------------
@@ -164,13 +209,15 @@ support application developers.)
Here are two example application objects; one is a function, and the
other is a class::
+    # this would need to be a byte string in Python 3:
+    HELLO_WORLD = "Hello world!\n"
    def simple_app(environ, start_response):
        """Simplest possible application object"""
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        start_response(status, response_headers)
-        return ['Hello world!\n']
+        return [HELLO_WORLD]
    class AppClass:
        """Produce the same output, but using a class
@@ -195,7 +242,7 @@ other is a class::
            status = '200 OK'
            response_headers = [('Content-type', 'text/plain')]
            self.start(status, response_headers)
-            yield "Hello world!\n"
+            yield HELLO_WORLD
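
As the added comment notes, these examples produce correct output on Python 3 only once `HELLO_WORLD` is a bytestring. Below is a sketch of both application objects with that change made. A hypothetical `call_app` helper drives them; its stub `start_response` only records the status and headers, so it is test scaffolding, not a server.

```python
HELLO_WORLD = b"Hello world!\n"  # a bytestring, as Python 3 requires


def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = "200 OK"
    response_headers = [("Content-type", "text/plain")]
    start_response(status, response_headers)
    return [HELLO_WORLD]


class AppClass:
    """Produce the same output, but using a class"""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = "200 OK"
        response_headers = [("Content-type", "text/plain")]
        self.start(status, response_headers)
        yield HELLO_WORLD


def call_app(app):
    """Hypothetical test driver: return (status, body) for an application."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # write() is unused by these examples

    body = b"".join(app({}, start_response))  # iterating runs AppClass.__iter__
    return captured["status"], body


print(call_app(simple_app))
print(call_app(AppClass))
```

Both calls return the same status and the same bytes. `call_app` reads `captured` only after the body is fully joined, so it also works for `AppClass`, which calls `start_response` during its first iteration.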
The Server/Gateway Side
@@ -243,7 +290,7 @@ server.
                    sys.stdout.write('%s: %s\r\n' % header)
                sys.stdout.write('\r\n')
-            sys.stdout.write(data)
+            sys.stdout.write(data) # TODO: this needs to be binary on Py3
            sys.stdout.flush()
        def start_response(status, response_headers, exc_info=None):
@@ -326,7 +373,7 @@ a block boundary.)
        """Transform iterated output to piglatin, if it's okay to do so
        Note that the "okayness" can change until the application yields
-        its first non-empty string, so 'transform_ok' has to be a mutable
+        its first non-empty bytestring, so 'transform_ok' has to be a mutable
        truth value.
        """
@@ -341,7 +388,7 @@ a block boundary.)
        def next(self):
            if self.transform_ok:
-                return piglatin(self._next())
+                return piglatin(self._next()) # call must be byte-safe on Py3
            else:
                return self._next()
@@ -376,7 +423,7 @@ a block boundary.)
                if transform_ok:
                    def write_latin(data):
-                        write(piglatin(data))
+                        write(piglatin(data)) # call must be byte-safe on Py3
                    return write_latin
                else:
                    return write
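
The `# call must be byte-safe on Py3` comments mean that `piglatin()` receives bytestrings and must return bytestrings. The document does not define `piglatin()`, so the version below is a stand-in with a deliberately simplified rule. It moves a word's leading ASCII consonants to the end, adds "ay", and works directly on `bytes`. The `LatinIter` sketch also assumes Python 3, where the iterator method is spelled `__next__` rather than `next`.

```python
import re

# Stand-in rule (an assumption): move leading ASCII consonants to the end
# of each word and append "ay". Pattern and replacement are both bytes,
# so the body is never decoded.
_WORD = re.compile(rb"\b([b-df-hj-np-tv-z]*)([a-z]+)\b", re.IGNORECASE)


def piglatin(data: bytes) -> bytes:
    return _WORD.sub(lambda m: m.group(2) + m.group(1) + b"ay", data)


class LatinIter:
    """Transform iterated output to piglatin, if it's okay to do so."""

    def __init__(self, result, transform_ok):
        if hasattr(result, "close"):
            self.close = result.close
        self._next = iter(result).__next__
        self.transform_ok = transform_ok  # mutable truth value (a list)

    def __iter__(self):
        return self

    def __next__(self):  # Python 3 name for the iterator method
        if self.transform_ok:
            return piglatin(self._next())  # bytes in, bytes out
        return self._next()


print(b"".join(LatinIter([b"Hello world!\n"], [True])))
```
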
@@ -426,7 +473,7 @@ It is used only when the application has trapped an error and is
attempting to display an error message to the browser.
The ``start_response`` callable must return a ``write(body_data)``
-callable that takes one positional parameter: a string to be written
+callable that takes one positional parameter: a bytestring to be written
as part of the HTTP response body. (Note: the ``write()`` callable is
provided only to support certain existing frameworks' imperative output
APIs; it should not be used by new applications or frameworks if it
@@ -434,24 +481,24 @@ can be avoided. See the `Buffering and Streaming`_ section for more
details.)
When called by the server, the application object must return an
-iterable yielding zero or more strings. This can be accomplished in a
-variety of ways, such as by returning a list of strings, or by the
-application being a generator function that yields strings, or
+iterable yielding zero or more bytestrings. This can be accomplished in a
+variety of ways, such as by returning a list of bytestrings, or by the
+application being a generator function that yields bytestrings, or
by the application being a class whose instances are iterable.
Regardless of how it is accomplished, the application object must
-always return an iterable yielding zero or more strings.
+always return an iterable yielding zero or more bytestrings.
-The server or gateway must transmit the yielded strings to the client
-in an unbuffered fashion, completing the transmission of each string
+The server or gateway must transmit the yielded bytestrings to the client
+in an unbuffered fashion, completing the transmission of each bytestring
before requesting another one. (In other words, applications
**should** perform their own buffering. See the `Buffering and
Streaming`_ section below for more on how application output must be
handled.)
-The server or gateway should treat the yielded strings as binary byte
+The server or gateway should treat the yielded bytestrings as binary byte
sequences: in particular, it should ensure that line endings are
not altered. The application is responsible for ensuring that the
-string(s) to be written are in a format suitable for the client. (The
+bytestring(s) to be written are in a format suitable for the client. (The
server or gateway **may** apply HTTP transfer encodings, or perform
other transformations for the purpose of implementing HTTP features
such as byte-range transmission. See `Other HTTP Features`_, below,
@@ -472,7 +519,7 @@ by the application. This protocol is intended to complement PEP 325's
generator support, and other common iterables with ``close()`` methods.
(Note: the application **must** invoke the ``start_response()``
-callable before the iterable yields its first body string, so that the
+callable before the iterable yields its first body bytestring, so that the
server can send the headers before any body content. However, this
invocation **may** be performed by the iterable's first iteration, so
servers **must not** assume that ``start_response()`` has been called
@@ -565,7 +612,7 @@ have a fallback plan in the event such a variable is absent.
Note: missing variables (such as ``REMOTE_USER`` when no
authentication has occurred) should be left out of the ``environ``
-dictionary. Also note that CGI-defined variables must be strings,
+dictionary. Also note that CGI-defined variables must be native strings,
if they are present at all. It is a violation of this specification
for a CGI variable's value to be of any type other than ``str``.
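
On Python 3 this rule can be enforced by checking each `environ` value. Below is a sketch built on one heuristic, which is an assumption rather than a rule stated here: keys without a dot are treated as CGI variables, and dotted keys such as `wsgi.input` are treated as extensions. The sketch also applies the U+00FF code point limit from the Unicode Issues section. `check_cgi_vars` is a hypothetical name.

```python
def check_cgi_vars(environ: dict) -> None:
    """Hypothetical check: CGI variables must be Latin-1 native strings."""
    for key, value in environ.items():
        if "." in key:
            continue  # assumption: dotted keys are wsgi.* or extension keys
        if type(value) is not str:
            raise TypeError(f"{key} must be str, got {type(value).__name__}")
        if any(ord(ch) > 0xFF for ch in value):
            raise ValueError(f"{key} contains code points above U+00FF")


check_cgi_vars({"REQUEST_METHOD": "GET", "PATH_INFO": "/caf\u00e9",
                "wsgi.input": object()})   # passes: dotted key is skipped
try:
    check_cgi_vars({"QUERY_STRING": b"a=1"})  # bytes, not a native string
except TypeError as exc:
    print(exc)
```
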
@@ -585,9 +632,9 @@ Variable               Value
                       ``"http"`` or ``"https"``, as appropriate.

``wsgi.input``         An input stream (file-like object) from which
-                       the HTTP request body can be read. (The server
-                       or gateway may perform reads on-demand as
-                       requested by the application, or it may pre-
+                       the HTTP request body bytes can be read. (The
+                       server or gateway may perform reads on-demand
+                       as requested by the application, or it may pre-
                       read the client's request body and buffer it
                       in-memory or on disk, or use any other
                       technique for providing such an input stream,
@@ -602,6 +649,12 @@ Variable               Value
                       ending, and assume that it will be converted to
                       the correct line ending by the server/gateway.

+                       (On platforms where the ``str`` type is unicode,
+                       the error stream **should** accept and log
+                       arbitrary unicode without raising an error; it
+                       is allowed, however, to substitute characters
+                       that cannot be rendered in the stream's encoding.)
+
                       For many servers, ``wsgi.errors`` will be the
                       server's main error log. Alternatively, this
                       may be ``sys.stderr``, or a log file of some
@@ -745,7 +798,7 @@ headers, please see the `Other HTTP Features`_ section below.)
The ``start_response`` callable **must not** actually transmit the
response headers. Instead, it must store them for the server or
gateway to transmit **only** after the first iteration of the
-application return value that yields a non-empty string, or upon
+application return value that yields a non-empty bytestring, or upon
the application's first invocation of the ``write()`` callable. In
other words, response headers must not be sent until there is actual
body data available, or until the application's returned iterable is
@@ -820,12 +873,12 @@ able to either generate a ``Content-Length`` header, or at least
avoid the need to close the client connection. If the application
does *not* call the ``write()`` callable, and returns an iterable
whose ``len()`` is 1, then the server can automatically determine
-``Content-Length`` by taking the length of the first string yielded
+``Content-Length`` by taking the length of the first bytestring yielded
by the iterable.
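
On Python 3, the length that matters here is the byte length of the single bytestring. That can differ from the character count of the text it was encoded from. Below is a sketch of the server-side shortcut. `plan_body` and its `wrote_via_write` flag are hypothetical names.

```python
def plan_body(result, headers, wrote_via_write=False):
    """Hypothetical server helper: return (headers, body_iterable).

    Adds Content-Length when the application did not use write(), did not
    set the header itself, and returned an iterable whose len() is 1.
    """
    if wrote_via_write:
        return headers, result
    if any(name.lower() == "content-length" for name, _ in headers):
        return headers, result
    try:
        single = len(result) == 1
    except TypeError:
        single = False  # e.g. a generator: length unknown in advance
    if not single:
        return headers, result
    chunks = list(result)  # the one bytestring, iterated exactly once
    return headers + [("Content-Length", str(len(chunks[0])))], chunks


body = "caf\u00e9\n".encode("utf-8")  # 5 characters, 6 bytes
headers, chunks = plan_body([body], [("Content-Type", "text/plain; charset=utf-8")])
print(headers[-1])  # ('Content-Length', '6')
```
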
And, if the server and client both support HTTP/1.1 "chunked
encoding" [3]_, then the server **may** use chunked encoding to send
-a chunk for each ``write()`` call or string yielded by the iterable,
+a chunk for each ``write()`` call or bytestring yielded by the iterable,
thus generating a ``Content-Length`` header for each chunk. This
allows the server to keep the client connection alive, if it wishes
to do so. Note that the server **must** comply fully with RFC 2616
@@ -850,7 +903,7 @@ transmitted all at once, along with the response headers.
The corresponding approach in WSGI is for the application to simply
return a single-element iterable (such as a list) containing the
-response body as a single string. This is the recommended approach
+response body as a single bytestring. This is the recommended approach
for the vast majority of application functions, that render
HTML pages whose text easily fits in memory.
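
On Python 3 the single bytestring is typically produced by rendering the page as text and encoding it once before returning it. Below is a sketch that assumes UTF-8 output; the character encoding is the application's choice, not something WSGI fixes. `render_page` is a hypothetical template step and `page_app` a hypothetical application.

```python
from html import escape


def render_page(title: str) -> str:
    """Hypothetical template step: build the page as ordinary text."""
    return "<html><body><h1>%s</h1></body></html>\n" % escape(title)


def page_app(environ, start_response):
    """Hypothetical app: render text, encode once, return one bytestring."""
    body = render_page("Caf\u00e9 <menu>").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),  # bytes, not characters
    ])
    return [body]  # single-element iterable holding the whole page
```
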
@@ -899,12 +952,12 @@ In order to better support asynchronous applications and servers,
middleware components **must not** block iteration waiting for
multiple values from an application iterable. If the middleware
needs to accumulate more data from the application before it can
-produce any output, it **must** yield an empty string.
+produce any output, it **must** yield an empty bytestring.
To put this requirement another way, a middleware component **must
yield at least one value** each time its underlying application
yields a value. If the middleware cannot yield any other value,
-it must yield an empty string.
+it must yield an empty bytestring.
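
Below is a sketch of a middleware iterable that follows this rule on Python 3. It re-chunks the application's output into whole lines, an accumulation condition chosen only for illustration. It pulls exactly one value from the application per call and yields `b""` whenever it has nothing complete to pass on. `LineBufferingIter` is a hypothetical name.

```python
class LineBufferingIter:
    """Hypothetical middleware iterable: re-chunk output into whole lines."""

    def __init__(self, result):
        self._result = result
        self._it = iter(result)
        self._pending = b""
        self._done = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._done:
            raise StopIteration
        try:
            chunk = next(self._it)  # exactly one upstream value per call
        except StopIteration:
            self._done = True
            if self._pending:
                tail, self._pending = self._pending, b""
                return tail  # flush a trailing partial line
            raise
        self._pending += chunk
        head, sep, rest = self._pending.rpartition(b"\n")
        if not sep:
            return b""  # nothing complete yet: yield an empty bytestring
        self._pending = rest
        return head + sep

    def close(self):
        if hasattr(self._result, "close"):
            self._result.close()


print(list(LineBufferingIter([b"ab", b"c\nd", b"e"])))
```

Each of the three application values produces exactly one output value. The final `b"de"` is emitted only when the application is exhausted.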
This requirement ensures that asynchronous applications and servers
can conspire to reduce the number of threads that are required
@@ -946,22 +999,22 @@ for web servers to interleave other tasks in the same Python thread,
potentially providing better throughput for the server as a whole.
The ``write()`` callable is returned by the ``start_response()``
-callable, and it accepts a single parameter: a string to be
+callable, and it accepts a single parameter: a bytestring to be
written as part of the HTTP response body, that is treated exactly
as though it had been yielded by the output iterable. In other
words, before ``write()`` returns, it must guarantee that the
-passed-in string was either completely sent to the client, or
+passed-in bytestring was either completely sent to the client, or
that it is buffered for transmission while the application
proceeds onward.
An application **must** return an iterable object, even if it
uses ``write()`` to produce all or part of its response body.
The returned iterable **may** be empty (i.e. yield no non-empty
-strings), but if it *does* yield non-empty strings, that output
+bytestrings), but if it *does* yield non-empty bytestrings, that output
must be treated normally by the server or gateway (i.e., it must be
sent or queued immediately). Applications **must not** invoke
``write()`` from within their return iterable, and therefore any
-strings yielded by the iterable are transmitted after all strings
+bytestrings yielded by the iterable are transmitted after all bytestrings
passed to ``write()`` have been sent to the client.
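
These ordering rules, together with the `# TODO: this needs to be binary on Py3` note in the CGI example, can be sketched as a small gateway that writes to a binary stream. The sketch assumes Python 3. `run_app` is hypothetical, `io.BytesIO` stands in for `sys.stdout.buffer`, and `exc_info` handling is omitted. Headers are encoded from native strings via Latin-1. They are sent only before the first non-empty bytestring, whether it arrives through `write()` or through the iterable.

```python
import io


def run_app(application, environ, out):
    """Hypothetical gateway: run a WSGI app, writing the response to `out`.

    `out` is any binary file-like object (e.g. sys.stdout.buffer).
    exc_info handling and repeated start_response() calls are omitted.
    """
    headers_set = []
    headers_sent = []

    def send_headers():
        if not headers_set:
            raise AssertionError("start_response() was never called")
        status, response_headers = headers_sent[:] = headers_set
        lines = ["Status: %s\r\n" % status]
        lines += ["%s: %s\r\n" % header for header in response_headers]
        lines.append("\r\n")
        out.write("".join(lines).encode("latin-1"))  # native str -> bytes

    def write(data):
        if type(data) is not bytes:
            raise TypeError("body data must be a bytestring")
        if not headers_set:
            raise AssertionError("write() before start_response()")
        if data:  # empty bytestrings never trigger the headers
            if not headers_sent:
                send_headers()
            out.write(data)
            out.flush()

    def start_response(status, response_headers, exc_info=None):
        headers_set[:] = [status, response_headers]
        return write

    result = application(environ, start_response)
    try:
        for data in result:  # anything passed to write() is already out
            write(data)
        if not headers_sent:
            send_headers()  # empty body: the headers still have to be sent
    finally:
        if hasattr(result, "close"):
            result.close()


def mixed_app(environ, start_response):
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"from write()\n")
    return [b"", b"from the iterable\n"]


buf = io.BytesIO()
run_app(mixed_app, {}, buf)
print(buf.getvalue())
```
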
@@ -970,9 +1023,9 @@ Unicode Issues
HTTP does not directly support Unicode, and neither does this
interface. All encoding/decoding must be handled by the application;
-all strings passed to or from the server must be standard Python byte
-strings, not Unicode objects. The result of using a Unicode object
-where a string object is required, is undefined.
+all strings passed to or from the server must be of type ``str`` or
+``bytes``, never ``unicode``. The result of using a ``unicode``
+object where a string object is required, is undefined.
Note also that strings passed to ``start_response()`` as a status or
as response headers **must** follow RFC 2616 with respect to encoding.
@@ -980,7 +1033,7 @@ That is, they must either be ISO-8859-1 characters, or use RFC 2047
MIME encoding.
On Python platforms where the ``str`` or ``StringType`` type is in
-fact Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all
+fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all
"strings" referred to in this specification must contain only
code points representable in ISO-8859-1 encoding (``\u0000`` through
``\u00FF``, inclusive). It is a fatal error for an application to
@@ -988,12 +1041,18 @@ supply strings containing any other Unicode character or code point.
Similarly, servers and gateways **must not** supply
strings to an application containing any other Unicode characters.
-Again, all strings referred to in this specification **must** be
-of type ``str`` or ``StringType``, and **must not** be of type
-``unicode`` or ``UnicodeType``. And, even if a given platform allows
-for more than 8 bits per character in ``str``/``StringType`` objects,
-only the lower 8 bits may be used, for any value referred to in
-this specification as a "string".
+Again, all objects referred to in this specification as "strings"
+**must** be of type ``str`` or ``StringType``, and **must not** be
+of type ``unicode`` or ``UnicodeType``. And, even if a given platform
+allows for more than 8 bits per character in ``str``/``StringType``
+objects, only the lower 8 bits may be used, for any value referred
+to in this specification as a "string".
+
+For values referred to in this specification as "bytestrings"
+(i.e., values read from ``wsgi.input``, passed to ``write()``
+or yielded by the application), the value **must** be of type
+``bytes`` under Python 3, and ``str`` in earlier versions of
+Python.
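
Both rules can be checked mechanically. Below is a sketch for Python 3, where a "string" is a `str` limited to U+0000 through U+00FF and a "bytestring" is `bytes`. The function names are hypothetical. The check covers only types and code points, not HTTP header syntax.

```python
def is_native_string(value) -> bool:
    """Hypothetical check: a str whose code points all fit in Latin-1."""
    return type(value) is str and all(ord(ch) <= 0xFF for ch in value)


def is_bytestring(value) -> bool:
    """Hypothetical check: on Python 3, exactly the bytes type."""
    return type(value) is bytes


def check_response(status, headers, body_chunks) -> None:
    """Hypothetical check: raise TypeError if the two kinds are mixed up."""
    if not is_native_string(status):
        raise TypeError("status must be a Latin-1 native string")
    for name, value in headers:
        if not (is_native_string(name) and is_native_string(value)):
            raise TypeError(f"header {name!r} must use Latin-1 native strings")
    for chunk in body_chunks:
        if not is_bytestring(chunk):
            raise TypeError(f"body chunk {chunk!r} must be bytes")


check_response("200 OK", [("Content-Type", "text/plain")], [b"ok"])  # passes
```
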
Error Handling
@@ -1448,7 +1507,7 @@ Questions and Answers
   ``environ`` dictionary. This is the recommended approach for
   offering any such value-added services.
-2. Why can you call ``write()`` *and* yield strings/return an
+2. Why can you call ``write()`` *and* yield bytestrings/return an
   iterable? Shouldn't we pick just one way?
   If we supported only the iteration approach, then current