WSGI is now Python 3-friendly. This does not cover the other planned
addenda/errata, and it may need more work even on these bits, but it is now begun. (Many thanks to Graham and Ian.)
parent c9e8207628
commit 71b23a6320

pep-0333.txt | 143
@@ -142,6 +142,51 @@ callable was provided to it. Callables are only to be called, not
 introspected upon.
 
 
+A Note On String Types
+----------------------
+
+In general, HTTP deals with bytes, which means that this specification
+is mostly about handling bytes.
+
+However, the content of those bytes often has some kind of textual
+interpretation, and in Python, strings are the most convenient way
+to handle text.
+
+But in many Python versions and implementations, strings are Unicode,
+rather than bytes. This requires a careful balance between a usable
+API and correct translations between bytes and text in the context of
+HTTP... especially to support porting code between Python
+implementations with different ``str`` types.
+
+WSGI therefore defines two kinds of "string":
+
+* "Native" strings (which are always implemented using the type
+  named ``str``) that are used for request/response headers and
+  metadata
+
+* "Bytestrings" (which are implemented using the ``bytes`` type
+  in Python 3, and ``str`` elsewhere), that are used for the bodies
+  of requests and responses (e.g. POST/PUT input data and HTML page
+  outputs).
+
+Do not be confused however: even if Python's ``str`` type is actually
+Unicode "under the hood", the *content* of native strings must
+still be translatable to bytes via the Latin-1 encoding! (See
+the section on `Unicode Issues`_ later in this document for more
+details.)
+
+In short: where you see the word "string" in this document, it refers
+to a "native" string, i.e., an object of type ``str``, whether it is
+internally implemented as bytes or unicode. Where you see references
+to "bytestring", this should be read as "an object of type ``bytes``
+under Python 3, or type ``str`` under Python 2".
+
+And so, even though HTTP is in some sense "really just bytes", there
+are many API conveniences to be had by using whatever Python's
+default ``str`` type is.
+
+
+
 The Application/Framework Side
 ------------------------------
 
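
As a rough illustration of the two string kinds introduced in the hunk above, the following Python 3 sketch round-trips a native string through Latin-1 and keeps the body as a bytestring. The helper names `native_to_wire` and `wire_to_native` are hypothetical and are not part of the specification.

```python
def native_to_wire(value):
    """Encode a native (str) header value to the bytes sent over HTTP."""
    # Raises UnicodeEncodeError for any code point above U+00FF.
    return value.encode("latin-1")


def wire_to_native(raw):
    """Decode bytes received over HTTP into a native (str) value."""
    # Latin-1 maps every byte 0x00-0xFF to one code point, so this
    # decoding step cannot fail for any input bytes.
    return raw.decode("latin-1")


header_value = "text/plain"      # native string: headers and metadata
body = b"Hello world!\n"         # bytestring: request/response bodies

wire = native_to_wire(header_value)
round_tripped = wire_to_native(wire)
```

Because the mapping is one-to-one for all 256 byte values, a native string built this way always converts back to exactly the bytes it came from.
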
@@ -164,13 +209,15 @@ support application developers.)
 Here are two example application objects; one is a function, and the
 other is a class::
 
+    # this would need to be a byte string in Python 3:
+    HELLO_WORLD = "Hello world!\n"
+
     def simple_app(environ, start_response):
         """Simplest possible application object"""
         status = '200 OK'
         response_headers = [('Content-type', 'text/plain')]
         start_response(status, response_headers)
-        return ['Hello world!\n']
-
+        return [HELLO_WORLD]
 
     class AppClass:
         """Produce the same output, but using a class
@@ -195,7 +242,7 @@ other is a class::
             status = '200 OK'
             response_headers = [('Content-type', 'text/plain')]
             self.start(status, response_headers)
-            yield "Hello world!\n"
+            yield HELLO_WORLD
 
 
 The Server/Gateway Side
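
The comment added in the example above notes that `HELLO_WORLD` would need to be a byte string on Python 3. A minimal sketch of that variant follows, assuming Python 3; `fake_start_response` and `captured` are hypothetical stand-ins for a server, used only to exercise the application.

```python
HELLO_WORLD = b"Hello world!\n"  # bytes literal: the body is a bytestring


def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'                                    # native string
    response_headers = [('Content-type', 'text/plain')]  # native strings
    start_response(status, response_headers)
    return [HELLO_WORLD]


# Hypothetical stand-in for a server's start_response callable.
captured = {}


def fake_start_response(status, response_headers, exc_info=None):
    captured['status'] = status
    captured['headers'] = response_headers


result = simple_app({}, fake_start_response)
```

The status and headers stay native strings while only the body changes type, which matches the split described in the string-types note.
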
@@ -243,7 +290,7 @@ server.
                     sys.stdout.write('%s: %s\r\n' % header)
                 sys.stdout.write('\r\n')
 
-            sys.stdout.write(data)
+            sys.stdout.write(data)  # TODO: this needs to be binary on Py3
             sys.stdout.flush()
 
         def start_response(status, response_headers, exc_info=None):
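
The TODO in the hunk above reflects a Python 3 constraint: `sys.stdout` is a text stream there and does not accept `bytes`. One possible approach, sketched here and not taken from the commit, is to send headers and body through a single binary stream. The helper `send_response` is hypothetical; the demonstration uses an in-memory stream.

```python
import io


def send_response(out, status, response_headers, chunks):
    """Write a CGI-style response to the binary stream `out`."""
    # Status and headers are native strings; encode them as Latin-1.
    out.write(('Status: %s\r\n' % status).encode('latin-1'))
    for header in response_headers:
        out.write(('%s: %s\r\n' % header).encode('latin-1'))
    out.write(b'\r\n')
    for data in chunks:
        out.write(data)   # body chunks are already bytestrings
        out.flush()


# Demonstration against an in-memory binary stream.
demo = io.BytesIO()
send_response(demo, '200 OK', [('Content-type', 'text/plain')],
              [b'Hello world!\n'])
output = demo.getvalue()
```

In a real CGI gateway on Python 3 the stream passed in would presumably be `sys.stdout.buffer`, the binary layer underneath the text-mode `sys.stdout`.
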
@@ -326,7 +373,7 @@ a block boundary.)
         """Transform iterated output to piglatin, if it's okay to do so
 
         Note that the "okayness" can change until the application yields
-        its first non-empty string, so 'transform_ok' has to be a mutable
+        its first non-empty bytestring, so 'transform_ok' has to be a mutable
         truth value.
         """
 
@@ -341,7 +388,7 @@ a block boundary.)
 
         def next(self):
             if self.transform_ok:
-                return piglatin(self._next())
+                return piglatin(self._next())  # call must be byte-safe on Py3
             else:
                 return self._next()
 
@@ -376,7 +423,7 @@ a block boundary.)
 
                 if transform_ok:
                     def write_latin(data):
-                        write(piglatin(data))
+                        write(piglatin(data))  # call must be byte-safe on Py3
                     return write_latin
                 else:
                     return write
@@ -426,7 +473,7 @@ It is used only when the application has trapped an error and is
 attempting to display an error message to the browser.
 
 The ``start_response`` callable must return a ``write(body_data)``
-callable that takes one positional parameter: a string to be written
+callable that takes one positional parameter: a bytestring to be written
 as part of the HTTP response body. (Note: the ``write()`` callable is
 provided only to support certain existing frameworks' imperative output
 APIs; it should not be used by new applications or frameworks if it
@@ -434,24 +481,24 @@ can be avoided. See the `Buffering and Streaming`_ section for more
 details.)
 
 When called by the server, the application object must return an
-iterable yielding zero or more strings. This can be accomplished in a
-variety of ways, such as by returning a list of strings, or by the
-application being a generator function that yields strings, or
+iterable yielding zero or more bytestrings. This can be accomplished in a
+variety of ways, such as by returning a list of bytestrings, or by the
+application being a generator function that yields bytestrings, or
 by the application being a class whose instances are iterable.
 Regardless of how it is accomplished, the application object must
-always return an iterable yielding zero or more strings.
+always return an iterable yielding zero or more bytestrings.
 
-The server or gateway must transmit the yielded strings to the client
-in an unbuffered fashion, completing the transmission of each string
+The server or gateway must transmit the yielded bytestrings to the client
+in an unbuffered fashion, completing the transmission of each bytestring
 before requesting another one. (In other words, applications
 **should** perform their own buffering. See the `Buffering and
 Streaming`_ section below for more on how application output must be
 handled.)
 
-The server or gateway should treat the yielded strings as binary byte
+The server or gateway should treat the yielded bytestrings as binary byte
 sequences: in particular, it should ensure that line endings are
 not altered. The application is responsible for ensuring that the
-string(s) to be written are in a format suitable for the client. (The
+bytestring(s) to be written are in a format suitable for the client. (The
 server or gateway **may** apply HTTP transfer encodings, or perform
 other transformations for the purpose of implementing HTTP features
 such as byte-range transmission. See `Other HTTP Features`_, below,
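
The transmission rules in the hunk above can be sketched from the server side as follows, assuming Python 3. The `transmit` function and its `send` parameter are hypothetical, and raising `TypeError` for a non-bytes chunk is a choice made for this sketch, not something the text mandates.

```python
def transmit(result, send):
    """Send each bytestring yielded by `result` before asking for the next."""
    try:
        for data in result:
            if not isinstance(data, bytes):
                raise TypeError("application must yield bytestrings")
            if data:          # an empty bytestring carries nothing to send
                send(data)    # bytes go out unmodified, line endings intact
    finally:
        # Call close() if the iterable provides it.
        if hasattr(result, 'close'):
            result.close()


def app_iterable():
    # A generator is one of the ways to produce an iterable of bytestrings.
    yield b"line one\r\n"
    yield b""
    yield b"line two\n"


sent = []
transmit(app_iterable(), sent.append)
```

Each chunk is handed to `send` before the next one is requested, so any buffering has to happen in the application, as the text recommends.
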
@@ -472,7 +519,7 @@ by the application. This protocol is intended to complement PEP 325's
 generator support, and other common iterables with ``close()`` methods.
 
 (Note: the application **must** invoke the ``start_response()``
-callable before the iterable yields its first body string, so that the
+callable before the iterable yields its first body bytestring, so that the
 server can send the headers before any body content. However, this
 invocation **may** be performed by the iterable's first iteration, so
 servers **must not** assume that ``start_response()`` has been called
@@ -565,7 +612,7 @@ have a fallback plan in the event such a variable is absent.
 
 Note: missing variables (such as ``REMOTE_USER`` when no
 authentication has occurred) should be left out of the ``environ``
-dictionary. Also note that CGI-defined variables must be strings,
+dictionary. Also note that CGI-defined variables must be native strings,
 if they are present at all. It is a violation of this specification
 for a CGI variable's value to be of any type other than ``str``.
 
@@ -585,9 +632,9 @@ Variable Value
                        ``"http"`` or ``"https"``, as appropriate.
 
 ``wsgi.input``         An input stream (file-like object) from which
-                       the HTTP request body can be read. (The server
-                       or gateway may perform reads on-demand as
-                       requested by the application, or it may pre-
+                       the HTTP request body bytes can be read. (The
+                       server or gateway may perform reads on-demand
+                       as requested by the application, or it may pre-
                        read the client's request body and buffer it
                        in-memory or on disk, or use any other
                        technique for providing such an input stream,
@@ -602,6 +649,12 @@ Variable Value
                        ending, and assume that it will be converted to
                        the correct line ending by the server/gateway.
 
+                       (On platforms where the ``str`` type is unicode,
+                       the error stream **should** accept and log
+                       arbitrary unicode without raising an error; it
+                       is allowed, however, to substitute characters
+                       that cannot be rendered in the stream's encoding.)
+
                        For many servers, ``wsgi.errors`` will be the
                        server's main error log. Alternatively, this
                        may be ``sys.stderr``, or a log file of some
@@ -745,7 +798,7 @@ headers, please see the `Other HTTP Features`_ section below.)
 The ``start_response`` callable **must not** actually transmit the
 response headers. Instead, it must store them for the server or
 gateway to transmit **only** after the first iteration of the
-application return value that yields a non-empty string, or upon
+application return value that yields a non-empty bytestring, or upon
 the application's first invocation of the ``write()`` callable. In
 other words, response headers must not be sent until there is actual
 body data available, or until the application's returned iterable is
@@ -820,12 +873,12 @@ able to either generate a ``Content-Length`` header, or at least
 avoid the need to close the client connection. If the application
 does *not* call the ``write()`` callable, and returns an iterable
 whose ``len()`` is 1, then the server can automatically determine
-``Content-Length`` by taking the length of the first string yielded
+``Content-Length`` by taking the length of the first bytestring yielded
 by the iterable.
 
 And, if the server and client both support HTTP/1.1 "chunked
 encoding" [3]_, then the server **may** use chunked encoding to send
-a chunk for each ``write()`` call or string yielded by the iterable,
+a chunk for each ``write()`` call or bytestring yielded by the iterable,
 thus generating a ``Content-Length`` header for each chunk. This
 allows the server to keep the client connection alive, if it wishes
 to do so. Note that the server **must** comply fully with RFC 2616
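
The single-element shortcut for ``Content-Length`` described in the hunk above could look roughly like this on Python 3. The function name `guess_content_length` is hypothetical, and limiting the check to lists and tuples is an assumption of this sketch so that inspecting the first element does not consume a one-shot iterator.

```python
def guess_content_length(result):
    """Return the body length if it is known up front, otherwise None."""
    # Assumption: only re-readable sequences are inspected here.
    if isinstance(result, (list, tuple)) and len(result) == 1:
        # Exactly one bytestring: its length in bytes is the body length.
        return len(result[0])
    return None


single = guess_content_length([b"Hello world!\n"])
several = guess_content_length([b"a", b"b"])
streamed = guess_content_length(chunk for chunk in [b"abc"])

# The length of a bytestring counts bytes, not characters.
encoded = "caf\u00e9".encode("utf-8")
encoded_length = guess_content_length([encoded])
```

The last two lines show why the wording change matters: the length of a bytestring is a byte count, which is what ``Content-Length`` requires, whereas the length of a Unicode string would count characters.
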
@@ -850,7 +903,7 @@ transmitted all at once, along with the response headers.
 
 The corresponding approach in WSGI is for the application to simply
 return a single-element iterable (such as a list) containing the
-response body as a single string. This is the recommended approach
+response body as a single bytestring. This is the recommended approach
 for the vast majority of application functions, that render
 HTML pages whose text easily fits in memory.
 
@@ -899,12 +952,12 @@ In order to better support asynchronous applications and servers,
 middleware components **must not** block iteration waiting for
 multiple values from an application iterable. If the middleware
 needs to accumulate more data from the application before it can
-produce any output, it **must** yield an empty string.
+produce any output, it **must** yield an empty bytestring.
 
 To put this requirement another way, a middleware component **must
 yield at least one value** each time its underlying application
 yields a value. If the middleware cannot yield any other value,
-it must yield an empty string.
+it must yield an empty bytestring.
 
 This requirement ensures that asynchronous applications and servers
 can conspire to reduce the number of threads that are required
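
The middleware rule in the hunk above can be illustrated with a small Python 3 generator. The transform itself (joining chunks two at a time) and the name `pairing_middleware` are hypothetical; the point is only that every value from the application produces at least one yielded value, with an empty bytestring when nothing is ready.

```python
def pairing_middleware(app_iter):
    """Join application chunks two at a time (a hypothetical transform)."""
    pending = None
    for chunk in app_iter:
        if pending is None:
            pending = chunk
            yield b""              # not ready yet: yield an empty bytestring
        else:
            yield pending + chunk  # a full pair is available
            pending = None
    if pending is not None:
        yield pending              # flush the leftover chunk at the end


out = list(pairing_middleware(iter([b"a", b"b", b"c"])))
```

Because the generator never waits for a second value without yielding first, a server can interleave other work between iterations.
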
@@ -946,22 +999,22 @@ for web servers to interleave other tasks in the same Python thread,
 potentially providing better throughput for the server as a whole.
 
 The ``write()`` callable is returned by the ``start_response()``
-callable, and it accepts a single parameter: a string to be
+callable, and it accepts a single parameter: a bytestring to be
 written as part of the HTTP response body, that is treated exactly
 as though it had been yielded by the output iterable. In other
 words, before ``write()`` returns, it must guarantee that the
-passed-in string was either completely sent to the client, or
+passed-in bytestring was either completely sent to the client, or
 that it is buffered for transmission while the application
 proceeds onward.
 
 An application **must** return an iterable object, even if it
 uses ``write()`` to produce all or part of its response body.
 The returned iterable **may** be empty (i.e. yield no non-empty
-strings), but if it *does* yield non-empty strings, that output
+bytestrings), but if it *does* yield non-empty bytestrings, that output
 must be treated normally by the server or gateway (i.e., it must be
 sent or queued immediately). Applications **must not** invoke
 ``write()`` from within their return iterable, and therefore any
-strings yielded by the iterable are transmitted after all strings
+bytestrings yielded by the iterable are transmitted after all bytestrings
 passed to ``write()`` have been sent to the client.
 
 
@@ -970,9 +1023,9 @@ Unicode Issues
 
 HTTP does not directly support Unicode, and neither does this
 interface. All encoding/decoding must be handled by the application;
-all strings passed to or from the server must be standard Python byte
-strings, not Unicode objects. The result of using a Unicode object
-where a string object is required, is undefined.
+all strings passed to or from the server must be of type ``str`` or
+``bytes``, never ``unicode``. The result of using a ``unicode``
+object where a string object is required, is undefined.
 
 Note also that strings passed to ``start_response()`` as a status or
 as response headers **must** follow RFC 2616 with respect to encoding.
@@ -980,7 +1033,7 @@ That is, they must either be ISO-8859-1 characters, or use RFC 2047
 MIME encoding.
 
 On Python platforms where the ``str`` or ``StringType`` type is in
-fact Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all
+fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all
 "strings" referred to in this specification must contain only
 code points representable in ISO-8859-1 encoding (``\u0000`` through
 ``\u00FF``, inclusive). It is a fatal error for an application to
@@ -988,12 +1041,18 @@ supply strings containing any other Unicode character or code point.
 Similarly, servers and gateways **must not** supply
 strings to an application containing any other Unicode characters.
 
-Again, all strings referred to in this specification **must** be
-of type ``str`` or ``StringType``, and **must not** be of type
-``unicode`` or ``UnicodeType``. And, even if a given platform allows
-for more than 8 bits per character in ``str``/``StringType`` objects,
-only the lower 8 bits may be used, for any value referred to in
-this specification as a "string".
+Again, all objects referred to in this specification as "strings"
+**must** be of type ``str`` or ``StringType``, and **must not** be
+of type ``unicode`` or ``UnicodeType``. And, even if a given platform
+allows for more than 8 bits per character in ``str``/``StringType``
+objects, only the lower 8 bits may be used, for any value referred
+to in this specification as a "string".
+
+For values referred to in this specification as "bytestrings"
+(i.e., values read from ``wsgi.input``, passed to ``write()``
+or yielded by the application), the value **must** be of type
+``bytes`` under Python 3, and ``str`` in earlier versions of
+Python.
 
 
 Error Handling
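
The type rules restated in the hunk above could be checked along these lines on Python 3. Both function names are hypothetical, and the choice of `TypeError` and `ValueError` is an assumption of this sketch; the text only says that violations are errors.

```python
def check_native_string(value):
    """Validate a "string" in the sense used above (Python 3 only)."""
    if type(value) is not str:
        raise TypeError("expected a native string (str)")
    # Only code points U+0000 through U+00FF are permitted.
    if any(ord(ch) > 0xFF for ch in value):
        raise ValueError("native strings must fit in ISO-8859-1")
    return value


def check_bytestring(value):
    """Validate a "bytestring": type bytes under Python 3."""
    if type(value) is not bytes:
        raise TypeError("expected a bytestring (bytes)")
    return value


ok_header = check_native_string("Content-Type")
ok_body = check_bytestring(b"Hello world!\n")
```

A validating middleware could apply the first check to statuses, header names and header values, and the second to every chunk yielded or written by the application.
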
@@ -1448,7 +1507,7 @@ Questions and Answers
    ``environ`` dictionary. This is the recommended approach for
    offering any such value-added services.
 
-2. Why can you call ``write()`` *and* yield strings/return an
+2. Why can you call ``write()`` *and* yield bytestrings/return an
    iterable? Shouldn't we pick just one way?
 
    If we supported only the iteration approach, then current