PEP: 333
Title: Python Web Server Gateway Interface v1.0
Version: $Revision$
Last-Modified: $Date$
Author: Phillip J. Eby <pje@telecommunity.com>
Discussions-To: Python Web-SIG <web-sig@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 07-Dec-2003
Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004


Abstract
========

This document specifies a proposed standard interface between web
servers and Python web applications or frameworks, to promote web
application portability across a variety of web servers.


Rationale and Goals
===================

Python currently boasts a wide variety of web application frameworks,
such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
name just a few [1]_. This wide variety of choices can be a problem
for new Python users, because generally speaking, their choice of web
framework will limit their choice of usable web servers, and vice
versa.

By contrast, although Java has just as many web application frameworks
available, Java's "servlet" API makes it possible for applications
written with any Java web application framework to run in any web
server that supports the servlet API.

The availability and widespread use of such an API in web servers for
Python -- whether those servers are written in Python (e.g. Medusa),
embed Python (e.g. mod_python), or invoke Python via a gateway
protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
framework from choice of web server, freeing users to choose a pairing
that suits them, while freeing framework and server developers to
focus on their preferred area of specialization.

This PEP, therefore, proposes a simple and universal interface between
web servers and web applications or frameworks: the Python Web Server
Gateway Interface (WSGI).

But the mere existence of a WSGI spec does nothing to address the
existing state of servers and frameworks for Python web applications.
Server and framework authors and maintainers must actually implement
WSGI for there to be any effect.

However, since no existing servers or frameworks support WSGI, there
is little immediate reward for an author who implements WSGI support.
Thus, WSGI *must* be easy to implement, so that an author's initial
investment in the interface can be reasonably low.

Thus, simplicity of implementation on *both* the server and framework
sides of the interface is absolutely critical to the utility of the
WSGI interface, and is therefore the principal criterion for any
design decisions.

Note, however, that simplicity of implementation for a framework
author is not the same thing as ease of use for a web application
author. WSGI presents an absolutely "no frills" interface to the
framework author, because bells and whistles like response objects and
cookie handling would just get in the way of existing frameworks'
handling of these issues. Again, the goal of WSGI is to facilitate
easy interconnection of existing servers and applications or
frameworks, not to create a new web framework.

Note also that this goal precludes WSGI from requiring anything that
is not already available in deployed versions of Python. Therefore,
new standard library modules are not proposed or required by this
specification, and nothing in WSGI requires a Python version greater
than 2.2.2. (It would be a good idea, however, for future versions
of Python to include support for this interface in web servers
provided by the standard library.)

In addition to ease of implementation for existing and future
frameworks and servers, it should also be easy to create request
preprocessors, response postprocessors, and other WSGI-based
"middleware" components that look like an application to their
containing server, while acting as a server for their contained
applications.

If middleware can be both simple and robust, and WSGI is widely
available in servers and frameworks, it allows for the possibility
of an entirely new kind of Python web application framework: one
consisting of loosely-coupled WSGI middleware components. Indeed,
existing framework authors may even choose to refactor their
frameworks' existing services to be provided in this way, becoming
more like libraries used with WSGI, and less like monolithic
frameworks. This would then allow application developers to choose
"best-of-breed" components for specific functionality, rather than
having to commit to all the pros and cons of a single framework.

Of course, as of this writing, that day is doubtless quite far off.
In the meantime, it is a sufficient short-term goal for WSGI to
enable the use of any framework with any server.

Finally, it should be mentioned that the current version of WSGI
does not prescribe any particular mechanism for "deploying" an
application for use with a web server or server gateway. At the
present time, this is necessarily implementation-defined by the
server or gateway. After a sufficient number of servers and
frameworks have implemented WSGI to provide field experience with
varying deployment requirements, it may make sense to create
another PEP, describing a deployment standard for WSGI servers and
application frameworks.


Specification Overview
======================

The WSGI interface has two sides: the "server" or "gateway" side, and
the "application" or "framework" side. The server side invokes a
callable object that is provided by the application side. The
specifics of how that object is provided are up to the server or
gateway. It is assumed that some servers or gateways will require an
application's deployer to write a short script to create an instance
of the server or gateway, and supply it with the application object.
Other servers and gateways may use configuration files or other
mechanisms to specify where an application object should be
imported from, or otherwise obtained.

The application object is simply a callable object that accepts
two arguments. The term "object" should not be misconstrued as
requiring an actual object instance: a function, method, class,
or instance with a ``__call__`` method are all acceptable for
use as an application object.

(Note: although we refer to it as an "application" object, this
should not be construed to mean that application developers will use
WSGI as a web programming API! It is assumed that application
developers will continue to use existing, high-level framework
services to develop their applications. WSGI is a tool for
framework and server developers, and is not intended to directly
support application developers.)

Here are two example application objects; one is a function, and the
other is a class::

    def simple_app(environ, start_response):
        """Simplest possible application object"""
        status = '200 OK'
        headers = [('Content-type','text/plain')]
        write = start_response(status, headers)
        write('Hello world!\n')


    class AppClass:
        """Much the same thing, but as a class

        (Note: 'AppClass' is the "application", so calling it
        returns an instance of 'AppClass', which is the iterable
        return value of the "application callable", as required
        by the spec.)
        """

        def __init__(self, environ, start_response):
            self.environ = environ
            self.start = start_response

        def __iter__(self):
            status = '200 OK'
            headers = [('Content-type','text/plain')]
            self.start(status, headers)

            yield "Hello world!\n"
            for i in range(1,11):
                yield "Extra line %s\n" % i

Throughout this specification, we will use the term "a callable" to
mean "a function, method, class, or an instance with a ``__call__``
method". It is up to the server, gateway, or application implementing
the callable to choose the appropriate implementation technique for
their needs. Conversely, a server, gateway, or application that is
invoking a callable must *not* have any dependency on what kind of
callable was provided to it. Callables are only to be called, not
introspected upon.

The server or gateway invokes the application callable once for each
request it receives from an HTTP client that is directed at the
application. To illustrate, here is a simple CGI gateway, implemented
as a function taking an application object (all error handling
omitted)::

    import os, sys

    def run_with_cgi(application):

        # Start from the CGI variables, then add the WSGI-defined ones.
        environ = {}
        environ.update(os.environ)
        environ['wsgi.input'] = sys.stdin
        environ['wsgi.errors'] = sys.stderr
        environ['wsgi.version'] = (1,0)
        environ['wsgi.multithread'] = False
        environ['wsgi.multiprocess'] = True
        environ['wsgi.run_once'] = True

        def start_response(status,headers):
            # Emit the CGI response header block, then hand back write().
            write = sys.stdout.write
            write("Status: %s\r\n" % status)
            for key,val in headers:
                write("%s: %s\r\n" % (key,val))
            write("\r\n")
            return write

        result = application(environ, start_response)
        if result is not None:
            try:
                for data in result:
                    sys.stdout.write(data)
            finally:
                # Always give the application a chance to release resources.
                if hasattr(result,'close'):
                    result.close()

In the next section, we will specify the precise semantics that
these illustrations are examples of.


Specification Details
=====================

The application object must accept two positional arguments. For
the sake of illustration, we have named them ``environ`` and
``start_response``, but they are not required to have these names.
A server or gateway *must* invoke the application object using
positional (not keyword) arguments. (E.g. by calling
``result = application(environ,start_response)`` as shown above.)

The ``environ`` parameter is a dictionary object, containing CGI-style
environment variables. This object *must* be a builtin Python
dictionary (*not* a subclass, ``UserDict`` or other dictionary
emulation), and the application is allowed to modify the dictionary
in any way it desires. The dictionary must also include certain
WSGI-required variables (described in a later section), and may
also include server-specific extension variables, named according
to a convention that will be described below.

The ``start_response`` parameter is a callable accepting two
positional arguments. For the sake of illustration, we have named
them ``status`` and ``headers``, but they are not required to have
these names, and the application *must* invoke the ``start_response``
callable using positional arguments
(e.g. ``start_response(status,headers)``).

The ``status`` parameter is a status string of the form
``"999 Message here"``, and the ``headers`` parameter is a list of
``(header_name,header_value)`` tuples describing the HTTP response
headers. This ``start_response`` callable must return a
``write(body_data)`` callable that takes one positional parameter: a
string to be written as part of the HTTP response body.

The application object may return either ``None`` (indicating that
there is no additional output), or it may return a non-empty
iterable yielding strings. (For example, it could be a
generator-iterator that yields strings, or it could be a
sequence such as a list of strings.) The server or gateway will
treat the strings yielded by the iterable as if they had been
passed to the ``write()`` callable. If a call to ``len(iterable)``
succeeds, the server must be able to rely on the result being
accurate. That is, if the iterable returned by the application
provides a working ``__len__()`` method, it *must* return an
accurate result.

If the returned iterable has a ``fileno`` attribute, the server *may*
assume that this is a ``fileno()`` method returning an operating
system file descriptor, and that it is allowed to read directly from
that descriptor up to the end of the file, and/or use any appropriate
operating system facilities (e.g. the ``sendfile()`` system call) to
transmit the file's contents. If the server does this, it must begin
transmission with the file's current position, and end at the end of
the file.

Finally, if the application returned an iterable, and the iterable has
a ``close()`` method, the server or gateway *must* call that method
upon completion of the current request, whether the request was
completed normally, or terminated early due to an error. (This is to
support resource release by the application. This protocol is
intended to support PEP 325, and also the simple case of an
application returning an open text file.)

(Note: the application *must* invoke the ``start_response()`` callable
before the iterable yields its first body string, so that the server
can send headers before any body content. However, this invocation
*may* be performed by the iterable's first iteration, so servers *must
not* assume that ``start_response()`` has been called before they
begin iterating over the iterable.)


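The iteration and ``close()`` rules above can be sketched from both
sides of the interface. This is only an illustration under stated
assumptions: ``ClosingBody``, ``serve_once`` and the ``output`` list are
hypothetical names that do not appear in this specification, and a real
server would transmit to a client rather than append to a list.

```python
class ClosingBody:
    """Iterable response body that releases a resource in close()."""

    def __init__(self, strings, on_close):
        self._strings = strings
        self._on_close = on_close      # hypothetical cleanup hook

    def __iter__(self):
        return iter(self._strings)

    def close(self):
        self._on_close()


def serve_once(application, environ, output):
    """Hypothetical server loop that collects the response in ``output``."""

    def start_response(status, headers):
        output.append(status)
        return output.append           # stands in for write(body_data)

    result = application(environ, start_response)
    if result is None:
        return                         # everything was sent via write()
    try:
        for data in result:
            output.append(data)
    finally:
        # Runs whether iteration completed normally or raised.
        if hasattr(result, 'close'):
            result.close()
```

Because ``start_response`` is only handed to the application and never
checked before the loop starts, an iterable that calls it lazily on its
first iteration works unchanged with this loop.
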
``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain these CGI
environment variables, as defined by the Common Gateway Interface
specification [2]_. The following variables *must* be present, but
*may* be an empty string, if there is no more appropriate value for
them:

* ``REQUEST_METHOD``

* ``SCRIPT_NAME`` (The initial portion of the request URL's "path"
  that corresponds to the application object, so that the application
  knows its virtual "location".)

* ``PATH_INFO`` (The remainder of the request URL's "path",
  designating the virtual "location" of the request's target within
  the application)

* ``QUERY_STRING``

* ``CONTENT_TYPE``

* ``CONTENT_LENGTH``

* ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with
  ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the URL. Note,
  however, that ``HTTP_HOST``, if present, should be used in
  preference to ``SERVER_NAME`` for constructing the URL. See the
  `URL Reconstruction`_ section below for more detail.)

* Variables corresponding to the client-supplied HTTP headers (i.e.,
  variables whose names begin with ``"HTTP_"``).

In general, a server or gateway should attempt to provide as many
other CGI variables as are applicable, including e.g. the nonstandard
SSL variables such as ``HTTPS=on``, if an SSL connection is in effect.
However, an application that uses any variables other than the ones
listed above is necessarily non-portable to web servers that do not
support the relevant extensions.

A WSGI-compliant server or gateway *should* document what variables it
provides, along with their definitions as appropriate. Applications
*should* check for the presence of any nonstandard variables they
require, and have a fallback plan in the event such a variable is
absent.

Note: missing variables (such as ``REMOTE_USER`` when no
authentication has occurred) should be left out of the ``environ``
dictionary. Also note that CGI-defined variables must be strings,
if they are present at all. It is a violation of this specification
for a CGI variable's value to be of any type other than ``str``.

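A server author could check these rules with a small helper like the
one below. It is a sketch, not part of this specification: the name
``check_cgi_variables`` is hypothetical, and it assumes that any key
containing a dot is a WSGI-defined or server-defined variable rather
than a CGI one.

```python
REQUIRED_CGI_VARIABLES = (
    'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING',
    'CONTENT_TYPE', 'CONTENT_LENGTH', 'SERVER_NAME', 'SERVER_PORT',
)


def check_cgi_variables(environ):
    """Return a list of problems found in the CGI part of ``environ``."""
    problems = []
    # Required variables may be empty strings, but must be present.
    for name in REQUIRED_CGI_VARIABLES:
        if name not in environ:
            problems.append('missing: %s' % name)
    for name, value in environ.items():
        # Assumption: dotted names (wsgi.*, server extensions) are not CGI.
        if '.' in name:
            continue
        if not isinstance(value, str):
            problems.append('not a string: %s' % name)
    return problems
```

An empty list means the dictionary passed both checks. The helper does
not verify that the values are meaningful, only that they are present
and of type ``str``.
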
In addition to the CGI-defined variables, the ``environ`` dictionary
must also contain the following WSGI-defined variables:

=====================  ===============================================
Variable               Value
=====================  ===============================================
``wsgi.version``       The tuple ``(1,0)``, representing WSGI
                       version 1.0.

``wsgi.input``         An input stream from which the HTTP request
                       body can be read. (The server or gateway may
                       perform reads on-demand as requested by the
                       application, or it may pre-read the client's
                       request body and buffer it in-memory or on
                       disk, or use any other technique for providing
                       such an input stream, according to its
                       preference.)

``wsgi.errors``        An output stream to which error output can be
                       written, for the purpose of recording program
                       or other errors in a standardized and possibly
                       centralized location. For many servers, this
                       will be the server's main error log.

                       Alternatively, this may be ``sys.stderr``, or
                       a log file of some sort. The server's
                       documentation should include an explanation of
                       how to configure this or where to find the
                       recorded output. A server or gateway may
                       supply different error streams to different
                       applications, if this is desired.

``wsgi.multithread``   This value should be true if the application
                       object may be simultaneously invoked by another
                       thread in the same process, and false
                       otherwise.

``wsgi.multiprocess``  This value should be true if an equivalent
                       application object may be simultaneously
                       invoked by another process, and false
                       otherwise.

``wsgi.run_once``      This value should be true if the server/gateway
                       expects (but does not guarantee!) that the
                       application will only be invoked this one time
                       during the life of its containing process.
                       Normally, this will only be true for a gateway
                       based on CGI (or something similar).
=====================  ===============================================

Finally, the ``environ`` dictionary may also contain server-defined
variables. These variables should be named using only lower-case
letters, numbers, dots, and underscores, and should be prefixed with
a name that is unique to the defining server or gateway. For
example, ``mod_python`` might define variables with names like
``mod_python.some_variable``.


Input and Error Streams
~~~~~~~~~~~~~~~~~~~~~~~

The input and error streams provided by the server must support
the following methods:

===================  ==========  ========
Method               Stream      Notes
===================  ==========  ========
``read(size)``       ``input``   1
``readline()``       ``input``   1,2
``readlines(hint)``  ``input``   1,3
``__iter__()``       ``input``
``flush()``          ``errors``  4
``write(str)``       ``errors``
``writelines(seq)``  ``errors``
===================  ==========  ========

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1. The server is not required to read past the client's specified
   ``Content-Length``, and is allowed to simulate an end-of-file
   condition if the application attempts to read past that point.
   The application *should not* attempt to read more data than is
   specified by the ``CONTENT_LENGTH`` variable.

2. The optional "size" argument to ``readline()`` is not supported,
   as it may be complex for server authors to implement, and is not
   often used in practice.

3. Note that the ``hint`` argument to ``readlines()`` is optional for
   both caller and implementer. The application is free not to
   supply it, and the server or gateway is free to ignore it.

4. Since the ``errors`` stream may not be rewound, a container is
   free to forward write operations immediately, without buffering.
   In this case, the ``flush()`` method may be a no-op. Portable
   applications, however, cannot assume that output is unbuffered
   or that ``flush()`` is a no-op. They must call ``flush()`` if
   they need to ensure that output has in fact been written. (For
   example, to minimize intermingling of data from multiple processes
   writing to the same error log.)

The methods listed in the table above *must* be supported by all
servers conforming to this specification. Applications conforming
to this specification *must not* use any other methods or attributes
of the ``input`` or ``errors`` objects. In particular, applications
*must not* attempt to close these streams, even if they possess
``close()`` methods.


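As an illustration of note 1, an application might bound its read by
``CONTENT_LENGTH`` as sketched here. The helper name
``read_request_body`` is hypothetical, and the sketch assumes that the
input stream yields byte strings (shown as ``bytes`` so that it runs on
a current interpreter).

```python
import io


def read_request_body(environ):
    """Read at most CONTENT_LENGTH bytes from ``wsgi.input``."""
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0                     # unparseable length: read nothing
    if length <= 0:
        return b''
    # Only read(size) is used, and the stream is never closed here.
    return environ['wsgi.input'].read(length)
```

For example, with ``io.BytesIO(b'name=valueEXTRA')`` as the stream and
``CONTENT_LENGTH`` set to ``'10'``, the helper stops after
``b'name=value'`` and leaves the rest unread.
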
The ``start_response()`` Callable
---------------------------------

The second parameter passed to the application object is itself a
two-argument callable, of the form ``start_response(status,headers)``.
(As with all WSGI callables, the arguments must be supplied
positionally, not by keyword.) The ``start_response`` callable is
used to begin the HTTP response, and it must return a
``write(body_data)`` callable.

The ``status`` argument is an HTTP "status" string like ``"200 OK"``
or ``"404 Not Found"``. The string *must* be pure 7-bit ASCII,
containing no control characters. It must not be terminated with
a carriage return or linefeed.

The ``headers`` argument is a sequence of
``(header_name,header_value)`` tuples. Each ``header_name`` must be a
valid HTTP header name, without a trailing colon or other punctuation.
Each ``header_value`` *must not* include *any* control characters,
including carriage returns or linefeeds, either embedded or at the
end. (These requirements are to minimize the complexity of any
parsing that must be performed by servers, gateways, and intermediate
response processors that need to inspect or modify response headers.)

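The constraints on ``status`` and ``headers`` lend themselves to a
simple check, sketched below. The function name
``check_response_start`` is hypothetical, and the header-name test is
deliberately partial: it rejects empty names, trailing colons and
control characters, but does not validate the full HTTP token grammar.

```python
def _has_control_chars(text):
    # ASCII control characters are 0-31 and 127 (DEL).
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in text)


def check_response_start(status, headers):
    """Return a list of problems with a start_response() call."""
    problems = []
    if not (status[:3].isdigit() and status[3:4] == ' '):
        problems.append('status is not of the form "999 Message here"')
    if _has_control_chars(status) or any(ord(ch) > 127 for ch in status):
        problems.append('status is not clean 7-bit ASCII')
    for name, value in headers:
        if not name or name.endswith(':') or _has_control_chars(name):
            problems.append('bad header name: %r' % name)
        if _has_control_chars(value):
            problems.append('bad header value for %s' % name)
    return problems
```

A server, or a piece of debugging middleware, could call this from
inside its own ``start_response`` and log the result before sending
anything to the client.
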
In general, the server or gateway is responsible for ensuring that
correct headers are sent to the client: if the application omits
a needed header, the server or gateway *should* add it. For example,
the HTTP ``Date:`` and ``Server:`` headers would normally be supplied
by the server or gateway. If the application supplies a header that
the server would ordinarily supply, or that contradicts the server's
intended behavior (e.g. supplying a different ``Connection:`` header),
the server or gateway *may* discard the conflicting header, provided
that its action is recorded for the benefit of the application author.

(A reminder for server/gateway authors: HTTP header names are
case-insensitive, so be sure to take that into consideration when
examining application-supplied headers!)


Handling the ``Content-Length`` Header
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the application does not supply a ``Content-Length`` header, a
server or gateway may choose one of several approaches to handling
it. The simplest of these is to close the client connection when
the response is completed.

Under some circumstances, however, the server or gateway may be
able to either generate a ``Content-Length`` header, or at least
avoid the need to close the client connection. If the application
does *not* call the ``write()`` callable, and returns an iterable
whose ``len()`` is 1, then the server can automatically determine
``Content-Length`` by taking the length of the first string yielded
by the iterable.

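The single-item case can be sketched as follows. The function name
``infer_content_length`` and the ``write_called`` flag are hypothetical;
the sketch also assumes that an iterable supporting ``len()`` can be
iterated more than once (as a list can), and that body strings are byte
strings, so that ``len()`` of a string equals its size on the wire.

```python
def infer_content_length(headers, result, write_called):
    """Return ``headers``, plus Content-Length when it can be inferred."""
    for name, _value in headers:
        # Header names are case-insensitive.
        if name.lower() == 'content-length':
            return headers             # the application already supplied it
    if write_called or result is None:
        return headers
    try:
        count = len(result)
    except TypeError:
        return headers                 # e.g. a generator: length unknown
    if count != 1:
        return headers
    for first in result:               # assumes the iterable is re-iterable
        return list(headers) + [('Content-Length', str(len(first)))]
    return headers
```

When none of the conditions hold, the headers come back unchanged and
the server has to fall back on another strategy, such as closing the
connection after the response.
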
And, if the server and client both support HTTP/1.1 "chunked
encoding" [3]_, then the server *may* use chunked encoding to send
a chunk for each ``write()`` call or string yielded by the iterable,
prefixing each chunk with its own length instead of sending a single
``Content-Length`` header. This allows the server to keep the client
connection alive, if it wishes to do so. Note that the server *must*
comply fully with RFC 2616 when doing this, or else fall back to one
of the other strategies for dealing with the absence of
``Content-Length``.


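For reference, the chunk framing defined by RFC 2616 (section 3.6.1) is
a hexadecimal size line, the data, and a trailing CRLF, with a
zero-length chunk marking the end of the body. The sketch below shows
only that framing; ``encode_chunk`` and ``LAST_CHUNK`` are hypothetical
names, and trailers and chunk extensions are left out.

```python
LAST_CHUNK = b'0\r\n\r\n'              # zero-length chunk, no trailers


def encode_chunk(data):
    """Frame one body string as an HTTP/1.1 chunk."""
    if not data:
        # An empty chunk would be read as the end of the body, so skip it.
        return b''
    size_line = ('%x\r\n' % len(data)).encode('ascii')
    return size_line + data + b'\r\n'
```

A server using this strategy would also have to send a
``Transfer-Encoding: chunked`` response header and finish the body with
``LAST_CHUNK``.
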
The ``write()`` Callable
------------------------

The return value of the ``start_response()`` callable is a
one-argument ``write()`` callable, that accepts strings to write as
part of the HTTP response body. The server or gateway must
not modify supplied strings in any way; they must be treated
as binary byte sequences with no character interpretation, line
ending changes, or other modification. The application is responsible
for ensuring that the string(s) to be written are in a format suitable
for the client.

Note that the purpose of the ``write()`` callable is primarily to
support existing application frameworks that support a streaming
"push" API. Therefore, strings passed to ``write()`` *must* be sent
to the client *as soon as possible*; they must *not* be buffered
unless the buffer will be emptied in parallel with the application's
continuing execution (e.g. by a separate I/O thread). If the server
or gateway does not have a separate I/O thread available, it *must*
finish writing the supplied string before it returns from each
``write()`` invocation.

If the application returns an iterable, each string produced by the
iterable must be treated as though it had been passed to ``write()``,
with the data sent in an "as soon as possible" manner. That is,
the iterable should not be asked for a new string until the previous
string has been sent to the client, or is buffered for such sending
by a parallel thread.

Notice that these rules discourage the generation of content before a
client is ready for it, in excess of the buffer sizes provided by the
server and operating system. For this reason, some applications may
wish to buffer data internally before passing any of it to ``write()``
or yielding it from an iterator, in order to avoid waiting for the
client to catch up with their output. This approach may yield better
throughput for dynamically generated pages of moderate size, since the
application is then freed for other tasks.

In addition to improved performance, buffering all of an application's
output has an advantage for error handling: the buffered output can
be thrown away and replaced by an error page, rather than dumping an
error message in the middle of some partially-completed output. For
this and other reasons, many existing Python frameworks already
accumulate their output for a single write, unless the application
explicitly requests streaming, or the expected output is larger than
practical for buffering (e.g. multi-megabyte PDFs). So, these
application frameworks are already a natural fit for the WSGI
streaming model: for most requests they will only call ``write()``
once anyway!


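The buffering approach described above might look like this on the
application side. It is a sketch under assumptions: ``buffered`` and
``render`` are hypothetical names, with ``render(environ)`` standing for
whatever framework code produces the page as an iterable of strings.

```python
def buffered(render):
    """Wrap ``render(environ) -> iterable of strings`` as an application."""

    def application(environ, start_response):
        try:
            body = ''.join(render(environ))
            status = '200 OK'
        except Exception:
            # Nothing has been sent yet, so the partial output is simply
            # dropped and replaced by an error page.
            body = 'Internal error\n'
            status = '500 Internal Server Error'
        start_response(status, [('Content-type', 'text/plain')])
        return [body]                  # a single string, sent in one step

    return application
```

Because the application returns a one-item list, a server is also in a
position to work out ``Content-Length`` on its own.
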
Implementation/Application Notes
================================

Unicode
-------

HTTP does not directly support Unicode, and neither does this
interface. All encoding/decoding must be handled by the application;
all strings and streams passed to or from the server must be standard
Python byte strings, not Unicode objects. The result of using a
Unicode object where a string object is required, is undefined.


Multiple Invocations
--------------------

Application objects must be able to be invoked more than once, since
virtually all servers/gateways will make such requests.


Error Handling
--------------

Servers *should* trap and log exceptions raised by
applications, and *may* continue to execute, or attempt to shut down
gracefully. Applications *should* avoid allowing exceptions to
escape their execution scope, since the result of uncaught exceptions
is server-defined.


Thread Support
--------------

Thread support, or lack thereof, is also server-dependent.
Servers that can run multiple requests in parallel, *should* also
provide the option of running an application in a single-threaded
fashion, so that applications or frameworks that are not thread-safe
may still be used with that server.


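One way a multi-threaded server could offer such an option is to wrap
the application in a lock, as sketched here. The class name
``Serialized`` is hypothetical, and the design is only one possibility:
it drains the whole response while holding the lock, which trades
streaming for the guarantee that no application code runs concurrently.

```python
import threading


class Serialized:
    """Run the wrapped application for one request at a time."""

    def __init__(self, application):
        self._application = application
        self._lock = threading.Lock()

    def __call__(self, environ, start_response):
        with self._lock:
            # With the lock held, no other thread is inside the application.
            environ['wsgi.multithread'] = False
            result = self._application(environ, start_response)
            if result is None:
                return None
            try:
                # Drain the body before releasing the lock.
                return list(result)
            finally:
                if hasattr(result, 'close'):
                    result.close()
```

A server that instead dispatches all requests from a single thread needs
no wrapper of this kind.
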
URL Reconstruction
------------------

If an application wishes to reconstruct a request's complete URL, it
may do so using the following algorithm, contributed by Ian Bicking::

    if environ.get('HTTPS') == 'on':
        url = 'https://'
    else:
        url = 'http://'

    if environ.get('HTTP_HOST'):
        # The Host header already carries any non-default port.
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']

        # Append the port only when it is not the scheme's default.
        if environ.get('HTTPS') == 'on':
            if environ['SERVER_PORT'] != '443':
                url += ':' + environ['SERVER_PORT']
        else:
            if environ['SERVER_PORT'] != '80':
                url += ':' + environ['SERVER_PORT']

    url += environ['SCRIPT_NAME']
    url += environ['PATH_INFO']
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']

Note that such a reconstructed URL may not be precisely the same URI
as requested by the client. Server rewrite rules, for example, may
have modified the client's originally requested URL to place it in a
canonical form.


Application Configuration
-------------------------

This specification does not define how a server selects or obtains an
application to invoke. These and other configuration options are
highly server-specific matters. It is expected that server/gateway
authors will document how to configure the server to execute a
particular application object, and with what options (such as
threading options).

Framework authors, on the other hand, should document how to create an
application object that wraps their framework's functionality. The
user, who has chosen both the server and the application framework,
must connect the two together. However, since both the framework and
the server now have a common interface, this should be merely a
mechanical matter, rather than a significant engineering effort for
each new server/framework pair.


Middleware
----------

Note that a single object may play the role of a server with respect
to some application(s), while also acting as an application with
respect to some server(s). Such "middleware" components can perform
such functions as:

* Routing a request to different application objects based on the
  target URL, after rewriting the ``environ`` accordingly.

* Allowing multiple applications or frameworks to run side-by-side
  in the same process

* Load balancing and remote processing, by forwarding requests and
  responses over a network

* Performing content postprocessing, such as applying XSL stylesheets

Given the existence of applications and servers conforming to this
specification, the appearance of such reusable middleware becomes
a possibility.


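The first of these functions, routing by URL, can be sketched in a few
lines. ``PrefixRouter`` is a hypothetical name, and the matching rule
(an exact prefix, or a prefix followed by ``/``) is an assumption chosen
for the example rather than anything this specification requires.

```python
class PrefixRouter:
    """Middleware that dispatches on the leading part of PATH_INFO."""

    def __init__(self, routes, default):
        self._routes = routes          # list of (prefix, application)
        self._default = default        # used when no prefix matches

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, application in self._routes:
            if path == prefix or path.startswith(prefix + '/'):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME so
                # the target application sees its own virtual location.
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return application(environ, start_response)
        return self._default(environ, start_response)
```

To its containing server the router is an ordinary application object;
to the applications it dispatches to, it plays the part of the server.
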
Server Extension APIs
---------------------

Some server authors may wish to expose more advanced APIs, that
application or framework authors can use for specialized purposes.
For example, a gateway based on ``mod_python`` might wish to expose
part of the Apache API as a WSGI extension.

In the simplest case, this requires nothing more than defining an
``environ`` variable, such as ``mod_python.some_api``. But, in many
cases, the possible presence of middleware can make this difficult.
For example, an API that offers access to the same HTTP headers that
are found in ``environ`` variables, might return different data if
``environ`` has been modified by middleware.

In general, any extension API that duplicates, supplants, or bypasses
some portion of WSGI functionality runs the risk of being incompatible
with middleware components. Server/gateway developers should *not*
assume that nobody will use middleware, because some framework
developers specifically intend to organize or reorganize their
frameworks to function almost entirely as middleware of various kinds.

So, to provide maximum compatibility, servers and gateways that
provide extension APIs that replace some WSGI functionality, *must*
design those APIs so that they are invoked using the portion of the
API that they replace. For example, an extension API to access HTTP
request headers must require the application to pass in its current
``environ``, so that the server/gateway may verify that HTTP headers
accessible via the API have not been altered by middleware. If the
extension API cannot guarantee that it will always agree with
``environ`` about the contents of HTTP headers, it must refuse service
to the application, e.g. by raising an error, returning ``None``
instead of a header collection, or whatever is appropriate to the API.

Similarly, if an extension API provides an alternate means of writing
|
||
response data or headers, it should require the ``start_response``
|
||
callable to be passed in, before the application can obtain the
|
||
extended service. If the object passed in is not the same one that
|
||
the server/gateway originally supplied to the application, it cannot
|
||
guarantee correct operation and must refuse to provide the extended
|
||
service to the application.
|
||
|
||
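
A server-side extension that follows these rules might look roughly
like the sketch below. The ``HeaderAPI`` class and the
``example_server.header_api`` key are hypothetical; the point is only
that the application must hand its current ``environ`` back to the
extension, which refuses service if the HTTP variables no longer match
what the server originally supplied::

    class HeaderAPI(object):
        """Hypothetical extension offering the raw request headers."""

        def __init__(self, environ, raw_headers):
            # Remember the HTTP_* variables exactly as the server set them.
            self._original = self._http_vars(environ)
            self._raw_headers = list(raw_headers)

        @staticmethod
        def _http_vars(environ):
            return dict((key, value) for key, value in environ.items()
                        if key.startswith("HTTP_"))

        def get_headers(self, environ):
            """Return the raw headers, or None if ``environ`` changed."""
            if self._http_vars(environ) != self._original:
                return None    # refuse service: middleware intervened
            return list(self._raw_headers)


    def install_extension(environ, raw_headers):
        # Called by the (hypothetical) server before invoking the app.
        environ["example_server.header_api"] = HeaderAPI(
            environ, raw_headers)

An extension that wraps ``start_response`` can apply the same idea
with an identity test (``is``) against the callable that the server
originally supplied.
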
These guidelines also apply to middleware that adds information such
as parsed cookies, form variables, sessions, and the like to
``environ``. Specifically, such middleware should provide these
features as functions which operate on ``environ``, rather than simply
stuffing values into ``environ``. This helps ensure that information
is calculated from ``environ`` *after* any middleware has done any URL
rewrites or other ``environ`` modifications.

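
For instance, a cookie-handling component could expose a function such
as the hypothetical ``get_cookies()`` below, instead of storing a
parsed dictionary in ``environ`` ahead of time. The parsing shown is
deliberately simplistic and is not a complete implementation of any
cookie specification::

    def get_cookies(environ):
        """Parse cookies from the *current* ``environ`` on each call."""
        cookies = {}
        for part in environ.get("HTTP_COOKIE", "").split(";"):
            name, sep, value = part.strip().partition("=")
            if sep:    # skip empty or malformed fragments
                cookies[name] = value
        return cookies


    def application(environ, start_response):
        # Any earlier middleware changes to HTTP_COOKIE are visible here.
        cookies = get_cookies(environ)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [("user=%s" % cookies.get("user", "")).encode("ascii")]

Because the value is computed when the application asks for it, it
cannot go stale if another component rewrites ``environ`` in between.
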
It is very important that these "safe extension" rules be followed by
both server/gateway and middleware developers, in order to avoid a
future in which middleware developers are forced to delete any and all
extension APIs from ``environ`` to ensure that their mediation isn't
being bypassed by applications using those extensions!


HTTP 1.1 Expect/Continue
------------------------

Servers and gateways *must* provide transparent support for HTTP 1.1's
"expect/continue" mechanism, if they implement HTTP 1.1. This may be
done in any of several ways:

1. Reject all client requests containing an ``Expect: 100-continue``
   header with a "417 Expectation Failed" error. Such requests will
   not be forwarded to an application object.

2. Respond to requests containing an ``Expect: 100-continue`` header
   with an immediate "100 Continue" response, and proceed normally.

3. Proceed with the request normally, but provide the application
   with a ``wsgi.input`` stream that will send the "100 Continue"
   response if/when the application first attempts to read from the
   input stream. The read request must then remain blocked until the
   client responds.

Note that this behavior restriction does not apply to HTTP 1.0
requests, or to requests that are not directed to an application
object. For more information on HTTP 1.1 Expect/Continue, see RFC
2616, sections 8.2.3 and 10.1.1.

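
The third approach could be implemented with a small wrapper around
the input stream, sketched below. ``ContinueOnRead`` and its
``send_continue`` callback are hypothetical names; a real server would
supply a callback that writes the interim response to the client
connection::

    import io


    class ContinueOnRead(object):
        """Wrap an input stream; send "100 Continue" on the first read."""

        def __init__(self, stream, send_continue):
            self._stream = stream
            self._send_continue = send_continue
            self._sent = False

        def _before_read(self):
            if not self._sent:
                self._sent = True
                self._send_continue()

        # Each read below then blocks in the underlying stream until
        # the client actually sends its request body.
        def read(self, *args):
            self._before_read()
            return self._stream.read(*args)

        def readline(self, *args):
            self._before_read()
            return self._stream.readline(*args)

        def readlines(self, *args):
            self._before_read()
            return self._stream.readlines(*args)

        def __iter__(self):
            self._before_read()
            return iter(self._stream)


    # Stand-ins for a client connection, for demonstration only.
    sent = []
    wsgi_input = ContinueOnRead(
        io.BytesIO(b"request body"),
        lambda: sent.append(b"HTTP/1.1 100 Continue\r\n\r\n"))

An application that never reads its input never triggers the interim
response, which is what makes this approach transparent.
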
Questions and Answers
=====================

1. Why must ``environ`` be a dictionary? What's wrong with using a
   subclass?

   The rationale for requiring a dictionary is to maximize portability
   between servers. The alternative would be to define some subset of
   a dictionary's methods as being the standard and portable
   interface. In practice, however, most servers will probably find a
   dictionary adequate to their needs, and thus framework authors will
   come to expect the full set of dictionary features to be available,
   since they will be there more often than not. But, if some server
   chooses *not* to use a dictionary, then there will be
   interoperability problems despite that server's "conformance" to
   the spec. Therefore, making a dictionary mandatory simplifies the
   specification and guarantees interoperability.

   Note that this does not prevent server or framework developers from
   offering specialized services as custom variables *inside* the
   ``environ`` dictionary. This is the recommended approach for
   offering any such value-added services.

2. Why can you call ``write()`` *and* yield strings/return an
   iterator? Shouldn't we pick just one way?

   If we supported only the iteration approach, then current
   frameworks that assume the availability of "push" would suffer.
   But, if we supported only pushing via ``write()``, then server
   performance would suffer for transmission of e.g. large files (if a
   worker thread can't begin work on a new request until all of the
   output has been sent). Thus, this compromise allows an application
   framework to support both approaches, as appropriate, but with only
   a little more burden to the server implementor than a push-only
   approach would require.

3. What's the ``close()`` for?

   When writes are done during the execution of an application
   object, the application can ensure that resources are released
   using a try/finally block. But, if the application returns an
   iterator, any resources used will not be released until the
   iterator is garbage collected. The ``close()`` idiom allows an
   application to release critical resources at the end of a request,
   and it's forward-compatible with the support for try/finally in
   generators that's proposed by PEP 325.

4. Why is this interface so low-level? I want feature X! (e.g.
   cookies, sessions, persistence, ...)

   This isn't Yet Another Python Web Framework. It's just a way for
   frameworks to talk to web servers, and vice versa. If you want
   these features, you need to pick a web framework that provides the
   features you want. And if that framework lets you create a WSGI
   application, you should be able to run it in most WSGI-supporting
   servers. Also, some WSGI servers may offer additional services via
   objects provided in their ``environ`` dictionary; see the
   applicable server documentation for details. (Of course,
   applications that use such extensions will not be portable to other
   WSGI-based servers.)

5. Why use CGI variables instead of good old HTTP headers? And why
   mix them in with WSGI-defined variables?

   Many existing web frameworks are built heavily upon the CGI spec,
   and existing web servers know how to generate CGI variables. In
   contrast, alternative ways of representing inbound HTTP information
   are fragmented and lack market share. Thus, using the CGI
   "standard" seems like a good way to leverage existing
   implementations. As for mixing them with WSGI variables,
   separating them would just require two dictionary arguments to be
   passed around, while providing no real benefits.

6. What about the status string? Can't we just use the number,
   passing in ``200`` instead of ``"200 OK"``?

   Doing this would complicate the server or gateway, by requiring
   them to have a table of numeric statuses and corresponding
   messages. By contrast, it is easy for an application or framework
   author to type the extra text to go with the specific response code
   they are using, and existing frameworks often already have a table
   containing the needed messages. So, on balance it seems better to
   make the application/framework responsible, rather than the server
   or gateway.

7. Why is ``wsgi.run_once`` not guaranteed to run the app only once?

   Because it's merely a suggestion to the application that it should
   "rig for infrequent running". This is intended for application
   frameworks that have multiple modes of operation for caching,
   sessions, and so forth. In a "multiple run" mode, such frameworks
   may preload caches, and may not write e.g. logs or session data to
   disk after each request. In "single run" mode, such frameworks
   avoid preloading and flush all necessary writes after each request.

   However, in order to test an application or framework to verify
   correct operation in the latter mode, it may be necessary (or at
   least expedient) to invoke it more than once. Therefore, an
   application should not assume that it will definitely not be run
   again, just because it is called with ``wsgi.run_once`` set to
   ``True``.

8. Feature X (dictionaries, callables, etc.) is ugly for use in
   application code; why don't we use objects instead?

   All of these implementation choices of WSGI are specifically
   intended to *decouple* features from one another; recombining these
   features into encapsulated objects makes it somewhat harder to
   write servers or gateways, and an order of magnitude harder to
   write middleware that replaces or modifies only small portions of
   the overall functionality.

   In essence, middleware wants to have a "Chain of Responsibility"
   pattern, whereby it can act as a "handler" for some functions,
   while allowing others to remain unchanged. This is difficult to do
   with ordinary Python objects, if the interface is to remain
   extensible. For example, one must use ``__getattr__`` or
   ``__getattribute__`` overrides, to ensure that extensions (such as
   attributes defined by future WSGI versions) are passed through.

   This type of code is notoriously difficult to get 100% correct, and
   few people will want to write it themselves. They will therefore
   copy other people's implementations, but fail to update them when
   the person they copied from corrects yet another corner case.

   Further, this necessary boilerplate would be pure excise, a
   developer tax paid by middleware developers to support a slightly
   prettier API for application framework developers. But,
   application framework developers will typically only be updating
   *one* framework to support WSGI, and in a very limited part of
   their framework as a whole. It will likely be their first (and
   maybe their only) WSGI implementation, and thus they will likely
   implement it with this specification ready to hand. Thus, the
   effort of making the API "prettier" with object attributes and
   suchlike would likely be wasted for this audience.

   We encourage those who want a prettier (or otherwise improved) WSGI
   interface for use in direct web application programming (as opposed
   to web framework development) to develop APIs or frameworks that
   wrap WSGI for convenient use by application developers. In this
   way, WSGI can remain conveniently low-level for server and
   middleware authors, while not being "ugly" for application
   developers.

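
As an illustration of the ``close()`` idiom discussed in question 3,
the sketch below shows an iterable response body that releases a file,
together with the sort of loop a server might use to consume it.
``FileBody`` and ``send_body`` are hypothetical names chosen for this
example::

    class FileBody(object):
        """Iterable response body that releases its file in close()."""

        def __init__(self, fileobj, blocksize=8192):
            self._file = fileobj
            self._blocksize = blocksize

        def __iter__(self):
            while True:
                block = self._file.read(self._blocksize)
                if not block:
                    break
                yield block

        def close(self):
            self._file.close()


    def send_body(result, write):
        """Roughly what a server might do with an application's result."""
        try:
            for data in result:
                write(data)
        finally:
            # Runs whether or not iteration completed normally.
            if hasattr(result, "close"):
                result.close()

The file is closed at the end of the request even if the client
disconnects partway through, rather than whenever the iterator happens
to be garbage collected.

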
Open Issues
===========

The format of the ``headers`` passed to the ``start_response``
callable has seen some debate. Currently, it is a sequence of tuples,
but other formats have been suggested, such as a dictionary of lists,
or an ``email.Message`` object (from the Python standard library's
``email`` package). For various practical reasons, the "dictionary of
lists" approach has been ruled out, but ``email.Message`` is still a
candidate, as it provides several advantages for many middleware
developers and some application or framework developers, without an
excessive burden to anyone else.

Specifically, ``email.Message`` objects offer a mutable data
structure (not unlike a case-insensitive dictionary) for containing
MIME headers, such as those used in an HTTP response. This makes it
very easy to modify headers, or add multi-valued headers such as
``Set-Cookie`` headers. If the ``headers`` passed to
``start_response`` were an ``email.Message``, then it would be easy
for middleware and servers to modify response headers, e.g. to supply
defaults for missing headers. It also leads to cleaner-looking
application code in some cases, e.g.::

    from email.Message import Message

    def application(environ, start_response):
        headers = Message()
        headers.set_type("text/plain")
        headers.add_header("Set-Cookie", "FOO=BAR", path="/foobar")
        # Send the body through the returned write() callable.
        start_response("200 OK", headers)("Hello world!")
        return []

Some have pointed out that this requires the developers of existing
frameworks to convert whatever header format they use to an
``email.Message``. But this is only relevant if the format they
already use is a list of name/value pairs: in all other cases they
would have to perform some conversion anyway.

In the event that the ``email.Message`` format is *not* chosen,
however, application developers will still have the option of using it
as a helper class. For example, the code below works with the current
WSGI spec, by passing the message object's ``items()`` (a list of
tuples) to ``start_response()``::

    from email.Message import Message

    def application(environ, start_response):
        headers = Message()
        headers.set_type("text/plain")
        headers.add_header("Set-Cookie", "FOO=BAR", path="/foobar")
        # Send the body through the returned write() callable.
        start_response("200 OK", headers.items())("Hello world!")
        return []

But this doesn't help middleware authors, who would have to convert
the response headers into a ``Message`` object and back again if they
needed to modify the headers.

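
The round trip described above might look like the following
middleware sketch, which supplies defaults for missing response
headers. ``add_default_headers`` is a hypothetical name, and the
import falls back between the two spellings of the module so that the
sketch runs on both older and later Python versions::

    try:
        from email.message import Message    # Python 2.5 and later
    except ImportError:
        from email.Message import Message    # spelling used above


    def add_default_headers(app, defaults):
        """Wrap ``app``, supplying defaults for missing headers."""
        def wrapper(environ, start_response):
            def custom_start_response(status, headers, *args):
                msg = Message()
                for name, value in headers:
                    msg[name] = value      # appends; duplicates are kept
                for name, value in defaults:
                    if name not in msg:    # lookup ignores case
                        msg[name] = value
                # Convert back to a list of tuples for the server.
                return start_response(status, msg.items(), *args)
            return app(environ, custom_start_response)
        return wrapper

The two conversions are the overhead in question; they would disappear
if ``headers`` were already a ``Message`` object.
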
One other issue that's been brought up in relation to
``email.Message`` is that its ``set_type()`` method also sets a
``MIME-Version`` header. In order to comply properly with the MIME
and HTTP specifications, it would then be necessary for server/gateway
authors to ensure the presence of a ``Content-Transfer-Encoding``,
e.g.::

    if ('MIME-Version' in headers and
        'Content-Transfer-Encoding' not in headers
    ):
        headers['Content-Transfer-Encoding'] = "8bit"

Also, ``email.Message`` has various features unrelated to HTTP or WSGI
that should not be used, and might be distracting or confusing to
authors.


Acknowledgements
================

Thanks go to the many folks on the Web-SIG mailing list whose
thoughtful feedback made this revised draft possible. Especially:

* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
  on the first draft as not offering any advantages over "plain old
  CGI", thus encouraging me to look for a better approach.

* Ian Bicking, who helped nag me into properly specifying the
  multithreading and multiprocess options, as well as badgering me to
  provide a mechanism for servers to supply custom extension data to
  an application.

* Tony Lownds, who came up with the concept of a ``start_response``
  function that took the status and headers, returning a ``write``
  function.


References
==========

.. [1] The Python Wiki "Web Programming" topic
   (http://www.python.org/cgi-bin/moinmoin/WebProgramming)

.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
   (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)

.. [3] Hypertext Transfer Protocol -- HTTP/1.1, section 3.6.1
   (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1)


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: