PEP: 333
Title: Python Web Server Gateway Interface v1.0
Version: $Revision$
Last-Modified: $Date$
Author: Phillip J. Eby <pje@telecommunity.com>
Discussions-To: Python Web-SIG <web-sig@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 07-Dec-2003
Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004


Abstract
========

This document specifies a proposed standard interface between web
servers and Python web applications or frameworks, to promote web
application portability across a variety of web servers.


Rationale and Goals
===================

Python currently boasts a wide variety of web application frameworks,
such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
name just a few [1]_. This wide variety of choices can be a problem
for new Python users, because generally speaking, their choice of web
framework will limit their choice of usable web servers, and vice
versa.

By contrast, although Java has just as many web application frameworks
available, Java's "servlet" API makes it possible for applications
written with any Java web application framework to run in any web
server that supports the servlet API.

The availability and widespread use of such an API in web servers for
Python -- whether those servers are written in Python (e.g. Medusa),
embed Python (e.g. mod_python), or invoke Python via a gateway
protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
framework from choice of web server, freeing users to choose a pairing
that suits them, while freeing framework and server developers to
focus on their preferred area of specialization.

This PEP, therefore, proposes a simple and universal interface between
web servers and web applications or frameworks: the Python Web Server
Gateway Interface (WSGI).

But the mere existence of a WSGI spec does nothing to address the
existing state of servers and frameworks for Python web applications.
Server and framework authors and maintainers must actually implement
WSGI for there to be any effect.

However, since no existing servers or frameworks support WSGI, there
is little immediate reward for an author who implements WSGI support.
Thus, WSGI **must** be easy to implement, so that an author's initial
investment in the interface can be reasonably low.

Simplicity of implementation on *both* the server and framework
sides of the interface is therefore absolutely critical to the utility
of WSGI, and is the principal criterion for any design decisions.

Note, however, that simplicity of implementation for a framework
author is not the same thing as ease of use for a web application
author. WSGI presents an absolutely "no frills" interface to the
framework author, because bells and whistles like response objects and
cookie handling would just get in the way of existing frameworks'
handling of these issues. Again, the goal of WSGI is to facilitate
easy interconnection of existing servers and applications or
frameworks, not to create a new web framework.

Note also that this goal precludes WSGI from requiring anything that
is not already available in deployed versions of Python. Therefore,
new standard library modules are not proposed or required by this
specification, and nothing in WSGI requires a Python version greater
than 2.2.2. (It would be a good idea, however, for future versions
of Python to include support for this interface in web servers
provided by the standard library.)

In addition to ease of implementation for existing and future
frameworks and servers, it should also be easy to create request
preprocessors, response postprocessors, and other WSGI-based
"middleware" components that look like an application to their
containing server, while acting as a server for their contained
applications.

If middleware can be both simple and robust, and WSGI is widely
available in servers and frameworks, it allows for the possibility
of an entirely new kind of Python web application framework: one
consisting of loosely-coupled WSGI middleware components. Indeed,
existing framework authors may even choose to refactor their
frameworks' existing services to be provided in this way, becoming
more like libraries used with WSGI, and less like monolithic
frameworks. This would then allow application developers to choose
"best-of-breed" components for specific functionality, rather than
having to commit to all the pros and cons of a single framework.

Of course, as of this writing, that day is doubtless quite far off.
In the meantime, it is a sufficient short-term goal for WSGI to
enable the use of any framework with any server.

Finally, it should be mentioned that the current version of WSGI
does not prescribe any particular mechanism for "deploying" an
application for use with a web server or server gateway. At the
present time, this is necessarily implementation-defined by the
server or gateway. After a sufficient number of servers and
frameworks have implemented WSGI to provide field experience with
varying deployment requirements, it may make sense to create
another PEP, describing a deployment standard for WSGI servers and
application frameworks.


Specification Overview
======================

The WSGI interface has two sides: the "server" or "gateway" side, and
the "application" or "framework" side. The server side invokes a
callable object that is provided by the application side. The
specifics of how that object is provided are up to the server or
gateway. It is assumed that some servers or gateways will require an
application's deployer to write a short script to create an instance
of the server or gateway, and supply it with the application object.
Other servers and gateways may use configuration files or other
mechanisms to specify where an application object should be
imported from, or otherwise obtained.

The application object is simply a callable object that accepts
two arguments. The term "object" should not be misconstrued as
requiring an actual object instance: a function, method, class,
or instance with a ``__call__`` method are all acceptable for
use as an application object.

(Note: although we refer to it as an "application" object, this
should not be construed to mean that application developers will use
WSGI as a web programming API! It is assumed that application
developers will continue to use existing, high-level framework
services to develop their applications. WSGI is a tool for
framework and server developers, and is not intended to directly
support application developers.)

Here are two example application objects; one is a function, and the
other is a class::

    def simple_app(environ, start_response):
        """Simplest possible application object"""
        status = '200 OK'
        headers = [('Content-type','text/plain')]
        start_response(status, headers)
        return ['Hello world!\n']


    class AppClass:
        """Produce the same output, but using a class

        (Note: 'AppClass' is the "application" here, so calling it
        returns an instance of 'AppClass', which is then the iterable
        return value of the "application callable" as required by
        the spec.)

        If we wanted to use *instances* of 'AppClass' as application
        objects instead, we would have to implement a '__call__'
        method, which would be invoked to execute the application,
        and we would need to create an instance for use by the
        server or gateway.
        """

        def __init__(self, environ, start_response):
            self.environ = environ
            self.start = start_response

        def __iter__(self):
            status = '200 OK'
            headers = [('Content-type','text/plain')]
            self.start(status, headers)
            yield "Hello world!\n"

Throughout this specification, we will use the term "a callable" to
mean "a function, method, class, or an instance with a ``__call__``
method". It is up to the server, gateway, or application implementing
the callable to choose the appropriate implementation technique for
their needs. Conversely, a server, gateway, or application that is
invoking a callable must *not* have any dependency on what kind of
callable was provided to it. Callables are only to be called, not
introspected upon.

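As a complement to the two examples above, the following is a minimal
sketch of the third kind of callable: an *instance* with a ``__call__``
method. The names ``GreetingApp`` and ``fake_start_response`` are
hypothetical and exist only for this illustration; plain string literals
stand in for the strings an application would return.

```python
class GreetingApp:
    """Application whose *instances* are the application objects."""

    def __init__(self, greeting):
        # Per-instance configuration, fixed before any request arrives.
        self.greeting = greeting

    def __call__(self, environ, start_response):
        # Invoked once per request by the server or gateway.
        status = '200 OK'
        headers = [('Content-type', 'text/plain')]
        start_response(status, headers)
        return [self.greeting + '\n']


# The deployer creates one instance and hands it to the server.
application = GreetingApp('Hello world!')


def fake_start_response(status, headers):
    """Hypothetical stand-in for a server's start_response()."""
    fake_start_response.calls.append((status, headers))
    return lambda data: None  # a do-nothing write() callable

fake_start_response.calls = []

# The caller neither knows nor cares that 'application' is an instance.
body = list(application({}, fake_start_response))
```

Because the caller only ever *calls* the object, the same server code
works unchanged whether it is given ``simple_app``, ``AppClass``, or an
instance like ``application``.
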
The server or gateway invokes the application callable once for each
request it receives from an HTTP client that is directed at the
application. To illustrate, here is a simple CGI gateway, implemented
as a function taking an application object (all error handling
omitted)::

    import os, sys

    def run_with_cgi(application):

        # Start from the CGI variables, then add the WSGI-defined ones
        environ = {}
        environ.update(os.environ)
        environ['wsgi.input'] = sys.stdin
        environ['wsgi.errors'] = sys.stderr
        environ['wsgi.version'] = (1,0)
        environ['wsgi.multithread'] = False
        environ['wsgi.multiprocess'] = True
        environ['wsgi.run_once'] = True

        def write(data):
            # Unbuffered: each string is flushed before returning
            sys.stdout.write(data)
            sys.stdout.flush()

        def start_response(status,headers):
            sys.stdout.write("Status: %s\r\n" % status)
            for key,val in headers:
                sys.stdout.write("%s: %s\r\n" % (key,val))
            sys.stdout.write("\r\n")
            return write

        result = application(environ, start_response)
        try:
            for data in result:
                write(data)
        finally:
            # Always give the application a chance to release resources
            if hasattr(result,'close'):
                result.close()

In the next section, we will specify the precise semantics that
these illustrations are examples of.


Specification Details
=====================

The application object must accept two positional arguments. For
the sake of illustration, we have named them ``environ`` and
``start_response``, but they are not required to have these names.
A server or gateway **must** invoke the application object using
positional (not keyword) arguments. (E.g. by calling
``result = application(environ,start_response)`` as shown above.)

The ``environ`` parameter is a dictionary object, containing CGI-style
environment variables. This object **must** be a builtin Python
dictionary (*not* a subclass, ``UserDict`` or other dictionary
emulation), and the application is allowed to modify the dictionary
in any way it desires. The dictionary must also include certain
WSGI-required variables (described in a later section), and may
also include server-specific extension variables, named according
to a convention that will be described below.

The ``start_response`` parameter is a callable accepting two
positional arguments. For the sake of illustration, we have named
them ``status`` and ``headers``, but they are not required to have
these names, and the application **must** invoke the ``start_response``
callable using positional arguments
(e.g. ``start_response(status,headers)``).

The ``status`` parameter is a status string of the form
``"999 Message here"``, and the ``headers`` parameter is a list of
``(header_name,header_value)`` tuples describing the HTTP response
headers. The ``start_response`` callable must return a
``write(body_data)`` callable that takes one positional parameter: a
string to be written as part of the HTTP response body. (Note: the
``write()`` callable is provided only to support certain existing
frameworks' imperative output APIs; it should not be used by new
applications or frameworks. See the `Buffering and Streaming`_
section for more details.)

The application object must return an iterable yielding strings.
(For example, it could be a generator-iterator that yields strings,
or it could be a sequence such as a list of strings.) The server
or gateway must transmit these strings to the client in an
unbuffered fashion, completing the transmission of each string
before requesting another one. (See the `Buffering and Streaming`_
section below for more on how application output must be handled.)

The server or gateway must not modify supplied strings in any way;
they must be treated as binary byte sequences with no character
interpretation, line ending changes, or other modification. The
application is responsible for ensuring that the string(s) to be
written are in a format suitable for the client.

If a call to ``len(iterable)`` succeeds, the server must be able
to rely on the result being accurate. That is, if the iterable
returned by the application provides a working ``__len__()``
method, it **must** return an accurate result.

If the returned iterable has a ``fileno`` attribute, the server **may**
assume that this is a ``fileno()`` method returning an operating
system file descriptor, and that it is allowed to read directly from
that descriptor up to the end of the file, and/or use any appropriate
operating system facilities (e.g. the ``sendfile()`` system call) to
transmit the file's contents. If the server does this, it must begin
transmission with the file's current position, and end at the end of
the file.

Note that an application **must not** return an iterable with a
``fileno`` attribute if it is anything other than a method returning
an **operating system file descriptor**. "File-like" objects
that do not possess a true operating system file descriptor number
are expressly forbidden. Servers running on platforms where file
descriptors do not exist, or where there is no meaningful API for
accelerating transmission from a file descriptor should ignore the
``fileno`` attribute.

If the iterable returned by the application has a ``close()`` method,
the server or gateway **must** call that method upon completion of the
current request, whether the request was completed normally, or
terminated early due to an error. (This is to support resource release
by the application. This protocol is intended to support PEP 325, and
also other simple cases such as an application returning an open text
file.)

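The ``close()`` requirement can be sketched as follows. This is only an
illustration under stated assumptions: ``LoggedBody``, ``serve``, and
``failing_send`` are hypothetical names, and a plain list stands in for
whatever resource a real application would need to release.

```python
class LoggedBody:
    """Iterable response body that releases a resource in close()."""

    def __init__(self, blocks, log):
        self.blocks = blocks
        self.log = log          # a list, standing in for a real resource

    def __iter__(self):
        return iter(self.blocks)

    def close(self):
        self.log.append('closed')


def serve(result, send):
    """Server-side loop: close() is called on success and on failure."""
    try:
        for data in result:
            send(data)
    finally:
        if hasattr(result, 'close'):
            result.close()


def failing_send(data):
    """Simulates a transmission error part-way through the response."""
    raise IOError('client went away')


# Normal completion.
log, sent = [], []
serve(LoggedBody(['a', 'b'], log), sent.append)

# Early termination due to an error: close() must still run.
error_log = []
try:
    serve(LoggedBody(['a'], error_log), failing_send)
except IOError:
    pass
```

The ``try``/``finally`` is what makes the guarantee hold in both cases;
it mirrors the structure of ``run_with_cgi`` shown earlier.
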
(Note: the application **must** invoke the ``start_response()``
callable before the iterable yields its first body string, so that the
server can send the headers before any body content. However, this
invocation **may** be performed by the iterable's first iteration, so
servers **must not** assume that ``start_response()`` has been called
before they begin iterating over the iterable.)

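To make the timing concrete, here is a small sketch in which the
application is a generator, so ``start_response()`` is not called until
the server asks for the first body string. The names ``lazy_app`` and
``run`` are hypothetical, and ``run`` is a toy stand-in for a server
rather than a complete one.

```python
def lazy_app(environ, start_response):
    """Generator application: nothing runs until the first iteration."""
    start_response('200 OK', [('Content-type', 'text/plain')])
    yield 'Hello world!\n'


def run(application, environ):
    """Toy server loop that tolerates a late start_response() call."""
    state = {'status': None, 'headers': None}
    output = []

    def write(data):
        output.append(data)

    def start_response(status, headers):
        state['status'] = status
        state['headers'] = headers
        return write

    result = application(environ, start_response)
    # For a generator, the body has not started executing yet.
    called_before_iteration = state['status'] is not None
    try:
        for data in result:
            # Once a body string arrives, the status must be known.
            assert state['status'] is not None
            write(data)
    finally:
        if hasattr(result, 'close'):
            result.close()
    return called_before_iteration, state['status'], output


early, status, output = run(lazy_app, {})
```

A server that tried to send headers immediately after calling the
application, before iterating, would find nothing to send in this case.
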
Finally, servers **must not** directly use any other attributes of
the iterable returned by the application. For example, if the
iterable is a file object, it may have a ``read()`` method, but
the server **must not** utilize it. Only attributes specified
here, or accessed via e.g. the PEP 234 iteration APIs are
acceptable.


``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain these CGI
environment variables, as defined by the Common Gateway Interface
specification [2]_. The following variables **must** be present, but
**may** be an empty string, if there is no more appropriate value for
them:

* ``REQUEST_METHOD``

* ``SCRIPT_NAME`` (The initial portion of the request URL's "path"
  that corresponds to the application object, so that the application
  knows its virtual "location".)

* ``PATH_INFO`` (The remainder of the request URL's "path",
  designating the virtual "location" of the request's target within
  the application)

* ``QUERY_STRING``

* ``CONTENT_TYPE``

* ``CONTENT_LENGTH``

* ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with
  ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the URL. Note,
  however, that ``HTTP_HOST``, if present, should be used in
  preference to ``SERVER_NAME`` for constructing the URL. See the
  `URL Reconstruction`_ section below for more detail.)

* Variables corresponding to the client-supplied HTTP headers (i.e.,
  variables whose names begin with ``"HTTP_"``).

In general, a server or gateway should attempt to provide as many
other CGI variables as are applicable, including e.g. the nonstandard
SSL variables such as ``HTTPS=on``, if an SSL connection is in effect.
However, an application that uses any variables other than the ones
listed above is necessarily non-portable to web servers that do not
support the relevant extensions.

A WSGI-compliant server or gateway *should* document what variables it
provides, along with their definitions as appropriate. Applications
*should* check for the presence of any nonstandard variables they
require, and have a fallback plan in the event such a variable is
absent.

Note: missing variables (such as ``REMOTE_USER`` when no
authentication has occurred) should be left out of the ``environ``
dictionary. Also note that CGI-defined variables must be strings,
if they are present at all. It is a violation of this specification
for a CGI variable's value to be of any type other than ``str``.

In addition to the CGI-defined variables, the ``environ`` dictionary
must also contain the following WSGI-defined variables:

===================== ===============================================
Variable              Value
===================== ===============================================
``wsgi.version``      The tuple ``(1,0)``, representing WSGI
                      version 1.0.

``wsgi.input``        An input stream from which the HTTP request
                      body can be read. (The server or gateway may
                      perform reads on-demand as requested by the
                      application, or it may pre-read the client's
                      request body and buffer it in-memory or on
                      disk, or use any other technique for providing
                      such an input stream, according to its
                      preference.)

``wsgi.errors``       An output stream to which error output can be
                      written, for the purpose of recording program
                      or other errors in a standardized and possibly
                      centralized location. For many servers, this
                      will be the server's main error log.

                      Alternatively, this may be ``sys.stderr``, or
                      a log file of some sort. The server's
                      documentation should include an explanation of
                      how to configure this or where to find the
                      recorded output. A server or gateway may
                      supply different error streams to different
                      applications, if this is desired.

``wsgi.multithread``  This value should be true if the application
                      object may be simultaneously invoked by another
                      thread in the same process, and false
                      otherwise.

``wsgi.multiprocess`` This value should be true if an equivalent
                      application object may be simultaneously
                      invoked by another process, and false
                      otherwise.

``wsgi.run_once``     This value should be true if the server/gateway
                      expects (but does not guarantee!) that the
                      application will only be invoked this one time
                      during the life of its containing process.
                      Normally, this will only be true for a gateway
                      based on CGI (or something similar).
===================== ===============================================

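The requirements above lend themselves to a mechanical check. The
following is a rough sketch, not part of the specification:
``environ_problems`` and the two key lists are hypothetical names, and the
check covers only the always-required CGI variables and the WSGI-defined
variables from the table.

```python
import io
import sys

REQUIRED_CGI_KEYS = [
    'REQUEST_METHOD', 'SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING',
    'CONTENT_TYPE', 'CONTENT_LENGTH', 'SERVER_NAME', 'SERVER_PORT',
]
REQUIRED_WSGI_KEYS = [
    'wsgi.version', 'wsgi.input', 'wsgi.errors',
    'wsgi.multithread', 'wsgi.multiprocess', 'wsgi.run_once',
]


def environ_problems(environ):
    """Return a list of problems found; an empty list means none."""
    if type(environ) is not dict:
        # Subclasses and dictionary emulations are not allowed.
        return ['environ must be a builtin dict']
    problems = []
    for key in REQUIRED_CGI_KEYS + REQUIRED_WSGI_KEYS:
        if key not in environ:
            problems.append('missing ' + key)
    for key in REQUIRED_CGI_KEYS:
        # CGI-defined variables may be empty, but must be strings.
        if key in environ and not isinstance(environ[key], str):
            problems.append(key + ' must be a string')
    return problems


sample = {
    'REQUEST_METHOD': 'GET', 'SCRIPT_NAME': '', 'PATH_INFO': '/',
    'QUERY_STRING': '', 'CONTENT_TYPE': '', 'CONTENT_LENGTH': '',
    'SERVER_NAME': 'localhost', 'SERVER_PORT': '80',
    'wsgi.version': (1, 0), 'wsgi.input': io.BytesIO(b''),
    'wsgi.errors': sys.stderr, 'wsgi.multithread': False,
    'wsgi.multiprocess': True, 'wsgi.run_once': True,
}
```

Note that ``SERVER_PORT`` is the string ``'80'``, not the integer ``80``;
passing an integer would be reported as a problem by this sketch.
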
Finally, the ``environ`` dictionary may also contain server-defined
variables. These variables should be named using only lower-case
letters, numbers, dots, and underscores, and should be prefixed with
a name that is unique to the defining server or gateway. For
example, ``mod_python`` might define variables with names like
``mod_python.some_variable``.


Input and Error Streams
~~~~~~~~~~~~~~~~~~~~~~~

The input and error streams provided by the server must support
the following methods:

=================== ========== ========
Method              Stream     Notes
=================== ========== ========
``read(size)``      ``input``  1
``readline()``      ``input``  1,2
``readlines(hint)`` ``input``  1,3
``__iter__()``      ``input``
``flush()``         ``errors`` 4
``write(str)``      ``errors``
``writelines(seq)`` ``errors``
=================== ========== ========

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1. The server is not required to read past the client's specified
   ``Content-Length``, and is allowed to simulate an end-of-file
   condition if the application attempts to read past that point.
   The application *should not* attempt to read more data than is
   specified by the ``CONTENT_LENGTH`` variable.

2. The optional "size" argument to ``readline()`` is not supported,
   as it may be complex for server authors to implement, and is not
   often used in practice.

3. Note that the ``hint`` argument to ``readlines()`` is optional for
   both caller and implementer. The application is free not to
   supply it, and the server or gateway is free to ignore it.

4. Since the ``errors`` stream may not be rewound, a container is
   free to forward write operations immediately, without buffering.
   In this case, the ``flush()`` method may be a no-op. Portable
   applications, however, cannot assume that output is unbuffered
   or that ``flush()`` is a no-op. They must call ``flush()`` if
   they need to ensure that output has in fact been written. (For
   example, to minimize intermingling of data from multiple processes
   writing to the same error log.)

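Note 1 suggests a simple discipline for applications: read exactly as
much as ``CONTENT_LENGTH`` announces and no more. A minimal sketch,
assuming a hypothetical helper named ``read_body`` and using
``io.BytesIO`` as a stand-in for the server-supplied ``wsgi.input``
stream:

```python
import io


def read_body(environ):
    """Read the request body without reading past CONTENT_LENGTH."""
    try:
        # CONTENT_LENGTH must be present but may be an empty string.
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0      # assumption: treat an unparsable length as no body
    if length <= 0:
        return b''
    # Only read(size) is used, which every conforming server supports.
    return environ['wsgi.input'].read(length)


environ = {
    'CONTENT_LENGTH': '5',
    'wsgi.input': io.BytesIO(b'hello, and bytes that are not ours'),
}
body = read_body(environ)

empty = read_body({'CONTENT_LENGTH': '', 'wsgi.input': io.BytesIO(b'x')})
```

Treating a malformed length as "no body" is a choice made for this
sketch only; an application could equally well reject the request.
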
The methods listed in the table above **must** be supported by all
servers conforming to this specification. Applications conforming
to this specification **must not** use any other methods or attributes
of the ``input`` or ``errors`` objects. In particular, applications
**must not** attempt to close these streams, even if they possess
``close()`` methods.


The ``start_response()`` Callable
---------------------------------

The second parameter passed to the application object is itself a
two-argument callable, of the form ``start_response(status,headers)``.
(As with all WSGI callables, the arguments must be supplied
positionally, not by keyword.) The ``start_response`` callable is
used to begin the HTTP response, and it must return a
``write(body_data)`` callable (see the `Buffering and Streaming`_
section, below).

The ``status`` argument is an HTTP "status" string like ``"200 OK"``
or ``"404 Not Found"``. The string **must** be pure 7-bit ASCII,
containing no control characters. It must not be terminated with
a carriage return or linefeed.

The ``headers`` argument is a list of ``(header_name,header_value)``
tuples. It must be a Python list (i.e.
``type(headers) is ListType``), and the server **may** change its
contents in any way it desires. Each ``header_name`` must be a valid
HTTP header name, without a trailing colon or other punctuation. Each
``header_value`` **must not** include *any* control characters,
including carriage returns or linefeeds, either embedded or at the
end. (These requirements are to minimize the complexity of any parsing
that must be performed by servers, gateways, and intermediate response
processors that need to inspect or modify response headers.)

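A server or middleware component might enforce these rules along the
following lines. This is a sketch under stated assumptions, not a
complete validator: ``check_response_start`` and ``is_valid`` are
hypothetical names, and the header-name check covers only the trailing
colon, not every rule for a valid HTTP header name.

```python
import re

# ASCII control characters, including carriage return and linefeed.
_CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


def check_response_start(status, headers):
    """Raise ValueError if status or headers break the rules above."""
    if _CONTROL_CHARS.search(status):
        raise ValueError('control character in status')
    if any(ord(ch) > 127 for ch in status):
        raise ValueError('status must be 7-bit ASCII')
    if not re.match(r'[0-9]{3} ', status):
        raise ValueError('status must look like "999 Message here"')
    if type(headers) is not list:
        raise ValueError('headers must be a Python list')
    for name, value in headers:
        if name.endswith(':'):
            raise ValueError('header name must not end with a colon')
        if _CONTROL_CHARS.search(name) or _CONTROL_CHARS.search(value):
            raise ValueError('control character in header %r' % (name,))


def is_valid(status, headers):
    """Convenience wrapper returning True or False instead of raising."""
    try:
        check_response_start(status, headers)
    except ValueError:
        return False
    return True
```

For example, ``is_valid('200 OK', [('Content-type', 'text/plain')])`` is
true, while a status ending in ``'\r\n'`` or a tuple of headers in place
of a list is rejected.
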
In general, the server or gateway is responsible for ensuring that
correct headers are sent to the client: if the application omits
a needed header, the server or gateway *should* add it. For example,
the HTTP ``Date:`` and ``Server:`` headers would normally be supplied
by the server or gateway.

(A reminder for server/gateway authors: HTTP header names are
case-insensitive, so be sure to take that into consideration when
examining application-supplied headers!)

If the application supplies headers that would affect the persistence
of the client's connection (e.g. ``Connection:``, "keep-alives", etc.),
the server or gateway is permitted to discard or modify these headers,
if the server cannot or will not conform to the application's requested
semantics. E.g., if the application requests a persistent connection
but the server wishes transience, or vice versa.

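As an illustration of both points, the sketch below drops
connection-related headers using a case-insensitive comparison and notes
each discarded header on an error stream. The function name
``strip_connection_headers`` and the ``HOP_HEADERS`` tuple are
hypothetical, the tuple is illustrative rather than exhaustive, and
``io.StringIO`` stands in for the ``wsgi.errors`` stream.

```python
import io

# Lower-cased names of headers this hypothetical server will not honor.
HOP_HEADERS = ('connection', 'keep-alive')


def strip_connection_headers(headers, errors):
    """Return headers minus connection-related ones, logging each drop."""
    kept = []
    for name, value in headers:
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() in HOP_HEADERS:
            errors.write('discarded application header %s: %s\n'
                         % (name, value))
        else:
            kept.append((name, value))
    errors.flush()   # make sure the note is actually written
    return kept


errors = io.StringIO()
kept = strip_connection_headers(
    [('Content-Type', 'text/plain'), ('CONNECTION', 'keep-alive')],
    errors,
)
```

The application's ``CONNECTION`` header is removed despite its unusual
capitalization, and the error stream records what was dropped.
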
However, if a server or gateway discards or overrides any application
header for any reason, it **must** record this action in a log (such as
the ``wsgi.errors`` log) for the benefit of the application author.


Handling the ``Content-Length`` Header
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the application does not supply a ``Content-Length`` header, a
server or gateway may choose one of several approaches to handling
it. The simplest of these is to close the client connection when
the response is completed.

Under some circumstances, however, the server or gateway may be
able to either generate a ``Content-Length`` header, or at least
avoid the need to close the client connection. If the application
does *not* call the ``write()`` callable, and returns an iterable
whose ``len()`` is 1, then the server can automatically determine
``Content-Length`` by taking the length of the first string yielded
by the iterable.

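The single-element case might be handled roughly as follows. This is a
sketch only: ``infer_content_length`` is a hypothetical name, and the
``write_called`` flag is assumed to be tracked elsewhere by the server.

```python
def infer_content_length(result, headers, write_called):
    """Return (headers, body_blocks), adding Content-Length if possible.

    The caller still owns 'result' and must call its close() method,
    if it has one, when the request is finished.
    """
    supplied = any(name.lower() == 'content-length' for name, _ in headers)
    if supplied or write_called:
        return headers, result
    try:
        single = len(result) == 1
    except TypeError:
        single = False      # generators and other unsized iterables
    if not single:
        return headers, result
    first = next(iter(result))
    # len() is the byte count here because WSGI strings are byte strings.
    return headers + [('Content-Length', str(len(first)))], [first]


def streaming():
    yield 'x'


headers, blocks = infer_content_length(
    ['Hello world!\n'], [('Content-type', 'text/plain')], False)
unsized_headers, _ = infer_content_length(streaming(), [], False)
written_headers, _ = infer_content_length(['abc'], [], True)
```

The function returns a new header list rather than mutating the one it
was given, which keeps the sketch free of side effects; a real server is
permitted to modify the list in place.
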
And, if the server and client both support HTTP/1.1 "chunked
encoding" [3]_, then the server **may** use chunked encoding to send
a chunk for each ``write()`` call or string yielded by the iterable,
thus transmitting an explicit length with each chunk instead of a
single ``Content-Length`` header. This allows the server to keep the
client connection alive, if it wishes to do so. Note that the server
**must** comply fully with RFC 2616 when doing this, or else fall back
to one of the other strategies for dealing with the absence of
``Content-Length``.


Buffering and Streaming
-----------------------

Generally speaking, applications will achieve the best throughput
by buffering their (modestly-sized) output and sending it all at
once. When this is the case, applications **should** simply
return a single-element iterable containing their entire output as
a single string.

(In addition to improved performance, buffering all of an application's
output has an advantage for error handling: the buffered output can
be discarded and replaced by an error page, rather than dumping an
error message in the middle of some partially-completed output. For
this and other reasons, many existing Python frameworks already
accumulate their output for a single write, unless the application
explicitly requests streaming, or the expected output is larger than
practical for buffering (e.g. multi-megabyte PDFs).)

For large files, however, or for specialized uses of HTTP streaming
(such as multipart "server push"), an application may need to provide
output in smaller blocks (e.g. to avoid loading a large file into
memory). It's also sometimes the case that part of a response may
be time-consuming to produce, but it would be useful to send ahead the
portion of the response that precedes it.

In these cases, applications **should** return an iterator (usually
a generator-iterator) that produces the output in a block-by-block
fashion. These blocks may be broken to coincide with multipart
boundaries (for "server push"), or just before time-consuming
tasks (such as reading another block of an on-disk file).

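For the large-file case, a generator-based application could look
something like the sketch below. The names ``file_blocks`` and
``make_file_app`` are hypothetical, the block size is arbitrary, and an
in-memory ``io.BytesIO`` stands in for a file on disk.

```python
import io


def file_blocks(fileobj, block_size=8192):
    """Yield the file's contents one block at a time."""
    try:
        while True:
            # The (possibly slow) read happens just before each yield.
            block = fileobj.read(block_size)
            if not block:
                break
            yield block
    finally:
        # Runs when iteration finishes, or when the server calls
        # close() on a generator that has already started.
        fileobj.close()


def make_file_app(open_file, block_size=8192):
    """Build an application that streams whatever open_file() returns."""
    def application(environ, start_response):
        start_response('200 OK',
                       [('Content-type', 'application/octet-stream')])
        return file_blocks(open_file(), block_size)
    return application


source = io.BytesIO(b'abcdefghij')
app = make_file_app(lambda: source, block_size=4)
blocks = list(app({}, lambda status, headers: None))
```

Only one block is held in memory at a time, and each block boundary
falls immediately before the next read.
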
WSGI servers and gateways **must not** delay the transmission
of any block; they **must** either fully transmit the block to
the client, or guarantee that they will continue transmission
even while the application is producing its next block. A
server/gateway may provide this guarantee in one of two ways:

1. Send the entire block to the operating system (and request
   that any O/S buffers be flushed) before returning control
   to the application, OR

2. Use a different thread to ensure that the block continues
   to be transmitted while the application produces the next
   block.

By providing this guarantee, WSGI allows applications to ensure
that transmission will not become stalled at an arbitrary point
in their output data. This is critical for proper functioning
of e.g. multipart "server push" streaming, where data between
multipart boundaries should be transmitted in full to the client.


The ``write()`` Callable
~~~~~~~~~~~~~~~~~~~~~~~~

Some existing application framework APIs support unbuffered
output in a different manner than WSGI. Specifically, they
provide a "write" function or method of some kind to write
an unbuffered block of data, or else they provide a buffered
"write" function and a "flush" mechanism to flush the buffer.

Unfortunately, such APIs cannot be implemented in terms of
WSGI's "iterable" application return value, unless threads
or other special mechanisms are used.

Therefore, to allow these frameworks to continue using an
imperative API, WSGI includes a special ``write()`` callable,
returned by the ``start_response`` callable.

New WSGI applications and frameworks **should not** use the
``write()`` callable if it is possible to avoid doing so. The
``write()`` callable is strictly a hack to support existing
frameworks' imperative APIs. In general, applications
should either be internally buffered, or produce iterable output.

The ``write()`` callable is returned by the ``start_response()``
callable, and it accepts a single parameter: a string to be
written as part of the HTTP response body, that is treated exactly
as though it had been yielded by the output iterable. In other
words, before ``write()`` returns, it must guarantee that the
passed-in string was either completely sent to the client, or
that it is buffered for transmission while the application
proceeds forward.

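The interaction between ``write()`` and the returned iterable can be
sketched as follows. ``legacy_app`` and ``collect`` are hypothetical
names; ``collect`` is a toy stand-in for a server that simply records
body strings in the order it would transmit them.

```python
def legacy_app(environ, start_response):
    """Application using the imperative write() API, then an iterable."""
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write('first, via write()\n')
    return ['second, via the iterable\n']


def collect(application, environ):
    """Toy server: gathers all body strings in transmission order."""
    sent = []

    def start_response(status, headers):
        # The returned write() treats data exactly like a yielded string.
        return sent.append

    result = application(environ, start_response)
    try:
        for data in result:
            sent.append(data)
    finally:
        if hasattr(result, 'close'):
            result.close()
    return sent


sent = collect(legacy_app, {})
```

Strings passed to ``write()`` reach the client before anything yielded
by the iterable, because ``write()`` is called while the application is
still running and the iterable is consumed only after it returns.
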
An application **may** return a non-empty iterable even if it
invokes ``write()``, and that output must be treated normally
by the server or gateway.


Implementation/Application Notes
================================

Unicode
-------

HTTP does not directly support Unicode, and neither does this
interface. All encoding/decoding must be handled by the application;
all strings and streams passed to or from the server must be standard
Python byte strings, not Unicode objects. The result of using a
Unicode object where a string object is required is undefined.


Multiple Invocations
--------------------

Application objects must be able to be invoked more than once, since
virtually all servers/gateways will invoke them repeatedly.


Error Handling
--------------

Servers **should** trap and log exceptions raised by
applications, and **may** continue to execute, or attempt to shut down
gracefully. Applications **should** avoid allowing exceptions to
escape their execution scope, since the result of uncaught exceptions
is server-defined.

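One way a server might trap and log application exceptions is sketched below. The function name ``run_once``, the logger name, and the decision to return ``None`` on failure are all assumptions for illustration; what the client actually receives after an error remains server-defined.

```python
import logging

logger = logging.getLogger('wsgi.sketch')


def run_once(app, environ, start_response):
    """Run one request; trap and log anything the application raises."""
    try:
        result = app(environ, start_response)
        try:
            return b''.join(result)
        finally:
            if hasattr(result, 'close'):
                result.close()
    except Exception:
        # Log the traceback, then let the server keep running.
        logger.exception('application raised an exception')
        return None
```

Catching ``Exception`` rather than using a bare ``except`` leaves ``KeyboardInterrupt`` and ``SystemExit`` alone, so an operator can still stop the server.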
Thread Support
--------------

Thread support, or lack thereof, is also server-dependent.
Servers that can run multiple requests in parallel **should** also
provide the option of running an application in a single-threaded
fashion, so that applications or frameworks that are not thread-safe
may still be used with that server.

URL Reconstruction
------------------

If an application wishes to reconstruct a request's complete URL, it
may do so using the following algorithm, contributed by Ian Bicking::

    if environ.get('HTTPS') == 'on':
        url = 'https://'
    else:
        url = 'http://'

    if environ.get('HTTP_HOST'):
        # The Host header already carries any non-default port
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']

        # Append the port only when it is not the scheme's default
        if environ.get('HTTPS') == 'on':
            if environ['SERVER_PORT'] != '443':
                url += ':' + environ['SERVER_PORT']
        else:
            if environ['SERVER_PORT'] != '80':
                url += ':' + environ['SERVER_PORT']

    url += environ['SCRIPT_NAME']
    url += environ['PATH_INFO']
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']

Note that such a reconstructed URL may not be precisely the same URI
as requested by the client. Server rewrite rules, for example, may
have modified the client's originally requested URL to place it in a
canonical form.

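For a concrete feel of the algorithm, here is a hedged sketch that wraps it in a function and runs it on a sample ``environ``. The function name ``reconstruct_url`` and the sample values are hypothetical. The sketch appends the port only when falling back to ``SERVER_NAME``, on the assumption that a ``Host`` header already includes any non-default port.

```python
def reconstruct_url(environ):
    """Rebuild the request URL from CGI-style environ keys."""
    https = environ.get('HTTPS') == 'on'
    url = 'https://' if https else 'http://'

    if environ.get('HTTP_HOST'):
        # Assumption: the Host header already includes any non-default port.
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        default_port = '443' if https else '80'
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']

    url += environ['SCRIPT_NAME']
    url += environ['PATH_INFO']
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url


example_environ = {
    'SERVER_NAME': 'example.com',
    'SERVER_PORT': '8080',
    'SCRIPT_NAME': '/app',
    'PATH_INFO': '/page',
    'QUERY_STRING': 'x=1',
}
print(reconstruct_url(example_environ))
# prints http://example.com:8080/app/page?x=1
```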
Application Configuration
-------------------------

This specification does not define how a server selects or obtains an
application to invoke. These and other configuration options are
highly server-specific matters. It is expected that server/gateway
authors will document how to configure the server to execute a
particular application object, and with what options (such as
threading options).

Framework authors, on the other hand, should document how to create an
application object that wraps their framework's functionality. The
user, who has chosen both the server and the application framework,
must connect the two together. However, since both the framework and
the server now have a common interface, this should be merely a
mechanical matter, rather than a significant engineering effort for
each new server/framework pair.

Middleware
----------

Note that a single object may play the role of a server with respect
to some application(s), while also acting as an application with
respect to some server(s). Such "middleware" components can perform
such functions as:

* Routing a request to different application objects based on the
  target URL, after rewriting the ``environ`` accordingly

* Allowing multiple applications or frameworks to run side-by-side
  in the same process

* Load balancing and remote processing, by forwarding requests and
  responses over a network

* Performing content postprocessing, such as applying XSL stylesheets

Given the existence of applications and servers conforming to this
specification, the appearance of such reusable middleware becomes
a possibility.

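The first function in the list above, URL-based routing, might be sketched as follows. ``make_router``, ``routes``, and ``default_app`` are hypothetical names, and moving the matched prefix from ``PATH_INFO`` to ``SCRIPT_NAME`` is one plausible reading of "rewriting the ``environ`` accordingly", not something this specification mandates.

```python
def make_router(routes, default_app):
    """Return middleware that dispatches on the first matching path prefix.

    ``routes`` is a list of (prefix, application) pairs.
    """
    def router(environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in routes:
            if path == prefix or path.startswith(prefix + '/'):
                # Rewrite a copy of environ so the target application
                # sees the prefix as part of its SCRIPT_NAME.
                child = dict(environ)
                child['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                child['PATH_INFO'] = path[len(prefix):]
                return app(child, start_response)
        return default_app(environ, start_response)

    return router
```

To the server, ``router`` looks like an ordinary application; to each routed application, it looks like the server, since it supplies ``environ`` and ``start_response``.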
Supporting Older (<2.2) Versions of Python
------------------------------------------

Some servers, gateways, or applications may wish to support older
(<2.2) versions of Python. This is especially important if Jython
is a target platform, since as of this writing a production-ready
version of Jython 2.2 is not yet available.

For servers and gateways, this is relatively straightforward:
servers and gateways targeting pre-2.2 versions of Python must
simply restrict themselves to using only a standard "for" loop to
iterate over any iterable returned by an application. This is the
only way to ensure source-level compatibility with both the pre-2.2
iterator protocol (discussed further below) and "today's" iterator
protocol (see PEP 234).

(Note that this technique necessarily applies only to servers,
gateways, or middleware that are written in Python. Discussion of
how to use iterator protocol(s) correctly from other languages is
outside the scope of this PEP.)

For applications, supporting pre-2.2 versions of Python is slightly
more complex:

* You may not return a file object and expect it to work as an iterable,
  since before Python 2.2, files were not iterable.

* If you return an iterable, it **must** implement the pre-2.2 iterator
  protocol. That is, provide a ``__getitem__`` method that accepts
  an integer key, and raises ``IndexError`` when exhausted.

Finally, middleware that wishes to support pre-2.2 versions of Python,
and iterates over application return values or itself returns an
iterable (or both), must follow the appropriate recommendations above.

(Note: It should go without saying that to support pre-2.2 versions
of Python, any server, gateway, application, or middleware must also
use only language features available in the target version, use
1 and 0 instead of ``True`` and ``False``, etc.)

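The ``__getitem__``/``IndexError`` protocol mentioned above can be sketched as below. ``ChunkSequence`` and ``app`` are hypothetical names. The example is written for a current Python 3 interpreter, where a plain ``for`` loop still falls back to this older protocol when no ``__iter__`` method is defined.

```python
class ChunkSequence:
    """Iterable that relies only on __getitem__ and IndexError."""

    def __init__(self, chunks):
        self._chunks = chunks

    def __getitem__(self, index):
        # List indexing raises IndexError once index runs past the end,
        # which is what terminates a "for" loop under the old protocol.
        return self._chunks[index]


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ChunkSequence([b'first\n', b'second\n'])


# A server restricts itself to a plain "for" loop:
for chunk in app({}, lambda status, headers: None):
    print(chunk)
```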
Server Extension APIs
---------------------

Some server authors may wish to expose more advanced APIs that
application or framework authors can use for specialized purposes.
For example, a gateway based on ``mod_python`` might wish to expose
part of the Apache API as a WSGI extension.

In the simplest case, this requires nothing more than defining an
``environ`` variable, such as ``mod_python.some_api``. But, in many
cases, the possible presence of middleware can make this difficult.
For example, an API that offers access to the same HTTP headers that
are found in ``environ`` variables might return different data if
``environ`` has been modified by middleware.

In general, any extension API that duplicates, supplants, or bypasses
some portion of WSGI functionality runs the risk of being incompatible
with middleware components. Server/gateway developers should *not*
assume that nobody will use middleware, because some framework
developers specifically intend to organize or reorganize their
frameworks to function almost entirely as middleware of various kinds.

So, to provide maximum compatibility, servers and gateways that
provide extension APIs that replace some WSGI functionality **must**
design those APIs so that they are invoked using the portion of the
API that they replace. For example, an extension API to access HTTP
request headers must require the application to pass in its current
``environ``, so that the server/gateway may verify that HTTP headers
accessible via the API have not been altered by middleware. If the
extension API cannot guarantee that it will always agree with
``environ`` about the contents of HTTP headers, it must refuse service
to the application, e.g. by raising an error, returning ``None``
instead of a header collection, or whatever is appropriate to the API.

Similarly, if an extension API provides an alternate means of writing
response data or headers, it should require the ``start_response``
callable to be passed in, before the application can obtain the
extended service. If the object passed in is not the same one that
the server/gateway originally supplied to the application, it cannot
guarantee correct operation and must refuse to provide the extended
service to the application.

These guidelines also apply to middleware that adds information such
as parsed cookies, form variables, sessions, and the like to
``environ``. Specifically, such middleware should provide these
features as functions which operate on ``environ``, rather than simply
stuffing values into ``environ``. This helps ensure that information
is calculated from ``environ`` *after* any middleware has done any URL
rewrites or other ``environ`` modifications.

It is very important that these "safe extension" rules be followed by
both server/gateway and middleware developers, in order to avoid a
future in which middleware developers are forced to delete any and all
extension APIs from ``environ`` to ensure that their mediation isn't
being bypassed by applications using those extensions!

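The two "safe extension" patterns described above might look roughly like this. Everything here is hypothetical: ``HeaderAPI`` stands in for a server-provided header extension that refuses service (by returning ``None``) when the ``environ`` passed in no longer matches what the server saw, and ``get_cookies`` stands in for a middleware feature exposed as a function of ``environ`` rather than as a precomputed value.

```python
def _http_vars(environ):
    """Collect the HTTP_* variables from an environ dictionary."""
    return {k: v for k, v in environ.items() if k.startswith('HTTP_')}


class HeaderAPI:
    """Hypothetical server extension exposing request headers."""

    def __init__(self, original_environ):
        self._snapshot = _http_vars(original_environ)

    def get_headers(self, environ):
        current = _http_vars(environ)
        if current != self._snapshot:
            # Middleware altered the headers: refuse service.
            return None
        return current


def get_cookies(environ):
    """Parse cookies from whatever HTTP_COOKIE holds at call time."""
    cookies = {}
    for item in environ.get('HTTP_COOKIE', '').split(';'):
        name, sep, value = item.strip().partition('=')
        if sep:
            cookies[name] = value
    return cookies
```

Because ``get_cookies`` reads ``environ`` only when called, it reflects any rewriting done by middleware that ran earlier in the chain.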
HTTP 1.1 Expect/Continue
------------------------

Servers and gateways **must** provide transparent support for HTTP
1.1's "expect/continue" mechanism, if they implement HTTP 1.1. This
may be done in any of several ways:

1. Reject all client requests containing an ``Expect: 100-continue``
   header with a "417 Expectation failed" error. Such requests will
   not be forwarded to an application object.

2. Respond to requests containing an ``Expect: 100-continue`` header
   with an immediate "100 Continue" response, and proceed normally.

3. Proceed with the request normally, but provide the application
   with a ``wsgi.input`` stream that will send the "100 Continue"
   response if/when the application first attempts to read from the
   input stream. The read request must then remain blocked until the
   client responds.

Note that this behavior restriction does not apply for HTTP 1.0
requests, or for requests that are not directed to an application
object. For more information on HTTP 1.1 Expect/Continue, see RFC
2616, sections 8.2.3 and 10.1.1.

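The third option above, deferring "100 Continue" until the first read, could be sketched along these lines. ``ContinueOnRead`` and the ``send_continue`` hook are hypothetical names, only ``read()`` is shown (a real ``wsgi.input`` stream offers more methods), and an in-memory ``io.BytesIO`` stands in for the client connection.

```python
import io


class ContinueOnRead:
    """Wrap an input stream; send "100 Continue" before the first read."""

    def __init__(self, stream, send_continue):
        self._stream = stream
        self._send_continue = send_continue   # hypothetical server hook
        self._sent = False

    def read(self, size=-1):
        if not self._sent:
            self._sent = True
            self._send_continue()
        # In a real server this read would block until the client responds.
        return self._stream.read(size)


events = []
wsgi_input = ContinueOnRead(io.BytesIO(b'payload'),
                            lambda: events.append('100 Continue'))
```

If the application never reads the body, the interim response is never sent, which is the point of this option: the client is not asked to transmit data nobody will consume.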
Questions and Answers
=====================

1. Why must ``environ`` be a dictionary? What's wrong with using a
   subclass?

   The rationale for requiring a dictionary is to maximize portability
   between servers. The alternative would be to define some subset of
   a dictionary's methods as being the standard and portable
   interface. In practice, however, most servers will probably find a
   dictionary adequate to their needs, and thus framework authors will
   come to expect the full set of dictionary features to be available,
   since they will be there more often than not. But, if some server
   chooses *not* to use a dictionary, then there will be
   interoperability problems despite that server's "conformance" to
   spec. Therefore, making a dictionary mandatory simplifies the
   specification and guarantees interoperability.

   Note that this does not prevent server or framework developers from
   offering specialized services as custom variables *inside* the
   ``environ`` dictionary. This is the recommended approach for
   offering any such value-added services.

2. Why can you call ``write()`` *and* yield strings/return an
   iterator? Shouldn't we pick just one way?

   If we supported only the iteration approach, then current
   frameworks that assume the availability of "push" suffer. But, if
   we only support pushing via ``write()``, then server performance
   suffers for transmission of e.g. large files (if a worker thread
   can't begin work on a new request until all of the output has been
   sent). Thus, this compromise allows an application framework to
   support both approaches, as appropriate, but with only a little
   more burden to the server implementor than a push-only approach
   would require.

3. What's the ``close()`` for?

   When writes are done during the execution of an application
   object, the application can ensure that resources are released
   using a try/finally block. But, if the application returns an
   iterator, any resources used will not be released until the
   iterator is garbage collected. The ``close()`` idiom allows an
   application to release critical resources at the end of a request,
   and it's forward-compatible with the support for try/finally in
   generators that's proposed by PEP 325.

4. Why is this interface so low-level? I want feature X! (e.g.
   cookies, sessions, persistence, ...)

   This isn't Yet Another Python Web Framework. It's just a way for
   frameworks to talk to web servers, and vice versa. If you want
   these features, you need to pick a web framework that provides the
   features you want. And if that framework lets you create a WSGI
   application, you should be able to run it in most WSGI-supporting
   servers. Also, some WSGI servers may offer additional services via
   objects provided in their ``environ`` dictionary; see the
   applicable server documentation for details. (Of course,
   applications that use such extensions will not be portable to other
   WSGI-based servers.)

5. Why use CGI variables instead of good old HTTP headers? And why
   mix them in with WSGI-defined variables?

   Many existing web frameworks are built heavily upon the CGI spec,
   and existing web servers know how to generate CGI variables. In
   contrast, alternative ways of representing inbound HTTP information
   are fragmented and lack market share. Thus, using the CGI
   "standard" seems like a good way to leverage existing
   implementations. As for mixing them with WSGI variables,
   separating them would just require two dictionary arguments to be
   passed around, while providing no real benefits.

6. What about the status string? Can't we just use the number,
   passing in ``200`` instead of ``"200 OK"``?

   Doing this would complicate the server or gateway, by requiring
   it to have a table of numeric statuses and corresponding
   messages. By contrast, it is easy for an application or framework
   author to type the extra text to go with the specific response code
   they are using, and existing frameworks often already have a table
   containing the needed messages. So, on balance it seems better to
   make the application/framework responsible, rather than the server
   or gateway.

7. Why is ``wsgi.run_once`` not guaranteed to run the app only once?

   Because it's merely a suggestion to the application that it should
   "rig for infrequent running". This is intended for application
   frameworks that have multiple modes of operation for caching,
   sessions, and so forth. In a "multiple run" mode, such frameworks
   may preload caches, and may not write e.g. logs or session data to
   disk after each request. In "single run" mode, such frameworks
   avoid preloading and flush all necessary writes after each request.

   However, in order to test an application or framework to verify
   correct operation in the latter mode, it may be necessary (or at
   least expedient) to invoke it more than once. Therefore, an
   application should not assume that it will definitely not be run
   again, just because it is called with ``wsgi.run_once`` set to
   ``True``.

8. Feature X (dictionaries, callables, etc.) is ugly for use in
   application code; why don't we use objects instead?

   All of these implementation choices of WSGI are specifically
   intended to *decouple* features from one another; recombining these
   features into encapsulated objects makes it somewhat harder to
   write servers or gateways, and an order of magnitude harder to
   write middleware that replaces or modifies only small portions of
   the overall functionality.

   In essence, middleware wants to have a "Chain of Responsibility"
   pattern, whereby it can act as a "handler" for some functions,
   while allowing others to remain unchanged. This is difficult to do
   with ordinary Python objects, if the interface is to remain
   extensible. For example, one must use ``__getattr__`` or
   ``__getattribute__`` overrides, to ensure that extensions (such as
   attributes defined by future WSGI versions) are passed through.

   This type of code is notoriously difficult to get 100% correct, and
   few people will want to write it themselves. They will therefore
   copy other people's implementations, but fail to update them when
   the person they copied from corrects yet another corner case.

   Further, this necessary boilerplate would be pure excise, a
   developer tax paid by middleware developers to support a slightly
   prettier API for application framework developers. But,
   application framework developers will typically only be updating
   *one* framework to support WSGI, and in a very limited part of
   their framework as a whole. It will likely be their first (and
   maybe their only) WSGI implementation, and thus they will likely
   implement with this specification ready to hand. Thus, the effort
   of making the API "prettier" with object attributes and suchlike
   would likely be wasted for this audience.

   We encourage those who want a prettier (or otherwise improved) WSGI
   interface for use in direct web application programming (as opposed
   to web framework development) to develop APIs or frameworks that
   wrap WSGI for convenient use by application developers. In this
   way, WSGI can remain conveniently low-level for server and
   middleware authors, while not being "ugly" for application
   developers.

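The ``close()`` idiom from question 3 above can be sketched as follows. ``TrackedBody`` and ``serve`` are hypothetical names; the ``closed`` flag stands in for releasing a real resource such as a file, lock, or database connection.

```python
class TrackedBody:
    """Hypothetical response body that owns a resource."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self._chunks)

    def close(self):
        # Release files, locks, connections, etc. here.
        self.closed = True


def serve(body):
    """Server side: always call close() once iteration is finished."""
    try:
        return b''.join(body)
    finally:
        if hasattr(body, 'close'):
            body.close()
```

The ``try``/``finally`` on the server side means ``close()`` runs even when iteration fails partway through, so cleanup does not depend on garbage collection.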
Open Issues
===========

* Some persons have requested information about whether the
  ``HTTP_AUTHENTICATION`` header may be provided by the server.
  That is, some web servers do not supply this information to
  e.g. CGI applications, and they would like the application
  to know that this is the case so it can use alternative
  means of authentication.

* Error handling: strategies for effective error handling are
  currently in discussion on the Web-SIG mailing list. In
  particular, a mechanism for specifying what errors an
  application or middleware should *not* trap (because they
  indicate that the request should be aborted), and mechanisms
  for servers, gateways, and middleware to handle exceptions
  occurring at various phases of the response processing.

* Byte strings: future versions of Python may replace today's
  8-bit strings with some kind of "byte array" type. Some sort
  of future-proofing would be good to have, and strategies for
  this should be discussed on Web-SIG and Python-Dev. Nearly
  every string in WSGI is potentially affected by this, although
  some contexts should perhaps continue to allow strings as long as
  they're pure ASCII.

Acknowledgements
================

Thanks go to the many folks on the Web-SIG mailing list whose
thoughtful feedback made this revised draft possible. Especially:

* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
  on the first draft as not offering any advantages over "plain old
  CGI", thus encouraging me to look for a better approach.

* Ian Bicking, who helped nag me into properly specifying the
  multithreading and multiprocess options, as well as badgering me to
  provide a mechanism for servers to supply custom extension data to
  an application.

* Tony Lownds, who came up with the concept of a ``start_response``
  function that took the status and headers, returning a ``write``
  function.

References
==========

.. [1] The Python Wiki "Web Programming" topic
   (http://www.python.org/cgi-bin/moinmoin/WebProgramming)

.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
   (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)

.. [3] Hypertext Transfer Protocol -- HTTP/1.1, section 3.6.1
   (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1)

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: