PEP: 444
Title: Python Web3 Interface
Version: $Revision$
Last-Modified: $Date$
Author: Chris McDonough <chrism@plope.com>,
        Armin Ronacher <armin.ronacher@active-4.com>
Discussions-To: Python Web-SIG <web-sig@python.org>
Status: Deferred
Type: Informational
Content-Type: text/x-rst
Created: 19-Jul-2010

Abstract
========

This document specifies a proposed second-generation standard
interface between web servers and Python web applications or
frameworks.

PEP Deferral
============

Further exploration of the concepts covered in this PEP has been deferred
for lack of a current champion who is interested in promoting the goals of
the PEP and in collecting and incorporating feedback, and who has sufficient
available time to do so effectively.

Note that after this PEP was first drafted, PEP 3333 was written as a more
incremental update that permitted use of WSGI on Python 3.2+. However, an
alternative specification that furthers the Python 3 goals of a cleaner
separation of binary and text data may still be valuable.

Rationale and Goals
===================

This protocol and specification is influenced heavily by the Web
Server Gateway Interface (WSGI) 1.0 standard described in PEP 333
[1]_.  The high-level rationale for having any standard that allows
Python-based web servers and applications to interoperate is outlined
in PEP 333.  This document essentially uses PEP 333 as a template, and
changes its wording in various places for the purpose of forming a
different standard.

Python currently boasts a wide variety of web application frameworks
which use the WSGI 1.0 protocol.  However, due to changes in the
language, the WSGI 1.0 protocol is not compatible with Python 3.  This
specification describes a standardized WSGI-like protocol that lets
Python 2.6, 2.7 and 3.1+ applications communicate with web servers.
Web3 is clearly a WSGI derivative; it only uses a different name than
"WSGI" in order to indicate that it is not in any way backwards
compatible.

Applications and servers which are written to this specification are
meant to work properly under Python 2.6.X, Python 2.7.X and Python
3.1+.  It is not practical to write an application or a server that
implements the Web3 specification and also works under Python 2
versions earlier than 2.6 or Python 3 versions earlier than 3.1.

.. note::

   Whatever Python 3 version fixed http://bugs.python.org/issue4006 so
   ``os.environ['foo']`` returns surrogates (as per PEP 383) when the
   value of 'foo' cannot be decoded using the current locale instead
   of failing with a KeyError is the *true* minimum Python 3 version.
   In particular, however, Python 3.0 is not supported.

.. note::

   Python 2.6 is the first Python version that supported an alias for
   ``bytes`` and the ``b"foo"`` literal syntax.  This is why it is the
   minimum version supported by Web3.

Explicability and documentability are the main technical drivers for
the decisions made within the standard.

Differences from WSGI
=====================

- All protocol-specific environment names are prefixed with ``web3.``
  rather than ``wsgi.``, e.g. ``web3.input`` rather than
  ``wsgi.input``.

- All values present as environment dictionary *values* are explicitly
  *bytes* instances instead of native strings.  (Environment *keys*
  however are native strings, always ``str`` regardless of
  platform).

- All values returned by an application must be bytes instances,
  including status code, header names and values, and the body.

- Wherever WSGI 1.0 referred to an ``app_iter``, this specification
  refers to a ``body``.

- No ``start_response()`` callback (and therefore no ``write()``
  callable nor ``exc_info`` data).

- The ``readline()`` function of ``web3.input`` must support a size
  hint parameter.

- The ``read()`` function of ``web3.input`` must be length delimited.
  A call without a size argument must not read more than the content
  length header specifies.  In case a content length header is absent
  the stream must not return anything on read.  It must never request
  more data than specified from the client.

- No requirement for middleware to yield an empty string if it needs
  more information from an application to produce output (e.g. no
  "Middleware Handling of Block Boundaries").

- Filelike objects passed to a "file_wrapper" must have an
  ``__iter__`` which returns bytes (never text).

- ``wsgi.file_wrapper`` is not supported.

- ``QUERY_STRING``, ``SCRIPT_NAME``, ``PATH_INFO`` values required to
  be placed in environ by server (each as the empty bytes instance if
  no associated value is received in the HTTP request).

- ``web3.path_info`` and ``web3.script_name`` should be put into the
  Web3 environment, if possible, by the origin Web3 server.  When
  available, each is the original, plain 7-bit ASCII, URL-encoded
  variant of its CGI equivalent derived directly from the request URI
  (with %2F segment markers and other meta-characters intact).  If the
  server cannot provide one (or both) of these values, it must omit
  the value(s) it cannot provide from the environment.

- This requirement was removed: "middleware components **must not**
  block iteration waiting for multiple values from an application
  iterable.  If the middleware needs to accumulate more data from the
  application before it can produce any output, it **must** yield an
  empty string."

- ``SERVER_PORT`` must be a bytes instance (not an integer).

- The server must not inject an additional ``Content-Length`` header
  by guessing the length from the response iterable.  This must be set
  by the application itself in all situations.

- If the origin server advertises that it has the ``web3.async``
  capability, a Web3 application callable used by the server is
  permitted to return a callable that accepts no arguments.  When it
  does so, this callable is to be called periodically by the origin
  server until it returns a non-``None`` response, which must be a
  normal Web3 response tuple.

  .. XXX (chrism) Needs a section of its own for explanation.

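
As an illustration of the length-delimited ``read()`` rule in the list
above, a server could wrap its raw input stream roughly as sketched
below.  ``LimitedInput`` and its constructor arguments are hypothetical
names chosen for this example and are not defined by this
specification; the sketch covers only ``read()`` and ``readline()``::

    import io


    class LimitedInput(object):
        """Hypothetical wrapper that never reads past ``content_length``."""

        def __init__(self, stream, content_length):
            self._stream = stream
            # An absent Content-Length is treated as zero readable bytes.
            self._remaining = content_length or 0

        def _clamp(self, size):
            # A missing or oversized request is cut down to what is left.
            if size is None or size < 0 or size > self._remaining:
                return self._remaining
            return size

        def read(self, size=-1):
            data = self._stream.read(self._clamp(size))
            self._remaining -= len(data)
            return data

        def readline(self, size=-1):
            line = self._stream.readline(self._clamp(size))
            self._remaining -= len(line)
            return line


    stream = LimitedInput(io.BytesIO(b'abc\ndef-extra'), 7)
    first_line = stream.readline()
    rest = stream.read()

With a declared length of 7, the trailing ``-extra`` bytes are never
requested from the underlying stream.
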

Specification Overview
======================

The Web3 interface has two sides: the "server" or "gateway" side, and
the "application" or "framework" side.  The server side invokes a
callable object that is provided by the application side.  The
specifics of how that object is provided are up to the server or
gateway.  It is assumed that some servers or gateways will require an
application's deployer to write a short script to create an instance
of the server or gateway, and supply it with the application object.
Other servers and gateways may use configuration files or other
mechanisms to specify where an application object should be imported
from, or otherwise obtained.

In addition to "pure" servers/gateways and applications/frameworks, it
is also possible to create "middleware" components that implement both
sides of this specification.  Such components act as an application to
their containing server, and as a server to a contained application,
and can be used to provide extended APIs, content transformation,
navigation, and other useful functions.

Throughout this specification, we will use the term "application
callable" to mean "a function, a method, or an instance with a
``__call__`` method".  It is up to the server, gateway, or application
implementing the application callable to choose the appropriate
implementation technique for their needs.  Conversely, a server,
gateway, or application that is invoking a callable **must not** have
any dependency on what kind of callable was provided to it.
Application callables are only to be called, not introspected upon.

The Application/Framework Side
------------------------------

The application object is simply a callable object that accepts one
argument.  The term "object" should not be misconstrued as requiring
an actual object instance: a function, method, or instance with a
``__call__`` method are all acceptable for use as an application
object.  Application objects must be able to be invoked more than
once, as virtually all servers/gateways (other than CGI) will make
such repeated requests.  If this cannot be guaranteed by the
implementation of the actual application, it has to be wrapped in a
function that creates a new instance on each call.

.. note::

   Although we refer to it as an "application" object, this should not
   be construed to mean that application developers will use Web3 as a
   web programming API.  It is assumed that application developers
   will continue to use existing, high-level framework services to
   develop their applications.  Web3 is a tool for framework and
   server developers, and is not intended to directly support
   application developers.

An example of an application which is a function (``simple_app``)::

    def simple_app(environ):
        """Simplest possible application object"""
        status = b'200 OK'
        headers = [(b'Content-type', b'text/plain')]
        body = [b'Hello world!\n']
        return body, status, headers

An example of an application which is an instance (``simple_app``)::

    class AppClass(object):

        """Produce the same output, but using an instance.  An
        instance of this class must be instantiated before it is
        passed to the server.  """

        def __call__(self, environ):
            status = b'200 OK'
            headers = [(b'Content-type', b'text/plain')]
            body = [b'Hello world!\n']
            return body, status, headers

    simple_app = AppClass()

Alternately, an application callable may return a callable instead of
the tuple if the server supports asynchronous execution.  See
information concerning ``web3.async`` for more information.

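
A minimal sketch of such an application follows.  Because this
specification leaves the exact semantics of ``web3.async`` open, the
names ``async_app`` and ``drive``, and the idea of a result that
becomes ready on the third poll, are assumptions made purely for
illustration::

    def async_app(environ):
        """Return a polling callable when the server allows it."""
        response = (
            [b'Hello world!\n'],
            b'200 OK',
            [(b'Content-type', b'text/plain')],
        )
        if not environ.get('web3.async'):
            # Synchronous server: hand back the tuple directly.
            return response

        state = {'polls': 0}

        def poll():
            # The first two polls report "not ready yet" with None.
            state['polls'] += 1
            if state['polls'] < 3:
                return None
            return response

        return poll


    def drive(application, environ):
        """Hypothetical server loop: poll until a tuple is returned."""
        rv = application(environ)
        if hasattr(rv, '__call__'):
            poll = rv
            rv = None
            while rv is None:
                rv = poll()
        return rv


    body, status, headers = drive(async_app, {'web3.async': True})

A real server would presumably return to its event loop between polls
rather than spin as ``drive`` does here.
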

The Server/Gateway Side
-----------------------

The server or gateway invokes the application callable once for each
request it receives from an HTTP client that is directed at the
application.  To illustrate, here is a simple CGI gateway, implemented
as a function taking an application object.  Note that this simple
example has limited error handling, because by default an uncaught
exception will be dumped to ``sys.stderr`` and logged by the web
server.

::

    import locale
    import os
    import sys

    encoding = locale.getpreferredencoding()

    stdout = sys.stdout

    if hasattr(sys.stdout, 'buffer'):
        # Python 3 compatibility; we need to be able to push bytes out
        stdout = sys.stdout.buffer

    def get_environ():
        d = {}
        for k, v in os.environ.items():
            # Python 3 compatibility
            if not isinstance(v, bytes):
                # We must explicitly encode the string to bytes under
                # Python 3.1+
                v = v.encode(encoding, 'surrogateescape')
            d[k] = v
        return d

    def run_with_cgi(application):

        environ = get_environ()
        # Python 3 compatibility; the request body must be read as bytes
        environ['web3.input']        = getattr(sys.stdin, 'buffer', sys.stdin)
        environ['web3.errors']       = sys.stderr
        environ['web3.version']      = (1, 0)
        environ['web3.multithread']  = False
        environ['web3.multiprocess'] = True
        environ['web3.run_once']     = True
        environ['web3.async']        = False

        if environ.get('HTTPS', b'off') in (b'on', b'1'):
            environ['web3.url_scheme'] = b'https'
        else:
            environ['web3.url_scheme'] = b'http'

        rv = application(environ)
        if hasattr(rv, '__call__'):
            raise TypeError('This webserver does not support asynchronous '
                            'responses.')
        body, status, headers = rv

        CRLF = b'\r\n'

        try:
            stdout.write(b'Status: ' + status + CRLF)
            for header_name, header_val in headers:
                stdout.write(header_name + b': ' + header_val + CRLF)
            stdout.write(CRLF)
            for chunk in body:
                stdout.write(chunk)
                stdout.flush()
        finally:
            if hasattr(body, 'close'):
                body.close()

Middleware: Components that Play Both Sides
-------------------------------------------

A single object may play the role of a server with respect to some
application(s), while also acting as an application with respect to
some server(s).  Such "middleware" components can perform such
functions as:

* Routing a request to different application objects based on the
  target URL, after rewriting the ``environ`` accordingly.

* Allowing multiple applications or frameworks to run side-by-side in
  the same process.

* Load balancing and remote processing, by forwarding requests and
  responses over a network.

* Performing content postprocessing, such as applying XSL stylesheets.

The presence of middleware in general is transparent to both the
"server/gateway" and the "application/framework" sides of the
interface, and should require no special support.  A user who desires
to incorporate middleware into an application simply provides the
middleware component to the server, as if it were an application, and
configures the middleware component to invoke the application, as if
the middleware component were a server.  Of course, the "application"
that the middleware wraps may in fact be another middleware component
wrapping another application, and so on, creating what is referred to
as a "middleware stack".

A middleware must support asynchronous execution if possible or fall
back to disabling itself.

Here is a middleware that changes the ``HTTP_HOST`` key if an ``X-Host``
header exists and adds a comment to all HTML responses::

    import time

    def apply_filter(app, environ, filter_func):
        """Helper function that passes the return value from an
        application to a filter function when the results are
        ready.
        """
        app_response = app(environ)

        # synchronous response, filter now
        if not hasattr(app_response, '__call__'):
            return filter_func(*app_response)

        # asynchronous response.  filter when results are ready
        def polling_function():
            rv = app_response()
            if rv is not None:
                return filter_func(*rv)
        return polling_function

    def proxy_and_timing_support(app):
        def new_application(environ):
            def filter_func(body, status, headers):
                now = time.time()
                for key, value in headers:
                    if key.lower() == b'content-type' and \
                       value.split(b';')[0] == b'text/html':
                        # assumes ascii compatible encoding in body,
                        # but the middleware should actually parse the
                        # content type header and figure out the
                        # encoding when doing that.  Also assumes that
                        # body is a list, so that the comment can be
                        # appended as one more chunk.
                        body += [('<!-- Execution time: %.2fsec -->' %
                                  (now - then)).encode('ascii')]
                        break
                return body, status, headers
            then = time.time()
            host = environ.get('HTTP_X_HOST')
            if host is not None:
                environ['HTTP_HOST'] = host

            # use the apply_filter function that applies a given filter
            # function for both async and sync responses.
            return apply_filter(app, environ, filter_func)
        return new_application

    app = proxy_and_timing_support(app)

Specification Details
=====================

The application callable must accept one positional argument.  For the
sake of illustration, we have named it ``environ``, but it is not
required to have this name.  A server or gateway **must** invoke the
application object using a positional (not keyword) argument.
(E.g. by calling ``body, status, headers = application(environ)`` as
shown above.)

The ``environ`` parameter is a dictionary object, containing CGI-style
environment variables.  This object **must** be a builtin Python
dictionary (*not* a subclass, ``UserDict`` or other dictionary
emulation), and the application is allowed to modify the dictionary in
any way it desires.  The dictionary must also include certain
Web3-required variables (described in a later section), and may also
include server-specific extension variables, named according to a
convention that will be described below.

When called by the server, the application object must return a tuple
yielding three elements: ``body``, ``status`` and ``headers``, or, if
supported by an async server, an argumentless callable which either
returns ``None`` or a tuple of those three elements.

The ``status`` element is a status in bytes of the form ``b'999
Message here'``.

``headers`` is a Python list of ``(header_name, header_value)`` pairs
describing the HTTP response headers.  The ``headers`` structure must
be a literal Python list; it must yield two-tuples.  Both
``header_name`` and ``header_value`` must be bytes values.

The ``body`` is an iterable yielding zero or more bytes instances.
This can be accomplished in a variety of ways, such as by returning a
list containing bytes instances as ``body``, or by returning a
generator as ``body`` that yields bytes instances, or by the
``body`` being an instance of a class which is iterable.  Regardless
of how it is accomplished, the application object must always return a
``body`` iterable yielding zero or more bytes instances.

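
The shape of a synchronous response described above can be checked
with a few lines of code.  The helper below is only a sketch;
``validate_response`` is a hypothetical name and no such function is
required or defined by this specification::

    def validate_response(rv):
        """Check the types of a (body, status, headers) tuple."""
        body, status, headers = rv
        if not isinstance(status, bytes):
            raise TypeError('status must be a bytes instance')
        if type(headers) is not list:
            raise TypeError('headers must be a literal list')
        for item in headers:
            if not (isinstance(item, tuple) and len(item) == 2):
                raise TypeError('each header must be a two-tuple')
            name, value = item
            if not isinstance(name, bytes) or not isinstance(value, bytes):
                raise TypeError('header names and values must be bytes')
        # Only iterability is checked; consuming the body here would
        # exhaust a generator before the server could send it.
        iter(body)
        return rv


    good = ([b'Hello world!\n'], b'200 OK',
            [(b'Content-type', b'text/plain')])
    checked = validate_response(good)

    try:
        validate_response(([b'x'], 200, []))
        rejected = False
    except TypeError:
        rejected = True
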

The server or gateway must transmit the yielded bytes to the client in
an unbuffered fashion, completing the transmission of each set of
bytes before requesting another one.  (In other words, applications
**should** perform their own buffering.  See the `Buffering and
Streaming`_ section below for more on how application output must be
handled.)

The server or gateway should treat the yielded bytes as binary byte
sequences: in particular, it should ensure that line endings are not
altered.  The application is responsible for ensuring that the
bytes to be written are in a format suitable for the client.  (The
server or gateway **may** apply HTTP transfer encodings, or perform
other transformations for the purpose of implementing HTTP features
such as byte-range transmission.  See `Other HTTP Features`_, below,
for more details.)

If the ``body`` iterable returned by the application has a ``close()``
method, the server or gateway **must** call that method upon
completion of the current request, whether the request was completed
normally, or terminated early due to an error.  This is to support
resource release by the application and is intended to complement PEP
325's generator support, and other common iterables with ``close()``
methods.

Finally, servers and gateways **must not** directly use any other
attributes of the ``body`` iterable returned by the application.

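
The ``close()`` contract above can be sketched as follows.  Both
``TrackedBody`` and ``send_body`` are hypothetical names used only for
this illustration; the ``write`` argument stands in for whatever
mechanism a server uses to transmit bytes::

    class TrackedBody(object):
        """Hypothetical body iterable that records when it is closed."""

        def __init__(self, chunks):
            self._chunks = chunks
            self.closed = False

        def __iter__(self):
            return iter(self._chunks)

        def close(self):
            self.closed = True


    def send_body(body, write):
        """Transmit every chunk, then close the body if it allows it."""
        try:
            for chunk in body:
                write(chunk)
        finally:
            # Runs on normal completion and when write() raises.
            if hasattr(body, 'close'):
                body.close()


    sent = []
    body = TrackedBody([b'Hello ', b'world!\n'])
    send_body(body, sent.append)

Apart from iteration and ``close()``, the sketch touches no other
attribute of ``body``.
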

``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain various CGI
environment variables, as defined by the Common Gateway Interface
specification [2]_.

The following CGI variables **must** be present.  Each key is a native
string.  Each value is a bytes instance.

.. note::

   In Python 3.1+, a "native string" is a ``str`` type decoded using
   the ``surrogateescape`` error handler, as done by
   ``os.environ.__getitem__``.  In Python 2.6 and 2.7, a "native
   string" is a ``str`` type representing a set of bytes.

``REQUEST_METHOD``
  The HTTP request method, such as ``"GET"`` or ``"POST"``.

``SCRIPT_NAME``
  The initial portion of the request URL's "path" that corresponds to
  the application object, so that the application knows its virtual
  "location".  This may be the empty bytes instance if the application
  corresponds to the "root" of the server.  SCRIPT_NAME will be a
  bytes instance representing a sequence of URL-encoded segments
  separated by the slash character (``/``).  It is assumed that
  ``%2F`` characters will be decoded into literal slash characters
  within ``PATH_INFO``, as per CGI.

``PATH_INFO``
  The remainder of the request URL's "path", designating the virtual
  "location" of the request's target within the application.  This
  **may** be the empty bytes instance if the request URL targets the
  application root and does not have a trailing slash.  PATH_INFO will
  be a bytes instance representing a sequence of URL-encoded segments
  separated by the slash character (``/``).  It is assumed that
  ``%2F`` characters will be decoded into literal slash characters
  within ``PATH_INFO``, as per CGI.

``QUERY_STRING``
  The portion of the request URL (in bytes) that follows the ``"?"``,
  if any, or the empty bytes instance.

``SERVER_NAME``, ``SERVER_PORT``
  When combined with ``SCRIPT_NAME`` and ``PATH_INFO`` (or their raw
  equivalents), these variables can be used to complete the URL.
  Note, however, that ``HTTP_HOST``, if present, should be used in
  preference to ``SERVER_NAME`` for reconstructing the request URL.
  See the `URL Reconstruction`_ section below for more detail.
  ``SERVER_PORT`` should be a bytes instance, not an integer.

``SERVER_PROTOCOL``
  The version of the protocol the client used to send the request.
  Typically this will be something like ``"HTTP/1.0"`` or
  ``"HTTP/1.1"`` and may be used by the application to determine how
  to treat any HTTP request headers.  (This variable should probably
  be called ``REQUEST_PROTOCOL``, since it denotes the protocol used
  in the request, and is not necessarily the protocol that will be
  used in the server's response.  However, for compatibility with CGI
  we have to keep the existing name.)

The following CGI values **may** be present in the Web3 environment.
Each key is a native string.  Each value is a bytes instance.

``CONTENT_TYPE``
  The contents of any ``Content-Type`` fields in the HTTP request.

``CONTENT_LENGTH``
  The contents of any ``Content-Length`` fields in the HTTP request.

``HTTP_`` Variables
  Variables corresponding to the client-supplied HTTP request headers
  (i.e., variables whose names begin with ``"HTTP_"``).  The presence
  or absence of these variables should correspond with the presence or
  absence of the appropriate HTTP header in the request.

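
The mapping from request header names to ``environ`` keys follows the
usual CGI convention, which a server might implement along the lines
of the sketch below.  ``header_to_environ_key`` and ``add_headers`` are
hypothetical helper names, and the header values are assumed to have
been received as bytes already::

    def header_to_environ_key(name):
        """Map a header name (native string) to its CGI-style key."""
        key = name.upper().replace('-', '_')
        # These two headers are exposed without the HTTP_ prefix.
        if key in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
            return key
        return 'HTTP_' + key


    def add_headers(environ, headers):
        """Store (name, bytes value) pairs under their environ keys."""
        for name, value in headers:
            environ[header_to_environ_key(name)] = value
        return environ


    environ = add_headers({}, [
        ('X-Host', b'example.org'),
        ('Content-Type', b'text/plain'),
    ])
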

A server or gateway **should** attempt to provide as many other CGI
variables as are applicable, each with a string for its key and a
bytes instance for its value.  In addition, if SSL is in use, the
server or gateway **should** also provide as many of the Apache SSL
environment variables [5]_ as are applicable, such as ``HTTPS=on`` and
``SSL_PROTOCOL``.  Note, however, that an application that uses any
CGI variables other than the ones listed above is necessarily
non-portable to web servers that do not support the relevant
extensions.  (For example, web servers that do not publish files will
not be able to provide a meaningful ``DOCUMENT_ROOT`` or
``PATH_TRANSLATED``.)

A Web3-compliant server or gateway **should** document what variables
it provides, along with their definitions as appropriate.
Applications **should** check for the presence of any variables they
require, and have a fallback plan in the event such a variable is
absent.

Note that CGI variable *values* must be bytes instances, if they are
present at all.  It is a violation of this specification for a CGI
variable's value to be of any type other than ``bytes``.  On Python 2,
this means they will be of type ``str``.  On Python 3, this means they
will be of type ``bytes``.

The *keys* of all CGI and non-CGI variables in the environ, however,
must be "native strings" (on both Python 2 and Python 3, they will be
of type ``str``).

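
A rough check of the key and value types described above might look
like the sketch below.  ``check_environ_types`` is a hypothetical
name, and treating every key without a dot as a CGI-style variable is
an assumption of this example rather than a rule of this
specification::

    def check_environ_types(environ):
        """Raise TypeError when keys or CGI-style values are mistyped."""
        if type(environ) is not dict:
            raise TypeError('environ must be a builtin dict')
        for key, value in environ.items():
            if not isinstance(key, str):
                raise TypeError('key %r is not a native string' % (key,))
            # Assumption: dotted names such as web3.version are not CGI
            # variables, so their values are not required to be bytes.
            if '.' not in key and not isinstance(value, bytes):
                raise TypeError('value of %s must be bytes' % key)
        return True


    environ_ok = check_environ_types({
        'REQUEST_METHOD': b'GET',
        'SERVER_PORT': b'80',
        'web3.version': (1, 0),
    })

    try:
        check_environ_types({'SERVER_PORT': 80})
        port_rejected = False
    except TypeError:
        port_rejected = True
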

In addition to the CGI-defined variables, the ``environ`` dictionary
**may** also contain arbitrary operating-system "environment
variables", and **must** contain the following Web3-defined variables.

===================== ===============================================
Variable              Value
===================== ===============================================
``web3.version``      The tuple ``(1, 0)``, representing Web3
                      version 1.0.

``web3.url_scheme``   A bytes value representing the "scheme" portion of
                      the URL at which the application is being
                      invoked.  Normally, this will have the value
                      ``b"http"`` or ``b"https"``, as appropriate.

``web3.input``        An input stream (file-like object) from which bytes
                      constituting the HTTP request body can be read.
                      (The server or gateway may perform reads
                      on-demand as requested by the application, or
                      it may pre-read the client's request body and
                      buffer it in-memory or on disk, or use any
                      other technique for providing such an input
                      stream, according to its preference.)

``web3.errors``       An output stream (file-like object) to which error
                      output text can be written, for the purpose of
                      recording program or other errors in a
                      standardized and possibly centralized location.
                      This should be a "text mode" stream; i.e.,
                      applications should use ``"\n"`` as a line
                      ending, and assume that it will be converted to
                      the correct line ending by the server/gateway.
                      Applications may *not* send bytes to the
                      'write' method of this stream; they may only
                      send text.

                      For many servers, ``web3.errors`` will be the
                      server's main error log.  Alternatively, this
                      may be ``sys.stderr``, or a log file of some
                      sort.  The server's documentation should
                      include an explanation of how to configure this
                      or where to find the recorded output.  A server
                      or gateway may supply different error streams
                      to different applications, if this is desired.

``web3.multithread``  This value should evaluate true if the
                      application object may be simultaneously
                      invoked by another thread in the same process,
                      and should evaluate false otherwise.

``web3.multiprocess`` This value should evaluate true if an
                      equivalent application object may be
                      simultaneously invoked by another process, and
                      should evaluate false otherwise.

``web3.run_once``     This value should evaluate true if the server
                      or gateway expects (but does not guarantee!)
                      that the application will only be invoked this
                      one time during the life of its containing
                      process.  Normally, this will only be true for
                      a gateway based on CGI (or something similar).

``web3.script_name``  The non-URL-decoded ``SCRIPT_NAME`` value.
                      Through a historical inequity, by virtue of the
                      CGI specification, ``SCRIPT_NAME`` is present
                      within the environment as an already
                      URL-decoded string.  This is the original
                      URL-encoded value derived from the request URI.
                      If the server cannot provide this value, it
                      must omit it from the environ.

``web3.path_info``    The non-URL-decoded ``PATH_INFO`` value.
                      Through a historical inequity, by virtue of the
                      CGI specification, ``PATH_INFO`` is present
                      within the environment as an already
                      URL-decoded string.  This is the original
                      URL-encoded value derived from the request URI.
                      If the server cannot provide this value, it
                      must omit it from the environ.

``web3.async``        This is ``True`` if the webserver supports
                      async invocation.  In that case an application
                      is allowed to return a callable instead of a
                      tuple with the response.  The exact semantics
                      are not specified by this specification.

===================== ===============================================

Finally, the ``environ`` dictionary may also contain server-defined
|
|
|
|
|
variables. These variables should have names which are native
|
|
|
|
|
strings, composed of only lower-case letters, numbers, dots, and
|
|
|
|
|
underscores, and should be prefixed with a name that is unique to the
|
|
|
|
|
defining server or gateway. For example, ``mod_web3`` might define
|
|
|
|
|
variables with names like ``mod_web3.some_variable``.
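As a rough illustration of the naming rule above, a server author might check candidate key names with a small helper. The function name ``is_valid_server_key`` is hypothetical and not part of this specification; the sketch also assumes that the unique prefix is separated from the rest of the name by a dot, as in the ``mod_web3`` example.

```python
import re

# Lower-case letters, digits and underscores, in dot-separated segments.
# Requiring at least one dot reflects the "prefix.name" convention (assumption).
_SERVER_KEY = re.compile(r'\A[a-z0-9_]+(\.[a-z0-9_]+)+\Z')


def is_valid_server_key(name):
    """Return True if ``name`` looks like a well-formed server-defined key."""
    return isinstance(name, str) and _SERVER_KEY.match(name) is not None
```
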
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Input Stream
|
|
|
|
|
~~~~~~~~~~~~
|
|
|
|
|
|
|
|
|
|
The input stream (``web3.input``) provided by the server must support
|
|
|
|
|
the following methods:
|
|
|
|
|
|
|
|
|
|
===================== ========
|
|
|
|
|
Method Notes
|
|
|
|
|
===================== ========
|
|
|
|
|
``read(size)`` 1,4
|
|
|
|
|
``readline([size])`` 1,2,4
|
|
|
|
|
``readlines([size])`` 1,3,4
|
|
|
|
|
``__iter__()`` 4
|
|
|
|
|
===================== ========
|
|
|
|
|
|
|
|
|
|
The semantics of each method are as documented in the Python Library
|
|
|
|
|
Reference, except for these notes as listed in the table above:
|
|
|
|
|
|
|
|
|
|
1. The server is not required to read past the client's specified
|
|
|
|
|
``Content-Length``, and is allowed to simulate an end-of-file
|
|
|
|
|
condition if the application attempts to read past that point. The
|
|
|
|
|
application **should not** attempt to read more data than is
|
|
|
|
|
specified by the ``CONTENT_LENGTH`` variable.
|
|
|
|
|
|
|
|
|
|
2. The implementation must support the optional ``size`` argument to
|
|
|
|
|
``readline()``.
|
|
|
|
|
|
|
|
|
|
3. The application is free to not supply a ``size`` argument to
|
|
|
|
|
``readlines()``, and the server or gateway is free to ignore the
|
|
|
|
|
value of any supplied ``size`` argument.
|
|
|
|
|
|
|
|
|
|
4. The ``read``, ``readline`` and ``__iter__`` methods must return a
|
|
|
|
|
bytes instance. The ``readlines`` method must return a sequence
|
|
|
|
|
which contains instances of bytes.
|
|
|
|
|
|
|
|
|
|
The methods listed in the table above **must** be supported by all
|
|
|
|
|
servers conforming to this specification. Applications conforming to
|
|
|
|
|
this specification **must not** use any other methods or attributes of
|
|
|
|
|
the ``input`` object. In particular, applications **must not**
|
|
|
|
|
attempt to close this stream, even if it possesses a ``close()``
|
|
|
|
|
method.
|
|
|
|
|
|
|
|
|
|
The input stream should silently ignore attempts to read more than the
|
|
|
|
|
content length of the request. If no content length is specified the
|
|
|
|
|
stream must be a dummy stream that does not return anything.
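A minimal sketch of these rules, assuming the server holds a binary file-like object for the connection: the wrapper below never reads past the declared content length and behaves as an empty stream when no length is given. The class name ``LimitedInput`` and its constructor arguments are hypothetical, not part of this specification.

```python
import io


class LimitedInput(object):
    """Hypothetical ``web3.input`` wrapper bounded by Content-Length."""

    def __init__(self, raw, content_length=None):
        self._raw = raw
        # With no content length, act as a dummy stream that returns nothing.
        self._remaining = content_length or 0

    def _clamp(self, size):
        # Silently cap any request at the bytes still owed to this request.
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        size = self._clamp(size)
        data = self._raw.read(size) if size else b''
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        size = self._clamp(size)
        data = self._raw.readline(size) if size else b''
        self._remaining -= len(data)
        return data

    def readlines(self, size=None):
        # The ``size`` hint is ignored, which note 3 permits.
        return list(self)

    def __iter__(self):
        while True:
            line = self.readline()
            if not line:
                return
            yield line


# Usage sketch: only the first six bytes belong to this request.
stream = LimitedInput(io.BytesIO(b'ab\ncd\nEXTRA'), 6)
first_line = stream.readline()
rest = stream.read(100)
```

Every method returns bytes (or a list of bytes), and reads beyond the declared length simply yield ``b''`` rather than raising.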
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Error Stream
|
|
|
|
|
~~~~~~~~~~~~
|
|
|
|
|
|
|
|
|
|
The error stream (``web3.errors``) provided by the server must support
|
|
|
|
|
the following methods:
|
|
|
|
|
|
|
|
|
|
=================== ========== ========
|
|
|
|
|
Method Stream Notes
|
|
|
|
|
=================== ========== ========
|
|
|
|
|
``flush()`` ``errors`` 1
|
|
|
|
|
``write(str)`` ``errors`` 2
|
|
|
|
|
``writelines(seq)`` ``errors`` 2
|
|
|
|
|
=================== ========== ========
|
|
|
|
|
|
|
|
|
|
The semantics of each method are as documented in the Python Library
|
|
|
|
|
Reference, except for these notes as listed in the table above:
|
|
|
|
|
|
|
|
|
|
1. Since the ``errors`` stream may not be rewound, servers and
|
|
|
|
|
gateways are free to forward write operations immediately, without
|
|
|
|
|
buffering. In this case, the ``flush()`` method may be a no-op.
|
|
|
|
|
Portable applications, however, cannot assume that output is
|
|
|
|
|
unbuffered or that ``flush()`` is a no-op. They must call
|
|
|
|
|
``flush()`` if they need to ensure that output has in fact been
|
|
|
|
|
written. (For example, to minimize intermingling of data from
|
|
|
|
|
multiple processes writing to the same error log.)
|
|
|
|
|
|
|
|
|
|
2. The ``write()`` method must accept a string argument, but needn't
|
|
|
|
|
necessarily accept a bytes argument. The ``writelines()`` method
|
|
|
|
|
must accept a sequence argument that consists entirely of strings,
|
|
|
|
|
but needn't necessarily accept any bytes instance as a member of
|
|
|
|
|
the sequence.
|
|
|
|
|
|
|
|
|
|
The methods listed in the table above **must** be supported by all
|
|
|
|
|
servers conforming to this specification. Applications conforming to
|
|
|
|
|
this specification **must not** use any other methods or attributes of
|
|
|
|
|
the ``errors`` object. In particular, applications **must not**
|
|
|
|
|
attempt to close this stream, even if it possesses a ``close()``
|
|
|
|
|
method.
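The following sketch shows how a portable application might use the error stream under these rules. The helper name ``log_error`` is hypothetical, and an in-memory text stream stands in for whatever object a real server would supply.

```python
import io


def log_error(environ, message):
    """Write one line of text to the server-provided error stream."""
    errors = environ['web3.errors']
    # write() is only guaranteed to accept text, so never pass bytes here.
    errors.write(message + '\n')
    # Portable code cannot assume unbuffered output; flush explicitly.
    errors.flush()


# Usage sketch: io.StringIO stands in for the server's error log.
environ = {'web3.errors': io.StringIO()}
log_error(environ, u'database connection failed')
logged = environ['web3.errors'].getvalue()
```

Only ``write()`` and ``flush()`` are used, and the stream is never closed by the application.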
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Values Returned by A Web3 Application
|
|
|
|
|
-------------------------------------
|
|
|
|
|
|
|
|
|
|
Web3 applications return a tuple in the form (``status``, ``headers``,
|
|
|
|
|
``body``). If the server supports asynchronous applications
|
|
|
|
|
(``web3.async``), the response may be a callable object (which accepts no
|
|
|
|
|
arguments).
|
|
|
|
|
|
|
|
|
|
The ``status`` value is assumed by a gateway or server to be an HTTP
|
|
|
|
|
"status" bytes instance like ``b'200 OK'`` or ``b'404 Not Found'``.
|
|
|
|
|
That is, it is a string consisting of a Status-Code and a
|
|
|
|
|
Reason-Phrase, in that order and separated by a single space, with no
|
|
|
|
|
surrounding whitespace or other characters. (See RFC 2616, Section
|
|
|
|
|
6.1.1 for more information.) The string **must not** contain control
|
|
|
|
|
characters, and must not be terminated with a carriage return,
|
|
|
|
|
linefeed, or combination thereof.
|
|
|
|
|
|
|
|
|
|
The ``headers`` value is assumed by a gateway or server to be a
|
|
|
|
|
literal Python list of ``(header_name, header_value)`` tuples. Each
|
|
|
|
|
``header_name`` must be a bytes instance representing a valid HTTP
|
|
|
|
|
header field-name (as defined by RFC 2616, Section 4.2), without a
|
|
|
|
|
trailing colon or other punctuation. Each ``header_value`` must be a
|
|
|
|
|
bytes instance and **must not** include any control characters,
|
|
|
|
|
including carriage returns or linefeeds, either embedded or at the
|
|
|
|
|
end. (These requirements are to minimize the complexity of any
|
|
|
|
|
parsing that must be performed by servers, gateways, and intermediate
|
|
|
|
|
response processors that need to inspect or modify response headers.)
|
|
|
|
|
|
|
|
|
|
In general, the server or gateway is responsible for ensuring that
|
|
|
|
|
correct headers are sent to the client: if the application omits a
|
|
|
|
|
header required by HTTP (or other relevant specifications that are in
|
|
|
|
|
effect), the server or gateway **must** add it. For example, the HTTP
|
|
|
|
|
``Date:`` and ``Server:`` headers would normally be supplied by the
|
|
|
|
|
server or gateway. The gateway must not, however, override values with
|
|
|
|
|
the same name if they are emitted by the application.
|
|
|
|
|
|
|
|
|
|
(A reminder for server/gateway authors: HTTP header names are
|
|
|
|
|
case-insensitive, so be sure to take that into consideration when
|
|
|
|
|
examining application-supplied headers!)
|
|
|
|
|
|
|
|
|
|
Applications and middleware are forbidden from using HTTP/1.1
|
|
|
|
|
"hop-by-hop" features or headers, any equivalent features in HTTP/1.0,
|
|
|
|
|
or any headers that would affect the persistence of the client's
|
|
|
|
|
connection to the web server. These features are the exclusive
|
|
|
|
|
province of the actual web server, and a server or gateway **should**
|
|
|
|
|
consider it a fatal error for an application to attempt sending them,
|
|
|
|
|
and raise an error if they are supplied as return values from an
|
|
|
|
|
application in the ``headers`` structure. (For more specifics on
|
|
|
|
|
"hop-by-hop" features and headers, please see the `Other HTTP
|
|
|
|
|
Features`_ section below.)
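As an illustration of the response shape described in this section, the sketch below pairs a trivial application with a checker that a server or test harness might run over the returned tuple. Both ``hello_app`` and ``check_response`` are hypothetical names; the checks cover only the type and control-character rules stated above, not full RFC 2616 header grammar.

```python
import re

# Control characters, including CR and LF, are forbidden in status and headers.
_CONTROL = re.compile(br'[\x00-\x1f\x7f]')
# Status-Code, a single space, then a Reason-Phrase without control characters.
_STATUS = re.compile(br'\A[0-9]{3} [^\x00-\x1f\x7f]+\Z')


def hello_app(environ):
    """A trivial Web3 application returning (status, headers, body)."""
    body = b'Hello, world!\n'
    headers = [(b'Content-Type', b'text/plain'),
               (b'Content-Length', str(len(body)).encode('ascii'))]
    return (b'200 OK', headers, [body])


def check_response(response):
    """Raise if the status or headers break the rules in this section."""
    status, headers, body = response
    if not isinstance(status, bytes) or not _STATUS.match(status):
        raise ValueError('malformed status: %r' % (status,))
    if status != status.strip():
        raise ValueError('status has surrounding whitespace: %r' % (status,))
    for name, value in headers:
        if not isinstance(name, bytes) or not isinstance(value, bytes):
            raise TypeError('header names and values must be bytes')
        if (name.endswith(b':') or _CONTROL.search(name)
                or _CONTROL.search(value)):
            raise ValueError('malformed header: %r' % ((name, value),))
    return status, headers, body
```
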
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Dealing with Compatibility Across Python Versions
|
|
|
|
|
-------------------------------------------------
|
|
|
|
|
|
|
|
|
|
Creating Web3 code that runs under both Python 2.6/2.7 and Python 3.1+
|
|
|
|
|
requires some care on the part of the developer. In general, the Web3
|
|
|
|
|
specification assumes a certain level of equivalence between the
|
|
|
|
|
Python 2 ``str`` type and the Python 3 ``bytes`` type. For example,
|
|
|
|
|
under Python 2, the values present in the Web3 ``environ`` will be
|
|
|
|
|
instances of the ``str`` type; in Python 3, these will be instances of
|
|
|
|
|
the ``bytes`` type. The Python 3 ``bytes`` type does not possess all
|
|
|
|
|
the methods of the Python 2 ``str`` type, and some methods which it
|
|
|
|
|
does possess behave differently than the Python 2 ``str`` type.
|
|
|
|
|
Effectively, to ensure that Web3 middleware and applications work
|
|
|
|
|
across Python versions, developers must do these things:
|
|
|
|
|
|
|
|
|
|
#) Do not assume comparison equivalence between text values and bytes
|
|
|
|
|
values. If you do so, your code may work under Python 2, but it
|
|
|
|
|
will not work properly under Python 3. For example, don't write
|
|
|
|
|
``somebytes == 'abc'``. This will sometimes be true on Python 2
|
|
|
|
|
but it will never be true on Python 3, because a sequence of bytes
|
|
|
|
|
never compares equal to a string under Python 3. Instead, always
|
|
|
|
|
compare a bytes value with a bytes value, e.g.
``somebytes == b'abc'``. Code which does this is compatible with and works the
|
|
|
|
|
same in Python 2.6, 2.7, and 3.1. The ``b`` in front of ``'abc'``
|
|
|
|
|
signals to Python 3 that the value is a literal bytes instance;
|
|
|
|
|
under Python 2 it's a forward compatibility placebo.
|
|
|
|
|
|
|
|
|
|
#) Don't use the ``__contains__`` method (directly or indirectly) of
|
|
|
|
|
items that are meant to be byteslike without ensuring that its
|
|
|
|
|
argument is also a bytes instance. If you do so, your code may
|
|
|
|
|
work under Python 2, but it will not work properly under Python 3.
|
|
|
|
|
For example, ``'abc' in somebytes`` will raise a ``TypeError``
|
|
|
|
|
under Python 3, but it will return ``True`` under Python 2.6 and
|
|
|
|
|
2.7. However, ``b'abc' in somebytes`` will work the same on both
|
|
|
|
|
versions. In Python 3.2, this restriction may be partially
|
|
|
|
|
removed, as it's rumored that bytes types may obtain a ``__mod__``
|
|
|
|
|
implementation.
|
|
|
|
|
|
|
|
|
|
#) ``__getitem__`` should not be used.
|
|
|
|
|
|
|
|
|
|
.. XXX
|
|
|
|
|
|
|
|
|
|
#) Don't try to use the ``format`` method or the ``__mod__`` method of
|
|
|
|
|
instances of bytes (directly or indirectly). In Python 2, the
|
|
|
|
|
``str`` type which we treat equivalently to Python 3's ``bytes``
|
|
|
|
|
supports these methods, but actual Python 3 ``bytes`` instances
|
|
|
|
|
don't support these methods. If you use these methods, your code
|
|
|
|
|
will work under Python 2, but not under Python 3.
|
|
|
|
|
|
|
|
|
|
#) Do not try to concatenate a bytes value with a string value. This
|
|
|
|
|
may work under Python 2, but it will not work under Python 3. For
|
|
|
|
|
example, doing ``'abc' + somebytes`` will work under Python 2, but
|
|
|
|
|
it will result in a ``TypeError`` under Python 3. Instead, always
|
|
|
|
|
make sure you're concatenating two items of the same type,
|
|
|
|
|
e.g. ``b'abc' + somebytes``.
|
|
|
|
|
|
|
|
|
|
Web3 expects byte values in other places, such as in all the values
|
|
|
|
|
returned by an application.
|
|
|
|
|
|
|
|
|
|
In short, to ensure compatibility of Web3 application code between
|
|
|
|
|
Python 2 and Python 3, in Python 2, treat CGI and server variable
|
|
|
|
|
values in the environment as if they had the Python 3 ``bytes`` API
|
|
|
|
|
even though they actually have a more capable API. Likewise for all
|
|
|
|
|
stringlike values returned by a Web3 application.
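The guidelines above can be sketched as a handful of small helpers that behave the same on Python 2.6, 2.7, and 3.x. The function names are hypothetical and exist only for illustration. The slicing helper shows one reason to avoid ``__getitem__``: indexing a Python 3 ``bytes`` value with an integer yields an ``int``, while slicing yields ``bytes`` on both major versions.

```python
def is_post(environ):
    # Compare bytes with bytes; comparing against 'POST' (text) is never
    # true on Python 3.
    return environ['REQUEST_METHOD'] == b'POST'


def wants_json(environ):
    # Containment tests need a bytes argument on Python 3.
    return b'application/json' in environ.get('HTTP_ACCEPT', b'')


def first_byte(value):
    # Slice instead of indexing: value[0:1] is a bytes value everywhere,
    # whereas value[0] is an int on Python 3.
    return value[0:1]


def content_type(charset):
    # Concatenate bytes with bytes rather than using ``%`` or ``format``.
    return b'text/html; charset=' + charset
```
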
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Buffering and Streaming
|
|
|
|
|
-----------------------
|
|
|
|
|
|
|
|
|
|
Generally speaking, applications will achieve the best throughput by
|
|
|
|
|
buffering their (modestly-sized) output and sending it all at once.
|
|
|
|
|
This is a common approach in existing frameworks: the output is
|
|
|
|
|
buffered in a StringIO or similar object, then transmitted all at
|
|
|
|
|
once, along with the response headers.
|
|
|
|
|
|
|
|
|
|
The corresponding approach in Web3 is for the application to simply
|
|
|
|
|
return a single-element ``body`` iterable (such as a list) containing
|
|
|
|
|
the response body as a single bytes instance. This is the recommended
|
|
|
|
|
approach for the vast majority of application functions that render
|
|
|
|
|
HTML pages whose text easily fits in memory.
|
|
|
|
|
|
|
|
|
|
For large files, however, or for specialized uses of HTTP streaming
|
|
|
|
|
(such as multipart "server push"), an application may need to provide
|
|
|
|
|
output in smaller blocks (e.g. to avoid loading a large file into
|
|
|
|
|
memory). It's also sometimes the case that part of a response may be
|
|
|
|
|
time-consuming to produce, but it would be useful to send ahead the
|
|
|
|
|
portion of the response that precedes it.
|
|
|
|
|
|
|
|
|
|
In these cases, applications will usually return a ``body`` iterator
|
|
|
|
|
(often a generator-iterator) that produces the output in a
|
|
|
|
|
block-by-block fashion. These blocks may be broken to coincide with
|
|
|
|
|
multipart boundaries (for "server push"), or just before
|
|
|
|
|
time-consuming tasks (such as reading another block of an on-disk
|
|
|
|
|
file).
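A block-by-block body of this kind might look like the following sketch. The ``example.file`` environ key and both function names are made up for illustration; a real application would obtain its file object however it sees fit.

```python
import io


def stream_file(fileobj, block_size=8192):
    """Yield the contents of a binary file object one block at a time."""
    while True:
        block = fileobj.read(block_size)
        if not block:
            break
        yield block


def download_app(environ):
    # ``example.file`` is a hypothetical key holding an open binary file.
    fileobj = environ['example.file']
    headers = [(b'Content-Type', b'application/octet-stream')]
    # The body is a generator-iterator, so the file is never fully in memory.
    return (b'200 OK', headers, stream_file(fileobj))
```
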
|
|
|
|
|
|
|
|
|
|
Web3 servers, gateways, and middleware **must not** delay the
|
|
|
|
|
transmission of any block; they **must** either fully transmit the
|
|
|
|
|
block to the client, or guarantee that they will continue transmission
|
|
|
|
|
even while the application is producing its next block. A
|
|
|
|
|
server/gateway or middleware may provide this guarantee in one of
|
|
|
|
|
three ways:
|
|
|
|
|
|
|
|
|
|
1. Send the entire block to the operating system (and request that any
|
|
|
|
|
O/S buffers be flushed) before returning control to the
|
|
|
|
|
application, OR
|
|
|
|
|
|
|
|
|
|
2. Use a different thread to ensure that the block continues to be
|
|
|
|
|
transmitted while the application produces the next block.
|
|
|
|
|
|
|
|
|
|
3. (Middleware only) send the entire block to its parent
|
|
|
|
|
gateway/server.
|
|
|
|
|
|
|
|
|
|
By providing this guarantee, Web3 allows applications to ensure that
|
|
|
|
|
transmission will not become stalled at an arbitrary point in their
|
|
|
|
|
output data. This is critical for proper functioning of
|
|
|
|
|
e.g. multipart "server push" streaming, where data between multipart
|
|
|
|
|
boundaries should be transmitted in full to the client.
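The first of the three options can be sketched as a server-side loop that writes and flushes each block before asking the application for the next one. The name ``send_body`` is hypothetical, and ``out`` is assumed to be a binary file-like object wrapping the client connection; whether ``flush()`` also empties operating system buffers depends on the platform and the underlying object.

```python
def send_body(body, out):
    """Transmit each block fully before pulling the next from the iterable."""
    for block in body:
        if block:
            out.write(block)
            # Do not hold the block back while the next one is produced.
            out.flush()
```
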
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Unicode Issues
|
|
|
|
|
--------------
|
|
|
|
|
|
|
|
|
|
HTTP does not directly support Unicode, and neither does this
|
|
|
|
|
interface. All encoding/decoding must be handled by the
|
|
|
|
|
**application**; all values passed to or from the server must be of
|
|
|
|
|
the Python 3 type ``bytes`` or instances of the Python 2 type ``str``,
|
|
|
|
|
not Python 2 ``unicode`` or Python 3 ``str`` objects.
|
|
|
|
|
|
|
|
|
|
All "bytes instances" referred to in this specification **must**:
|
|
|
|
|
|
|
|
|
|
- On Python 2, be of type ``str``.
|
|
|
|
|
|
|
|
|
|
- On Python 3, be of type ``bytes``.
|
|
|
|
|
|
|
|
|
|
All "bytes instances" **must not**:
|
|
|
|
|
|
|
|
|
|
- On Python 2, be of type ``unicode``.
|
|
|
|
|
|
|
|
|
|
- On Python 3, be of type ``str``.
|
|
|
|
|
|
|
|
|
|
The result of using a textlike object where a byteslike object is
|
|
|
|
|
required is undefined.
|
|
|
|
|
|
|
|
|
|
Values returned from a Web3 app as a status or as response headers
|
|
|
|
|
**must** follow RFC 2616 with respect to encoding. That is, the bytes
|
|
|
|
|
returned must contain a character stream of ISO-8859-1 characters, or
|
|
|
|
|
the character stream should use RFC 2047 MIME encoding.
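One way an application might satisfy this rule is sketched below for Python 3: text that fits in ISO-8859-1 is encoded directly, and anything else is wrapped in RFC 2047 encoded-word syntax using the standard library's ``email.header.Header``. The function name ``encode_header_value`` is hypothetical, and the choice of UTF-8 as the RFC 2047 charset is an assumption rather than something this specification requires.

```python
from email.header import Header


def encode_header_value(text):
    """Encode a text header value to bytes for use in a Web3 response."""
    try:
        return text.encode('iso-8859-1')
    except UnicodeEncodeError:
        # Not representable in ISO-8859-1: fall back to RFC 2047.
        # maxlinelen=0 disables folding, so no CR/LF sneaks into the value.
        encoded = Header(text, 'utf-8').encode(maxlinelen=0)
        return encoded.encode('ascii')
```
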
|
|
|
|
|
|
|
|
|
|
On Python platforms which do not have a native bytes-like type
|
|
|
|
|
(e.g. IronPython), but which instead generally use textlike
|
|
|
|
|
strings to represent bytes data, the definition of "bytes instance"
|
|
|
|
|
can be changed: their "bytes instances" must be native strings that
|
|
|
|
|
contain only code points representable in ISO-8859-1 encoding
|
|
|
|
|
(``\u0000`` through ``\u00FF``, inclusive). It is a fatal error for
|
|
|
|
|
an application on such a platform to supply strings containing any
|
|
|
|
|
other Unicode character or code point. Similarly, servers and
|
|
|
|
|
gateways on those platforms **must not** supply strings to an
|
|
|
|
|
application containing any other Unicode characters.
|
|
|
|
|
|
|
|
|
|
.. XXX (armin: Jython now has a bytes type, we might remove this
|
|
|
|
|
section after seeing about IronPython)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
HTTP 1.1 Expect/Continue
|
|
|
|
|
------------------------
|
|
|
|
|
|
|
|
|
|
Servers and gateways that implement HTTP 1.1 **must** provide
|
|
|
|
|
transparent support for HTTP 1.1's "expect/continue" mechanism. This
|
|
|
|
|
may be done in any of several ways:
|
|
|
|
|
|
|
|
|
|
1. Respond to requests containing an ``Expect: 100-continue`` request
|
|
|
|
|
with an immediate "100 Continue" response, and proceed normally.
|
|
|
|
|
|
|
|
|
|
2. Proceed with the request normally, but provide the application with
|
|
|
|
|
a ``web3.input`` stream that will send the "100 Continue" response
|
|
|
|
|
if/when the application first attempts to read from the input
|
|
|
|
|
stream. The read request must then remain blocked until the client
|
|
|
|
|
responds.
|
|
|
|
|
|
|
|
|
|
3. Wait until the client decides that the server does not support
|
|
|
|
|
expect/continue, and sends the request body on its own. (This is
|
|
|
|
|
suboptimal, and is not recommended.)
|
|
|
|
|
|
|
|
|
|
Note that these behavior restrictions do not apply for HTTP 1.0
|
|
|
|
|
requests, or for requests that are not directed to an application
|
|
|
|
|
object. For more information on HTTP 1.1 Expect/Continue, see RFC
|
|
|
|
|
2616, sections 8.2.3 and 10.1.1.
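The second option above can be sketched as an input-stream wrapper that defers the interim response until the application first reads. ``ContinueOnRead`` and the ``send_continue`` callback are hypothetical; a real server would supply a callback that writes the "100 Continue" status line to the client, and the underlying read would then block until the client sends the body.

```python
class ContinueOnRead(object):
    """Hypothetical ``web3.input`` wrapper sending "100 Continue" lazily."""

    def __init__(self, raw, send_continue):
        self._raw = raw
        # Assumed server callback that emits the interim response.
        self._send_continue = send_continue
        self._sent = False

    def _ensure_continue(self):
        # Send the interim response once, on the first read attempt.
        if not self._sent:
            self._sent = True
            self._send_continue()

    def read(self, size=-1):
        self._ensure_continue()
        return self._raw.read(size)

    def readline(self, size=-1):
        self._ensure_continue()
        return self._raw.readline(size)

    def readlines(self, size=-1):
        self._ensure_continue()
        return self._raw.readlines(size)

    def __iter__(self):
        self._ensure_continue()
        return iter(self._raw)
```

An application that never touches the input stream never triggers the interim response, which is the point of this approach.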
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Other HTTP Features
|
|
|
|
|
-------------------
|
|
|
|
|
|
|
|
|
|
In general, servers and gateways should "play dumb" and allow the
|
|
|
|
|
application complete control over its output. They should only make
|
|
|
|
|
changes that do not alter the effective semantics of the application's
|
|
|
|
|
response. It is always possible for the application developer to add
|
|
|
|
|
middleware components to supply additional features, so server/gateway
|
|
|
|
|
developers should be conservative in their implementation. In a
|
|
|
|
|
sense, a server should consider itself to be like an HTTP "gateway
|
|
|
|
|
server", with the application being an HTTP "origin server". (See RFC
|
|
|
|
|
2616, section 1.3, for the definition of these terms.)
|
|
|
|
|
|
|
|
|
|
However, because Web3 servers and applications do not communicate via
|
|
|
|
|
HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to Web3
|
|
|
|
|
internal communications. Web3 applications **must not** generate any
|
|
|
|
|
"hop-by-hop" headers [4]_, attempt to use HTTP features that would
|
|
|
|
|
require them to generate such headers, or rely on the content of any
|
|
|
|
|
incoming "hop-by-hop" headers in the ``environ`` dictionary. Web3
|
|
|
|
|
servers **must** handle any supported inbound "hop-by-hop" headers on
|
|
|
|
|
their own, such as by decoding any inbound ``Transfer-Encoding``,
|
|
|
|
|
including chunked encoding if applicable.
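A server or gateway could enforce the response-side rule with a check along these lines. The header names are those listed as hop-by-hop in RFC 2616, Section 13.5.1; the function name ``reject_hop_by_hop`` is hypothetical.

```python
# Hop-by-hop header names from RFC 2616, Section 13.5.1, lower-cased
# because HTTP header names are case-insensitive.
HOP_BY_HOP = frozenset([
    b'connection', b'keep-alive', b'proxy-authenticate',
    b'proxy-authorization', b'te', b'trailers', b'transfer-encoding',
    b'upgrade',
])


def reject_hop_by_hop(headers):
    """Raise ValueError if the application supplied a hop-by-hop header."""
    for name, _value in headers:
        if name.lower() in HOP_BY_HOP:
            raise ValueError('hop-by-hop header not allowed: %r' % (name,))
    return headers
```
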
|
|
|
|
|
|
|
|
|
|
Applying these principles to a variety of HTTP features, it should be
|
|
|
|
|
clear that a server **may** handle cache validation via the
|
|
|
|
|
``If-None-Match`` and ``If-Modified-Since`` request headers and the
|
|
|
|
|
``Last-Modified`` and ``ETag`` response headers. However, it is not
|
|
|
|
|
required to do this, and the application **should** perform its own
|
|
|
|
|
cache validation if it wants to support that feature, since the
|
|
|
|
|
server/gateway is not required to do such validation.
|
|
|
|
|
|
|
|
|
|
Similarly, a server **may** re-encode or transport-encode an
|
|
|
|
|
application's response, but the application **should** use a suitable
|
|
|
|
|
content encoding on its own, and **must not** apply a transport
|
|
|
|
|
encoding. A server **may** transmit byte ranges of the application's
|
|
|
|
|
response if requested by the client, and the application doesn't
|
|
|
|
|
natively support byte ranges. Again, however, the application
|
|
|
|
|
**should** perform this function on its own if desired.
|
|
|
|
|
|
|
|
|
|
Note that these restrictions on applications do not necessarily mean
|
|
|
|
|
that every application must reimplement every HTTP feature; many HTTP
|
|
|
|
|
features can be partially or fully implemented by middleware
|
|
|
|
|
components, thus freeing both server and application authors from
|
|
|
|
|
implementing the same features over and over again.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thread Support
|
|
|
|
|
--------------
|
|
|
|
|
|
|
|
|
|
Thread support, or lack thereof, is also server-dependent. Servers
|
|
|
|
|
that can run multiple requests in parallel **should** also provide
|
|
|
|
|
the option of running an application in a single-threaded fashion, so
|
|
|
|
|
that applications or frameworks that are not thread-safe may still be
|
|
|
|
|
used with that server.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Implementation/Application Notes
|
|
|
|
|
================================
|
|
|
|
|
|
|
|
|
|
Server Extension APIs
|
|
|
|
|
---------------------
|
|
|
|
|
|
|
|
|
|
Some server authors may wish to expose more advanced APIs that
|
|
|
|
|
application or framework authors can use for specialized purposes.
|
|
|
|
|
For example, a gateway based on ``mod_python`` might wish to expose
|
|
|
|
|
part of the Apache API as a Web3 extension.
|
|
|
|
|
|
|
|
|
|
In the simplest case, this requires nothing more than defining an
|
|
|
|
|
``environ`` variable, such as ``mod_python.some_api``. But, in many
|
|
|
|
|
cases, the possible presence of middleware can make this difficult.
|
|
|
|
|
For example, an API that offers access to the same HTTP headers that
|
|
|
|
|
are found in ``environ`` variables might return different data if
|
|
|
|
|
``environ`` has been modified by middleware.
|
|
|
|
|
|
|
|
|
|
In general, any extension API that duplicates, supplants, or bypasses
|
|
|
|
|
some portion of Web3 functionality runs the risk of being incompatible
|
|
|
|
|
with middleware components. Server/gateway developers should *not*
|
|
|
|
|
assume that nobody will use middleware, because some framework
|
|
|
|
|
developers specifically organize their frameworks to function almost
|
|
|
|
|
entirely as middleware of various kinds.
|
|
|
|
|
|
|
|
|
|
So, to provide maximum compatibility, servers and gateways that
|
|
|
|
|
provide extension APIs that replace some Web3 functionality **must**
|
|
|
|
|
design those APIs so that they are invoked using the portion of the
|
|
|
|
|
API that they replace. For example, an extension API to access HTTP
|
|
|
|
|
request headers must require the application to pass in its current
|
|
|
|
|
``environ``, so that the server/gateway may verify that HTTP headers
|
|
|
|
|
accessible via the API have not been altered by middleware. If the
|
|
|
|
|
extension API cannot guarantee that it will always agree with
|
|
|
|
|
``environ`` about the contents of HTTP headers, it must refuse service
|
|
|
|
|
to the application, e.g. by raising an error, returning ``None``
|
|
|
|
|
instead of a header collection, or whatever is appropriate to the API.
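A sketch of such a "safe" extension API follows. The class ``HeaderAPI`` is hypothetical: it assumes the server snapshots the ``HTTP_*`` variables before any middleware runs, and that callers pass in their current ``environ`` so that alterations can be detected.

```python
class HeaderAPI(object):
    """Hypothetical server extension exposing the request headers."""

    def __init__(self, environ):
        # Snapshot taken by the server before any middleware runs.
        self._original = self._http_vars(environ)

    @staticmethod
    def _http_vars(environ):
        return dict((key, value) for key, value in environ.items()
                    if key.startswith('HTTP_'))

    def get_headers(self, environ):
        # The caller passes its *current* environ so changes are detectable.
        if self._http_vars(environ) != self._original:
            # Refuse service rather than disagree with ``environ``.
            raise RuntimeError('HTTP headers in environ were modified')
        return dict(self._original)
```
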
|
|
|
|
|
|
|
|
|
|
These guidelines also apply to middleware that adds information such
|
|
|
|
|
as parsed cookies, form variables, sessions, and the like to
|
|
|
|
|
``environ``. Specifically, such middleware should provide these
|
|
|
|
|
features as functions which operate on ``environ``, rather than simply
|
|
|
|
|
stuffing values into ``environ``. This helps ensure that information
|
|
|
|
|
is calculated from ``environ`` *after* any middleware has done any URL
|
|
|
|
|
rewrites or other ``environ`` modifications.
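For instance, a cookie-parsing component might expose a function like the one sketched here instead of storing a parsed dictionary in ``environ``. The name ``get_cookies`` is hypothetical and the parsing is deliberately simplistic: it ignores quoting and attributes.

```python
def get_cookies(environ):
    """Parse cookies from the *current* environ on every call."""
    cookies = {}
    for item in environ.get('HTTP_COOKIE', b'').split(b';'):
        name, sep, value = item.strip().partition(b'=')
        if sep:
            cookies[name] = value
    return cookies
```

Because the result is computed on demand, it reflects any changes that earlier middleware made to ``HTTP_COOKIE``.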
|
|
|
|
|
|
|
|
|
|
It is very important that these "safe extension" rules be followed by
|
|
|
|
|
both server/gateway and middleware developers, in order to avoid a
|
|
|
|
|
future in which middleware developers are forced to delete any and all
|
|
|
|
|
extension APIs from ``environ`` to ensure that their mediation isn't
|
|
|
|
|
being bypassed by applications using those extensions!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Application Configuration
|
|
|
|
|
-------------------------
|
|
|
|
|
|
|
|
|
|
This specification does not define how a server selects or obtains an
|
|
|
|
|
application to invoke. These and other configuration options are
|
|
|
|
|
highly server-specific matters. It is expected that server/gateway
|
|
|
|
|
authors will document how to configure the server to execute a
|
|
|
|
|
particular application object, and with what options (such as
|
|
|
|
|
threading options).
|
|
|
|
|
|
|
|
|
|
Framework authors, on the other hand, should document how to create an
|
|
|
|
|
application object that wraps their framework's functionality. The
|
|
|
|
|
user, who has chosen both the server and the application framework,
|
|
|
|
|
must connect the two together. However, since both the framework and
|
|
|
|
|
the server have a common interface, this should be merely a mechanical
|
|
|
|
|
matter, rather than a significant engineering effort for each new
|
|
|
|
|
server/framework pair.
|
|
|
|
|
|
|
|
|
|
Finally, some applications, frameworks, and middleware may wish to use
|
|
|
|
|
the ``environ`` dictionary to receive simple string configuration
|
|
|
|
|
options. Servers and gateways **should** support this by allowing an
|
|
|
|
|
application's deployer to specify name-value pairs to be placed in
|
|
|
|
|
``environ``. In the simplest case, this support can consist merely of
|
|
|
|
|
copying all operating system-supplied environment variables from
|
|
|
|
|
``os.environ`` into the ``environ`` dictionary, since the deployer in
|
|
|
|
|
principle can configure these externally to the server, or in the CGI
|
|
|
|
|
case they may be able to be set via the server's configuration files.
|
|
|
|
|
|
|
|
|
|
Applications **should** try to keep such required variables to a
|
|
|
|
|
minimum, since not all servers will support easy configuration of
|
|
|
|
|
them. Of course, even in the worst case, persons deploying an
|
|
|
|
|
application can create a script to supply the necessary configuration
|
|
|
|
|
values::
|
|
|
|
|
|
|
|
|
|
    from the_app import application

    def new_app(environ):
        environ['the_app.configval1'] = b'something'
        return application(environ)
|
|
|
|
|
|
|
|
|
|
But most existing applications and frameworks will probably only need
|
|
|
|
|
a single configuration value from ``environ``, to indicate the
|
|
|
|
|
location of their application or framework-specific configuration
|
|
|
|
|
file(s). (Of course, applications should cache such configuration, to
|
|
|
|
|
avoid having to re-read it upon each invocation.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
URL Reconstruction
|
|
|
|
|
------------------
|
|
|
|
|
|
|
|
|
|
If an application wishes to reconstruct a request's complete URL (as a
|
|
|
|
|
bytes object), it may do so using the following algorithm::
|
|
|
|
|
|
|
|
|
|
    host = environ.get('HTTP_HOST')

    scheme = environ['web3.url_scheme']
    port = environ['SERVER_PORT']
    query = environ['QUERY_STRING']

    url = scheme + b'://'

    if host:
        # HTTP_HOST already carries any non-default port.
        url += host
    else:
        url += environ['SERVER_NAME']

        if scheme == b'https':
            if port != b'443':
                url += b':' + port
        else:
            if port != b'80':
                url += b':' + port

    # url_quote is an assumed helper that percent-encodes bytes to bytes.
    if 'web3.script_name' in environ:
        # The raw value is still URL-encoded, so append it unchanged.
        url += environ['web3.script_name']
    else:
        url += url_quote(environ['SCRIPT_NAME'])
    if 'web3.path_info' in environ:
        url += environ['web3.path_info']
    else:
        url += url_quote(environ['PATH_INFO'])
    if query:
        url += b'?' + query
|
|
|
|
|
|
|
|
|
|
Note that such a reconstructed URL may not be precisely the same URI
|
|
|
|
|
as requested by the client. Server rewrite rules, for example, may
|
|
|
|
|
have modified the client's originally requested URL to place it in a
|
|
|
|
|
canonical form.
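A runnable Python 3 sketch of URL reconstruction is given below. It makes several assumptions that should be read as such: ``url_quote`` is a hypothetical helper built on ``urllib.parse.quote_from_bytes``; the raw ``web3.script_name`` and ``web3.path_info`` values are treated as already URL-encoded and appended unchanged, while the decoded CGI fallbacks are quoted; and ``HTTP_HOST`` is assumed to include any non-default port already.

```python
from urllib.parse import quote_from_bytes


def url_quote(value):
    # Hypothetical helper: percent-encode bytes and return bytes.
    return quote_from_bytes(value, safe='/').encode('ascii')


def reconstruct_url(environ):
    """Rebuild the request URL as a bytes object from a Web3 environ."""
    scheme = environ['web3.url_scheme']
    url = scheme + b'://'

    host = environ.get('HTTP_HOST')
    if host:
        url += host
    else:
        url += environ['SERVER_NAME']
        port = environ['SERVER_PORT']
        default_port = b'443' if scheme == b'https' else b'80'
        if port != default_port:
            url += b':' + port

    # Prefer the raw (still encoded) values; quote the decoded fallbacks.
    if 'web3.script_name' in environ:
        url += environ['web3.script_name']
    else:
        url += url_quote(environ.get('SCRIPT_NAME', b''))
    if 'web3.path_info' in environ:
        url += environ['web3.path_info']
    else:
        url += url_quote(environ.get('PATH_INFO', b''))

    query = environ.get('QUERY_STRING', b'')
    if query:
        url += b'?' + query
    return url
```
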
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Open Questions
|
|
|
|
|
==============
|
|
|
|
|
|
|
|
|
|
- ``file_wrapper`` replacement. Currently nothing is specified here
|
|
|
|
|
but it's clear that the old system of in-band signalling is broken
|
|
|
|
|
if it does not provide a way for middleware in the processing chain
to determine whether the response is a file wrapper.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Points of Contention
|
|
|
|
|
====================
|
|
|
|
|
|
|
|
|
|
Outlined below are potential points of contention regarding this
|
|
|
|
|
specification.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
WSGI 1.0 Compatibility
|
|
|
|
|
----------------------
|
|
|
|
|
|
|
|
|
|
Components written using the WSGI 1.0 specification will not
|
|
|
|
|
transparently interoperate with components written using this
|
|
|
|
|
specification. That's because the goals of this proposal and the
|
|
|
|
|
goals of WSGI 1.0 are not directly aligned.
|
|
|
|
|
|
|
|
|
|
WSGI 1.0 is obliged to provide specification-level backwards
|
|
|
|
|
compatibility with versions of Python between 2.2 and 2.7. This
|
|
|
|
|
specification, however, drops compatibility with Python 2.5 and lower in
|
|
|
|
|
order to provide compatibility between relatively recent versions of
|
|
|
|
|
Python 2 (2.6 and 2.7) as well as relatively recent versions of Python
|
|
|
|
|
3 (3.1).
|
|
|
|
|
|
|
|
|
|
It is currently impossible to write components which work reliably
|
|
|
|
|
under both Python 2 and Python 3 using the WSGI 1.0 specification,
|
|
|
|
|
because the specification implicitly posits that CGI and server
|
|
|
|
|
variable values in the environ and values returned via
|
|
|
|
|
``start_response`` represent a sequence of bytes that can be addressed
|
|
|
|
|
using the Python 2 string API. It posits such a thing because that
|
|
|
|
|
sort of data type was the sensible way to represent bytes in all
|
|
|
|
|
Python 2 versions, and WSGI 1.0 was conceived before Python 3 existed.
|
|
|
|
|
|
|
|
|
|
Python 3's ``str`` type supports the full API provided by the Python 2
|
|
|
|
|
``str`` type, but Python 3's ``str`` type does not represent a
|
|
|
|
|
sequence of bytes; it instead represents text. Therefore, using it to
|
|
|
|
|
represent environ values also requires that the environ byte sequence
|
|
|
|
|
be decoded to text via some encoding. We cannot decode these bytes to
|
|
|
|
|
text (at least in any way where the decoding has any meaning other
|
|
|
|
|
than as a tunnelling mechanism) without widening the scope of WSGI to
|
|
|
|
|
include server and gateway knowledge of decoding policies and
|
|
|
|
|
mechanics. WSGI 1.0 never concerned itself with encoding and
|
|
|
|
|
decoding. It made statements about allowable transport values, and
|
|
|
|
|
suggested that various values might be best decoded as one encoding or
|
|
|
|
|
another, but it never required a server to *perform* any decoding
|
|
|
|
|
before
|
|
|
|
|
|
|
|
|
|
Python 3 does have a type intended to represent bytes, the ``bytes``
type, but it is not a stringlike drop-in replacement.  In Python
3.1+, a ``bytes`` object operates quite a bit like a Python 2 ``str``,
but it lacks behavior equivalent to ``str.__mod__``, and its
iteration protocol, containment, sequence treatment, and equivalence
comparisons are different.
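The differences in iteration, sequence treatment, containment, and
comparison can be seen in a few lines.  This is only an illustrative
sketch run under Python 3; it deliberately avoids ``%``-formatting,
since later Python 3 releases (3.5+) added that operation to
``bytes``.

```python
# Python 3 ``bytes`` versus text ``str`` for code that expects the
# Python 2 ``str`` API.
data = b"abc"
text = "abc"

# Indexing and iteration yield integers for bytes, but 1-character
# strings for str.
first_byte = data[0]
first_char = text[0]
byte_items = list(data)
char_items = list(text)

# Slicing, unlike indexing, still yields a bytes object.
first_slice = data[0:1]

# Containment accepts integers as well as bytes subsequences.
has_int = 97 in data
has_sub = b"a" in data

# bytes and str never compare equal, even when they look alike.
same = (data == text)
```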
|
|
|
|
|
|
|
|
|
|
In either case, there is no type in Python 3 that behaves just like
the Python 2 ``str`` type, and a way to create such a type doesn't
exist because there is no such thing as a "String ABC" which would
allow a suitable type to be built.  Due to this design
incompatibility, existing WSGI 1.0 servers, middleware, and
applications will not work under Python 3, even after they are run
through ``2to3``.

Existing Web-SIG discussions about updating the WSGI specification so
that it is possible to write a WSGI application that runs in both
Python 2 and Python 3 tend to revolve around creating a
specification-level equivalence between the Python 2 ``str`` type
(which represents a sequence of bytes) and the Python 3 ``str`` type
(which represents text).  Such an equivalence becomes strained in
various areas, given the different roles of these types.  An arguably
more straightforward equivalence exists between the Python 3 ``bytes``
type API and a subset of the Python 2 ``str`` type API.  This
specification exploits this subset equivalence.

In the meantime, aside from any Python 2 vs. Python 3 compatibility
issue, as various discussions on Web-SIG have pointed out, the WSGI
1.0 specification is too general, providing support (via ``.write``)
for asynchronous applications at the expense of implementation
complexity.  This specification uses the fundamental incompatibility
between WSGI 1.0 and Python 3 as a natural divergence point to create
a specification with reduced complexity by changing how asynchronous
applications are supported.

To provide backwards compatibility for older WSGI 1.0 applications,
it is presumed that Web3 middleware will be created which can be used
"in front" of existing WSGI 1.0 applications, allowing them to run
under a Web3 stack.  Under Python 3, this middleware will need to
draw an equivalence between Python 3 ``str`` types and the bytes
values represented by the HTTP request, with all the attendant
encoding-guessing (or configuration) that implies.

.. note::

   Such middleware *might* in the future, instead of drawing an
   equivalence between Python 3 ``str`` and HTTP byte values, make use
   of a yet-to-be-created "ebytes" type (aka "bytes-with-benefits"),
   particularly if a String ABC proposal is accepted into the Python
   core and implemented.

Conversely, it is presumed that WSGI 1.0 middleware will be created
which will allow a Web3 application to run behind a WSGI 1.0 stack on
the Python 2 platform.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Environ and Response Values as Bytes
------------------------------------

Casual middleware and application writers may consider the use of
bytes as environment values and response values inconvenient.  In
particular, they won't be able to use common string formatting
functions such as ``('%s' % bytes_val)`` or
``bytes_val.format('123')`` because bytes don't have the same API as
strings on platforms such as Python 3 where the two types differ.
Likewise, on such platforms, stdlib HTTP-related API support for using
bytes interchangeably with text can be spotty.  In places where bytes
are inconvenient or incompatible with library APIs, middleware and
application writers will have to decode such bytes to text explicitly.
This is particularly inconvenient for middleware writers: to work with
environment values as strings, they'll have to decode them from an
implied encoding, and if they need to mutate an environ value, they'll
then need to encode the value back into bytes before placing it into
the environ.  While the use of bytes by the specification as environ
values might be inconvenient for casual developers, it provides
several benefits.
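The decode, mutate, and re-encode cycle described above might look
like the following sketch.  The helper name ``prefix_script_name``,
the choice of UTF-8 as the implied encoding, and the use of native
string keys in the environ are assumptions made for illustration, not
requirements stated in this section.

```python
def prefix_script_name(environ, prefix, encoding="utf-8"):
    """Return a copy of ``environ`` with ``prefix`` prepended to SCRIPT_NAME.

    Hypothetical helper: the encoding is a policy decision made by the
    middleware author, not by the server.
    """
    raw = environ.get("SCRIPT_NAME", b"")    # environ values are bytes
    text = raw.decode(encoding)               # explicit decode to text
    text = "/" + prefix.strip("/") + text     # mutate as ordinary text
    new_environ = dict(environ)
    # Encode back to bytes before the value re-enters the environ.
    new_environ["SCRIPT_NAME"] = text.encode(encoding)
    return new_environ


environ = {"SCRIPT_NAME": b"/app", "PATH_INFO": b"/caf\xc3\xa9"}
updated = prefix_script_name(environ, "v1")
```

The original mapping is left untouched, and every value in the
returned environ is still a bytes object.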
|
|
|
|
|
|
|
|
|
|
Using bytes types to represent HTTP and server values to an
application most closely matches reality because HTTP is fundamentally
a bytes-oriented protocol.  If the environ values are mandated to be
strings, each server will need to use heuristics to guess about the
encoding of various values provided by the HTTP environment.  Using
all strings might increase casual middleware writer convenience, but
will also lead to ambiguity and confusion when a value cannot be
decoded to a meaningful non-surrogate string.

Use of bytes as environ values removes any need for the specification
to mandate that a participating server be informed of encoding
configuration parameters.  If environ values are treated as strings,
and so must be decoded from bytes, configuration parameters may
eventually become necessary as policy clues from the application
deployer.  Such a policy would be used to guess an appropriate
decoding strategy in various circumstances, effectively placing the
burden for enforcing a particular application encoding policy upon the
server.  If the server must serve more than one application, such
configuration would quickly become complex.  Many policies would also
be impossible to express declaratively.

In reality, HTTP is a complicated and legacy-fraught protocol which
requires a complex set of heuristics to make sense of.  It would be
nice if we could allow this protocol to protect us from this
complexity, but we cannot do so reliably while still providing to
application writers a level of control commensurate with reality.
Python applications must often deal with data embedded in the
environment which not only must be parsed by legacy heuristics, but
*does not conform even to any existing HTTP specification*.  While
these eventualities are unpleasant, they crop up with regularity,
making it impossible and undesirable to hide them from application
developers, as application developers are the only people who are able
to decide upon an appropriate action when an HTTP specification
violation is detected.
|
|
|
|
|
|
|
|
|
|
Some have argued for mixed use of bytes and string values as environ
*values*.  This proposal avoids that strategy.  Sole use of bytes as
environ values makes it possible to fit this specification entirely in
one's head; you won't need to guess about which values are strings and
which are bytes.

This protocol would also fit in a developer's head if all environ
values were strings, but this specification doesn't use that strategy.
This will likely be the point of greatest contention regarding the use
of bytes.  In defense of bytes: developers often prefer protocols with
consistent contracts, even if the contracts themselves are suboptimal.
If we hide encoding issues from a developer until a value that
contains surrogates causes problems after it has already reached
beyond the I/O boundary of their application, they will need to do a
lot more work to fix assumptions made by their application than if we
were to just present the problem much earlier in terms of "here are
some bytes, you decode them".  This is also a counter-argument to the
"bytes are inconvenient" assumption: while presenting bytes to an
application developer may be inconvenient for a casual application
developer who doesn't care about edge cases, they are extremely
convenient for the application developer who needs to deal with
complex, dirty eventualities, because use of bytes allows them the
appropriate level of control with a clear separation of
responsibility.

If the protocol uses bytes, it is presumed that libraries will be
created to make working with bytes-only in the environ and within
return values more pleasant; for example, analogues of the WSGI 1.0
libraries named "WebOb" and "Werkzeug".  Such libraries will fill the
gap between convenience and control, allowing the spec to remain
simple and regular while still allowing casual authors a convenient
way to create Web3 middleware and application components.  This seems
to be a reasonable alternative to baking encoding policy into the
protocol, because many such libraries can be created independently
from the protocol, and application developers can choose the one that
provides them the appropriate levels of control and convenience for a
particular job.
|
|
|
|
|
|
|
|
|
|
Here are some alternatives to using all bytes:

- Have the server decode all values representing CGI and server
  environ values into strings using the ``latin-1`` encoding, which is
  lossless.  Smuggle any undecodable bytes within the resulting
  string.

- Decode all CGI and server environ values to strings using the
  ``utf-8`` encoding with the ``surrogateescape`` error handler.  This
  does not work under any existing Python 2.

- Represent some values as bytes and other values as strings, as
  decided by their typical usages.
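The first two alternatives can be compared directly under Python 3.
This is a minimal sketch using only built-in codecs; the sample byte
string is made up for illustration.

```python
raw = b"/caf\xe9/\xff"   # sample value that is not valid UTF-8

# Alternative 1: latin-1 maps every byte to exactly one code point, so
# decoding never fails and always round-trips.  The resulting text is
# only meaningful if the data really was latin-1.
as_latin1 = raw.decode("latin-1")
latin1_roundtrip = as_latin1.encode("latin-1")

# Alternative 2: utf-8 with the surrogateescape error handler turns
# each undecodable byte into a lone surrogate that can be turned back
# into the original byte later.
as_utf8 = raw.decode("utf-8", "surrogateescape")
utf8_roundtrip = as_utf8.encode("utf-8", "surrogateescape")

# The smuggled surrogates surface as an error at a strict boundary.
try:
    as_utf8.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Both strategies preserve the original bytes, but in each case the
application still has to know which values carry smuggled data before
it can treat them as meaningful text.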
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Applications Should be Allowed to Read ``web3.input`` Past ``CONTENT_LENGTH``
-----------------------------------------------------------------------------

In [6]_, Graham Dumpleton asserts that ``wsgi.input`` should be
required to return the empty string as a signifier of out-of-data, and
that applications should be allowed to read past the number of bytes
specified in ``CONTENT_LENGTH``, depending only upon the empty string
as an EOF marker.  WSGI relies on an application "being well behaved
and once all data specified by ``CONTENT_LENGTH`` is read, that it
processes the data and returns any response.  That same socket
connection could then be used for a subsequent request."  Graham would
like WSGI adapters to be required to wrap raw socket connections:
"this wrapper object will need to count how much data has been read,
and when the amount of data reaches that as defined by
``CONTENT_LENGTH``, any subsequent reads should return an empty string
instead."  This may be useful to support chunked encoding and input
filters.
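A counting wrapper of the kind Graham describes could be sketched as
follows.  The class name ``LengthLimitedInput`` is hypothetical, and
the sketch returns empty bytes rather than an empty string on the
assumption that a Web3 input stream yields bytes.

```python
import io


class LengthLimitedInput:
    """Hypothetical wrapper: report EOF once CONTENT_LENGTH bytes are read."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def read(self, size=-1):
        # With no size (or a negative one), return all remaining content.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        if size == 0:
            return b""                 # empty bytes signal out-of-data
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


# Two pipelined requests share one connection; only 5 bytes are ours.
socket_file = io.BytesIO(b"helloNEXT-REQUEST")
body = LengthLimitedInput(socket_file, content_length=5)
chunks = [body.read(3), body.read(3), body.read(), body.read(10)]
leftover = socket_file.read()
```

Reads past the declared length return empty bytes, and the data that
belongs to the next request stays in the underlying stream.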
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
``web3.input`` Unknown Length
-----------------------------

There's no documented way to indicate that there is content in
``environ['web3.input']`` but that the content length is unknown.

``read()`` of ``web3.input`` Should Support No-Size Calling Convention
----------------------------------------------------------------------

In [6]_, Graham Dumpleton asserts that the ``read()`` method of
``wsgi.input`` should be callable without arguments, and that the
result should be "all available request content".  Needs discussion.

Comment Armin: I changed the spec to require that from an
implementation.  I had too much pain with that in the past already.
Open for discussion, though.

Input Filters should set environ ``CONTENT_LENGTH`` to -1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In [6]_, Graham Dumpleton suggests that an input filter might set
``environ['CONTENT_LENGTH']`` to -1 to indicate that it mutated the
input.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
``headers`` as Literal List of Two-Tuples
-----------------------------------------

Why do we make applications return a ``headers`` structure that is a
literal list of two-tuples?  I think the iterability of ``headers``
needs to be maintained while it moves up the stack, but I don't think
we need to be able to mutate it in place at all times.  Could we
loosen that requirement?

Comment Armin: Strong yes

Removed Requirement that Middleware Not Block
---------------------------------------------

This requirement was removed: "middleware components **must not**
block iteration waiting for multiple values from an application
iterable.  If the middleware needs to accumulate more data from the
application before it can produce any output, it **must** yield an
empty string."  This requirement existed to support asynchronous
applications and servers (see PEP 333's "Middleware Handling of Block
Boundaries").  Asynchronous applications are now serviced explicitly
by the ``web3.async``-capable protocol (a Web3 application callable
may itself return a callable).
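With that requirement gone, middleware that needs the complete body
may simply consume the application iterable.  The sketch below assumes
that a synchronous Web3 application returns a ``(body, status,
headers)`` tuple of bytes values; the names ``add_content_length`` and
``hello_app`` are made up for illustration.

```python
def add_content_length(app):
    """Hypothetical Web3-style middleware that buffers the whole body."""
    def middleware(environ):
        body, status, headers = app(environ)
        # Blocking on the full iterable: no empty chunks have to be
        # yielded while the data is being accumulated.
        data = b"".join(body)
        # Build a new list instead of mutating ``headers`` in place.
        headers = [(k, v) for k, v in headers
                   if k.lower() != b"content-length"]
        headers.append((b"Content-Length", str(len(data)).encode("ascii")))
        return [data], status, headers
    return middleware


def hello_app(environ):
    return ([b"Hello, ", b"world!\n"], b"200 OK",
            [(b"Content-Type", b"text/plain")])


body, status, headers = add_content_length(hello_app)({})
```

Because the middleware only iterates over ``headers`` and builds a new
list, it would also work if the application returned a tuple or some
other iterable of two-tuples.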
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
``web3.script_name`` and ``web3.path_info``
-------------------------------------------

These values are required to be placed into the environment by an
origin server under this specification.  Unlike ``SCRIPT_NAME`` and
``PATH_INFO``, these must be the original *URL-encoded* variants
derived from the request URI.  We probably need to figure out how
these should be computed originally, and what their values should be
if the server performs URL rewriting.
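One reason to keep the URL-encoded variants is that decoding loses
information.  The sketch below uses the standard library's
``urllib.parse.unquote_to_bytes``; the sample path is made up for
illustration.

```python
from urllib.parse import unquote_to_bytes

raw_path = b"/files/a%2Fb/report"       # as sent in the request URI
decoded = unquote_to_bytes(raw_path)    # PATH_INFO-style decoded form

# An escaped slash inside a segment becomes indistinguishable from a
# real segment separator once the path has been decoded.
raw_segments = raw_path.split(b"/")
decoded_segments = decoded.split(b"/")
```

The URL-encoded form still has three path segments, while the decoded
form appears to have four.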
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Long Response Headers
---------------------

Bob Brewer notes on Web-SIG [7]_:

    Each header_value must not include any control characters,
    including carriage returns or linefeeds, either embedded or at the
    end.  (These requirements are to minimize the complexity of any
    parsing that must be performed by servers, gateways, and
    intermediate response processors that need to inspect or modify
    response headers.) [1]_

That's understandable, but HTTP headers are defined as (mostly)
\*TEXT, and "words of \*TEXT MAY contain characters from character
sets other than ISO-8859-1 only when encoded according to the rules of
RFC 2047." [2]_ And RFC 2047 specifies that "an 'encoded-word' may
not be more than 75 characters long...  If it is desirable to encode
more text than will fit in an 'encoded-word' of 75 characters,
multiple 'encoded-word's (separated by CRLF SPACE) may be used." [3]_
This satisfies HTTP header folding rules, as well: "Header fields can
be extended over multiple lines by preceding each extra line with at
least one SP or HT." [1]_

So in my reading of HTTP, some code somewhere should introduce
newlines in longish, encoded response header values.  I see three
options:

1. Keep things as they are and disallow response header values if they
   contain words over 75 chars that are outside the ISO-8859-1
   character set.

2. Allow newline characters in WSGI response headers.

3. Require/strongly suggest WSGI servers to do the encoding and
   folding before sending the value over HTTP.
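To make the folding concrete, the standard library's ``email.header``
module can produce RFC 2047 encoded-words separated by CRLF SPACE.
This is only a sketch of what the third option might involve; nothing
in this specification says a server would use this module, and the
sample value is made up.

```python
from email.header import Header, decode_header, make_header

# A long value containing characters outside ISO-8859-1.
value = "日本語のファイル名" * 5

# maxlinelen=75 asks the encoder to keep each folded line within 75
# characters; linesep selects CRLF as the line separator.
encoded = Header(value, charset="utf-8", maxlinelen=75).encode(linesep="\r\n")
lines = encoded.split("\r\n")

# Decoding the folded form recovers the original text.
decoded = str(make_header(decode_header(encoded)))
```

The encoded value contains embedded CRLF sequences, which is exactly
what the quoted PEP 333 wording disallows in a ``header_value``.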
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Request Trailers and Chunked Transfer Encoding
----------------------------------------------

When using chunked transfer encoding on request content, the RFCs
allow there to be request trailers.  These are like request headers
but come after the final null data chunk.  These trailers are only
available when the chunked data stream is of finite length and has
all been read in.  Neither WSGI nor Web3 currently supports them.
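The position of trailers on the wire can be shown with a small
parser.  This is a simplified sketch: the function name
``read_chunked`` and the ``X-Checksum`` trailer are made up, chunk
extensions are ignored, and no error handling is attempted.

```python
import io


def read_chunked(stream):
    """Read a chunked body from ``stream``; return (body, trailers)."""
    body = []
    while True:
        size_line = stream.readline()
        # The chunk size is hexadecimal; drop any ";extension" suffix.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break                      # final zero-length chunk
        body.append(stream.read(size))
        stream.readline()              # consume the CRLF ending the chunk
    trailers = []
    while True:
        line = stream.readline().rstrip(b"\r\n")
        if not line:
            break                      # blank line ends the trailers
        name, _, value = line.partition(b":")
        trailers.append((name.strip(), value.strip()))
    return b"".join(body), trailers


wire = io.BytesIO(
    b"5\r\nhello\r\n"
    b"6\r\n world\r\n"
    b"0\r\n"
    b"X-Checksum: abc123\r\n"
    b"\r\n"
)
body, trailers = read_chunked(wire)
```

The trailers only become available after the zero-length chunk, that
is, after the whole body has been consumed.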
|
|
|
|
|
|
|
|
|
|
.. XXX (armin) yield from application iterator should be specified
   as write plus flush by server.

.. XXX (armin) websocket API.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
==========

.. [1] PEP 333: Python Web Services Gateway Interface
   (http://www.python.org/dev/peps/pep-0333/)

.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
   (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)

.. [3] "Chunked Transfer Coding" -- HTTP/1.1, section 3.6.1
   (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1)

.. [4] "End-to-end and Hop-by-hop Headers" -- HTTP/1.1, Section 13.5.1
   (http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1)

.. [5] mod_ssl Reference, "Environment Variables"
   (http://www.modssl.org/docs/2.8/ssl_reference.html#ToC25)

.. [6] Details on WSGI 1.0 amendments/clarifications.
   (http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html)

.. [7] [Web-SIG] WSGI and long response header values
   (https://mail.python.org/pipermail/web-sig/2006-September/002244.html)
|
|
|
|
|
|
|
|
|
|
Copyright
=========

This document has been placed in the public domain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
|