New spec for newline= parameter to open() and TextIOBase().

Guido van Rossum 2007-08-16 21:21:30 +00:00
parent 42966cb9b0
commit 911566c261
1 changed files with 50 additions and 15 deletions

@@ -342,20 +342,55 @@ signature:
``.__init__(self, buffer, encoding=None, newline=None)``
``buffer`` is a reference to the ``BufferedIOBase`` object to
be wrapped with the ``TextIOWrapper``. ``encoding`` refers to
an encoding to be used for translating between the
byte-representation and character-representation. If it is
``None``, then the system's locale setting will be used as the
default. ``newline`` can be ``None``, ``'\n'``, ``'\r'``, or
``'\r\n'`` (all other values are illegal); it indicates the
translation for ``'\n'`` characters written. If ``None``, a
system-specific default is chosen, i.e., ``'\r\n'`` on Windows
and ``'\n'`` on Unix/Linux. Setting ``newline='\n'`` on input
means that no CRLF translation is done; lines ending in
``'\r\n'`` will be returned as ``'\r\n'``. (``'\r'`` support
is still needed for some OSX applications that produce files
using ``'\r'`` line endings; Excel (when exporting to text)
and Adobe Illustrator EPS files are the most common examples.
be wrapped with the ``TextIOWrapper``.
``encoding`` refers to an encoding to be used for translating
between the byte-representation and character-representation.
If it is ``None``, then the system's locale setting will be
used as the default.
``newline`` can be ``None``, ``''``, ``'\n'``, ``'\r'``, or
``'\r\n'``; all other values are illegal. It controls the
handling of line endings. It works as follows:
* On input, if ``newline`` is ``None``, universal newlines
mode is enabled. Lines in the input can end in ``'\n'``,
``'\r'``, or ``'\r\n'``, and these are translated into
``'\n'`` before being returned to the caller. If it is
``''``, universal newlines mode is enabled, but line endings
are returned to the caller untranslated. If it has any of
the other legal values, input lines are only terminated by
the given string, and the line ending is returned to the
caller translated to ``'\n'``.
* On output, if ``newline`` is ``None``, any ``'\n'``
characters written are translated to the system default
line separator, ``os.linesep``. If ``newline`` is ``''``,
no translation takes place. If ``newline`` is any of the
other legal values, any ``'\n'`` characters written are
translated to the given string.
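
The input and output rules above can be tried out with the ``io`` module as it ships in current Python 3. This is only a sketch: it covers the ``None``, ``''`` and ``'\r\n'`` cases, and the shipped module may not match this draft for every other value. ``RAW``, ``read_lines`` and ``written_bytes`` are names made up for the illustration.

```python
import io
import os

# Sample input mixing all three line-ending conventions (illustrative data).
RAW = b"one\ntwo\rthree\r\nfour"


def read_lines(newline):
    """Return the lines of RAW as decoded with the given newline setting."""
    text = io.TextIOWrapper(io.BytesIO(RAW), encoding="ascii", newline=newline)
    return text.readlines()


def written_bytes(newline, data="a\nb\n"):
    """Return the bytes produced by writing data with the given newline setting."""
    buffer = io.BytesIO()
    text = io.TextIOWrapper(buffer, encoding="ascii", newline=newline)
    text.write(data)
    text.flush()
    return buffer.getvalue()


# newline=None on input: universal newlines, endings translated to '\n'.
translated = read_lines(None)
# newline='' on input: universal newlines, endings returned untranslated.
untranslated = read_lines("")
# newline=None on output: '\n' becomes os.linesep.
default_output = written_bytes(None)
# newline='' on output: no translation; newline='\r\n': '\n' becomes '\r\n'.
raw_output = written_bytes("")
crlf_output = written_bytes("\r\n")
```

With ``newline=''`` the caller sees each line with its original terminator, which is what tools that must preserve a file's line endings need.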
Further notes on the ``newline`` parameter:
* ``'\r'`` support is still needed for some OSX applications
that produce files using ``'\r'`` line endings; Excel (when
exporting to text) and Adobe Illustrator EPS files are the
most common examples.
* If translation is enabled, it happens regardless of which
method is called for reading or writing. For example,
``f.read()`` will always produce the same result as
``''.join(f.readlines())``.
* If universal newlines without translation are requested on
input (i.e. ``newline=''``) and a system read operation
returns a buffer ending in ``'\r'``, another system read
operation is done to determine whether it is followed by
``'\n'`` or not. In universal newlines mode with
translation, the second system read operation may be
postponed until the next read request, and if the following
system read operation returns a buffer starting with
``'\n'``, that character is simply discarded.
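
The case of a ``'\r\n'`` pair split across two reads can be explored with ``io.IncrementalNewlineDecoder`` from current Python 3. This is a sketch, not the mechanism the note describes: that decoder holds back a trailing ``'\r'`` until the next chunk arrives rather than discarding a leading ``'\n'``, but the visible result is the same. ``decode_chunks`` and the chunk values are hypothetical.

```python
import io


def decode_chunks(chunks, translate):
    """Feed str chunks through a newline decoder and return the joined text."""
    # decoder=None means the chunks are already text, not bytes.
    decoder = io.IncrementalNewlineDecoder(None, translate)
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # Flush any '\r' still held back at end of input.
    pieces.append(decoder.decode("", final=True))
    return "".join(pieces)


# A '\r\n' pair split across two "system reads".
split_crlf = ["one\r", "\ntwo"]

with_translation = decode_chunks(split_crlf, translate=True)
without_translation = decode_chunks(split_crlf, translate=False)

# The same check through TextIOWrapper: read() agrees with readlines().
data = b"one\r\ntwo\rthree\n"
via_read = io.TextIOWrapper(io.BytesIO(data), encoding="ascii").read()
via_lines = "".join(io.TextIOWrapper(io.BytesIO(data), encoding="ascii").readlines())
```

In both modes the split pair is treated as a single line ending: with translation it becomes one ``'\n'``, without it the pair comes back intact as ``'\r\n'``.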
Another implementation, ``StringIO``, creates a file-like ``TextIO``
implementation without an underlying Buffered I/O object. While
@@ -422,7 +457,7 @@ pseudo-code::
assert isinstance(mode, str)
assert buffering is None or isinstance(buffering, int)
assert encoding is None or isinstance(encoding, str)
assert newline in (None, "\n", "\r", "\r\n")
assert newline in (None, "", "\n", "\r", "\r\n")
modes = set(mode)
if modes - set("arwb+t") or len(mode) > len(modes):
raise ValueError("invalid mode: %r" % mode)