2002-01-23 08:24:26 -05:00
|
|
|
PEP: 278
|
|
|
|
Title: Universal Newline Support
|
|
|
|
Version: $Revision$
|
|
|
|
Last-Modified: $Date$
|
|
|
|
Author: jack@cwi.nl (Jack Jansen)
|
2005-01-29 13:24:59 -05:00
|
|
|
Status: Final
|
2002-01-23 08:24:26 -05:00
|
|
|
Type: Standards Track
|
2017-01-24 15:47:22 -05:00
|
|
|
Content-Type: text/x-rst
|
2002-01-23 08:24:26 -05:00
|
|
|
Created: 14-Jan-2002
|
|
|
|
Python-Version: 2.3
|
|
|
|
Post-History:
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
2017-01-24 15:47:22 -05:00
|
|
|
========
|
2002-01-23 08:24:26 -05:00
|
|
|
|
2017-01-24 15:47:22 -05:00
|
|
|
This PEP discusses a way in which Python can support I/O on files
|
|
|
|
which have a newline format that is not the native format on the
|
|
|
|
platform, so that Python on each platform can read and import
|
|
|
|
files with CR (Macintosh), LF (Unix) or CR LF (Windows) line
|
|
|
|
endings.
|
2002-01-23 08:24:26 -05:00
|
|
|
|
2017-01-24 15:47:22 -05:00
|
|
|
It is more and more common to come across files that have an end
|
|
|
|
of line that does not match the standard on the current platform:
|
|
|
|
files downloaded over the net, remotely mounted filesystems on a
|
|
|
|
different platform, Mac OS X with its double standard of Mac and
|
|
|
|
Unix line endings, etc.
|
|
|
|
|
|
|
|
Many tools such as editors and compilers already handle this
|
|
|
|
gracefully, it would be good if Python did so too.
|
2002-01-23 08:24:26 -05:00
|
|
|
|
|
|
|
|
|
|
|
Specification
|
2017-01-24 15:47:22 -05:00
|
|
|
=============
|
|
|
|
|
|
|
|
Universal newline support is enabled by default,
|
|
|
|
but can be disabled during the configure of Python.
|
|
|
|
|
|
|
|
In a Python with universal newline support the feature is
|
|
|
|
automatically enabled for all import statements and ``execfile()``
|
|
|
|
calls. There is no special support for ``eval()`` or exec.
|
|
|
|
|
|
|
|
In a Python with universal newline support ``open()`` the mode
|
|
|
|
parameter can also be "U", meaning "open for input as a text file
|
|
|
|
with universal newline interpretation". Mode "rU" is also allowed,
|
|
|
|
for symmetry with "rb". Mode "U" cannot be
|
|
|
|
combined with other mode flags such as "+". Any line ending in the
|
|
|
|
input file will be seen as a '\n' in Python, so little other code has
|
|
|
|
to change to handle universal newlines.
|
|
|
|
|
|
|
|
Conversion of newlines happens in all calls that read data: ``read()``,
|
|
|
|
``readline()``, ``readlines()``, etc.
|
|
|
|
|
|
|
|
There is no special support for output to file with a different
|
|
|
|
newline convention, and so mode "wU" is also illegal.
|
|
|
|
|
|
|
|
A file object that has been opened in universal newline mode gets
|
|
|
|
a new attribute "newlines" which reflects the newline convention
|
|
|
|
used in the file. The value for this attribute is one of None (no
|
|
|
|
newline read yet), "\r", "\n", "\r\n" or a tuple containing all the
|
|
|
|
newline types seen.
|
|
|
|
|
2002-01-23 08:24:26 -05:00
|
|
|
|
|
|
|
Rationale
|
2017-01-24 15:47:22 -05:00
|
|
|
=========
|
|
|
|
|
|
|
|
Universal newline support is implemented in C, not in Python.
|
|
|
|
This is done because we want files with a foreign newline
|
|
|
|
convention to be import-able, so a Python Lib directory can be
|
|
|
|
shared over a remote file system connection, or between MacPython
|
|
|
|
and Unix-Python on Mac OS X. For this to be feasible the
|
|
|
|
universal newline convention needs to have a reasonably small
|
|
|
|
impact on performance, which means a Python implementation is not
|
|
|
|
an option as it would bog down all imports. And because of files
|
|
|
|
with multiple newline conventions, which Visual C++ and other
|
|
|
|
Windows tools will happily produce, doing a quick check for the
|
|
|
|
newlines used in a file (handing off the import to C code if a
|
|
|
|
platform-local newline is seen) will not work. Finally, a C
|
|
|
|
implementation also allows tracebacks and such (which open the
|
|
|
|
Python source module) to be handled easily.
|
|
|
|
|
|
|
|
There is no output implementation of universal newlines, Python
|
|
|
|
programs are expected to handle this by themselves or write files
|
|
|
|
with platform-local convention otherwise. The reason for this is
|
|
|
|
that input is the difficult case, outputting different newlines to
|
|
|
|
a file is already easy enough in Python.
|
|
|
|
|
|
|
|
Also, an output implementation would be much more difficult than an
|
|
|
|
input implementation, surprisingly: a lot of output is done through
|
|
|
|
``PyXXX_Print()`` methods, and at this point the file object is not
|
|
|
|
available anymore, only a ``FILE *``. So, an output implementation would
|
|
|
|
need to somehow go from the ``FILE*`` to the file object, because that
|
|
|
|
is where the current newline delimiter is stored.
|
|
|
|
|
|
|
|
The input implementation has no such problem: there are no cases in
|
|
|
|
the Python source tree where files are partially read from C,
|
|
|
|
partially from Python, and such cases are expected to be rare in
|
|
|
|
extension modules. If such cases exist the only problem is that the
|
|
|
|
newlines attribute of the file object is not updated during the
|
|
|
|
``fread()`` or ``fgets()`` calls that are done direct from C.
|
|
|
|
|
|
|
|
A partial output implementation, where strings passed to ``fp.write()``
|
|
|
|
would be converted to use fp.newlines as their line terminator but
|
|
|
|
all other output would not is far too surprising, in my view.
|
|
|
|
|
|
|
|
Because there is no output support for universal newlines there is
|
|
|
|
also no support for a mode "rU+": the surprise factor of the
|
|
|
|
previous paragraph would hold to an even stronger degree.
|
|
|
|
|
|
|
|
There is no support for universal newlines in strings passed to
|
|
|
|
``eval()`` or ``exec``. It is envisioned that such strings always have the
|
|
|
|
standard ``\n`` line feed, if the strings come from a file that file can
|
|
|
|
be read with universal newlines.
|
|
|
|
|
|
|
|
I think there are no special issues with unicode. utf-16 shouldn't
|
|
|
|
pose any new problems, as such files need to be opened in binary
|
|
|
|
mode anyway. Interaction with utf-8 is fine too: values 0x0a and 0x0d
|
|
|
|
cannot occur as part of a multibyte sequence.
|
|
|
|
|
|
|
|
Universal newline files should work fine with iterators and
|
|
|
|
``xreadlines()`` as these eventually call the normal file
|
|
|
|
readline/readlines methods.
|
|
|
|
|
|
|
|
|
|
|
|
While universal newlines are automatically enabled for import they
|
|
|
|
are not for opening, where you have to specifically say open(...,
|
|
|
|
"U"). This is open to debate, but here are a few reasons for this
|
|
|
|
design:
|
|
|
|
|
|
|
|
- Compatibility. Programs which already do their own
|
|
|
|
interpretation of ``\r\n`` in text files would break. Examples of such
|
|
|
|
programs would be editors which warn you when you open a file with
|
|
|
|
a different newline convention. If universal newlines was made the
|
|
|
|
default such an editor would silently convert your line endings to
|
|
|
|
the local convention on save. Programs which open binary files as
|
|
|
|
text files on Unix would also break (but it could be argued they
|
|
|
|
deserve it :-).
|
|
|
|
|
|
|
|
- Interface clarity. Universal newlines are only supported for
|
|
|
|
input files, not for input/output files, as the semantics would
|
|
|
|
become muddy. Would you write Mac newlines if all reads so far
|
|
|
|
had encountered Mac newlines? But what if you then later read a
|
|
|
|
Unix newline?
|
|
|
|
|
|
|
|
The newlines attribute is included so that programs that really
|
|
|
|
care about the newline convention, such as text editors, can
|
|
|
|
examine what was in a file. They can then save (a copy of) the
|
|
|
|
file with the same newline convention (or, in case of a file with
|
|
|
|
mixed newlines, ask the user what to do, or output in platform
|
|
|
|
convention).
|
|
|
|
|
|
|
|
Feedback is explicitly solicited on one item in the reference
|
|
|
|
implementation: whether or not the universal newlines routines
|
|
|
|
should grab the global interpreter lock. Currently they do not,
|
|
|
|
but this could be considered living dangerously, as they may
|
|
|
|
modify fields in a ``FileObject``. But as these routines are
|
|
|
|
replacements for ``fgets()`` and ``fread()`` as well it may be difficult
|
|
|
|
to decide whether or not the lock is held when the routine is
|
|
|
|
called. Moreover, the only danger is that if two threads read the
|
|
|
|
same ``FileObject`` at the same time an extraneous newline may be seen
|
|
|
|
or the "newlines" attribute may inadvertently be set to mixed. I
|
|
|
|
would argue that if you read the same ``FileObject`` in two threads
|
|
|
|
simultaneously you are asking for trouble anyway.
|
|
|
|
|
|
|
|
Note that no globally accessible pointers are manipulated in the
|
|
|
|
``fgets()`` or ``fread()`` replacement routines, just some integer-valued
|
|
|
|
flags, so the chances of core dumps are zero (he said:-).
|
|
|
|
|
|
|
|
Universal newline support can be disabled during configure because it does
|
|
|
|
have a small performance penalty, and moreover the implementation has
|
|
|
|
not been tested on all conceivable platforms yet. It might also be silly
|
|
|
|
on some platforms (WinCE or Palm devices, for instance). If universal
|
|
|
|
newline support is not enabled then file objects do not have the "newlines"
|
|
|
|
attribute, so testing whether the current Python has it can be done with a
|
|
|
|
simple::
|
|
|
|
|
|
|
|
if hasattr(open, 'newlines'):
|
|
|
|
print 'We have universal newline support'
|
|
|
|
|
|
|
|
Note that this test uses the ``open()`` function rather than the file
|
|
|
|
type so that it won't fail for versions of Python where the file
|
|
|
|
type was not available (the file type was added to the built-in
|
|
|
|
namespace in the same release as the universal newline feature was
|
|
|
|
added).
|
|
|
|
|
|
|
|
Additionally, note that this test fails again on Python versions
|
|
|
|
>= 2.5, when ``open()`` was made a function again and is not synonymous
|
|
|
|
with the file type anymore.
|
|
|
|
|
2002-01-23 08:24:26 -05:00
|
|
|
|
|
|
|
Reference Implementation
|
2017-01-24 15:47:22 -05:00
|
|
|
========================
|
2002-01-23 08:24:26 -05:00
|
|
|
|
2017-01-24 15:47:22 -05:00
|
|
|
A reference implementation is available in SourceForge patch
|
|
|
|
#476814: http://www.python.org/sf/476814
|
2002-01-23 08:24:26 -05:00
|
|
|
|
|
|
|
|
|
|
|
References
|
2017-01-24 15:47:22 -05:00
|
|
|
==========
|
2002-01-23 08:24:26 -05:00
|
|
|
|
2017-01-24 15:47:22 -05:00
|
|
|
None.
|
2002-01-23 08:24:26 -05:00
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
2017-01-24 15:47:22 -05:00
|
|
|
=========
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
2002-01-23 08:24:26 -05:00
|
|
|
|
|
|
|
|
|
|
|
|
2017-01-24 15:47:22 -05:00
|
|
|
..
|
|
|
|
Local Variables:
|
|
|
|
mode: indented-text
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
fill-column: 70
|
|
|
|
End:
|