Added PEP 278, Universal Newline Support, Jack Jansen
parent
5071ad8ef0
commit
3d0a5402a1
|
@ -88,6 +88,7 @@ Index by Category
|
|||
S 275 Switching on Multiple Values Lemburg
|
||||
S 276 Simple Iterator for ints Althoff
|
||||
S 277 Unicode file name support for Windows NT Hodgson
|
||||
S 278 Universal Newline Support Jansen
|
||||
|
||||
Finished PEPs (done, implemented in CVS)
|
||||
|
||||
|
@ -237,6 +238,7 @@ Numerical Index
|
|||
S 275 Switching on Multiple Values Lemburg
|
||||
S 276 Simple Iterator for ints Althoff
|
||||
S 277 Unicode file name support for Windows NT Hodgson
|
||||
S 278 Universal Newline Support Jansen
|
||||
SR 666 Reject Foolish Indentation Creighton
|
||||
|
||||
|
||||
|
@ -270,6 +272,7 @@ Owners
|
|||
Hodgson, Neil neilh@scintilla.org
|
||||
Hudson, Michael mwh@python.net
|
||||
Hylton, Jeremy jeremy@zope.com
|
||||
Jansen, Jack jack@cwi.nl
|
||||
Kuchling, Andrew akuchlin@mems-exchange.org
|
||||
Lemburg, Marc-Andre mal@lemburg.com
|
||||
Lielens, Gregory gregory.lielens@fft.be
|
||||
|
|
|
@ -0,0 +1,139 @@
|
|||
PEP: 278
|
||||
Title: Universal Newline Support
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: jack@cwi.nl (Jack Jansen)
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Created: 14-Jan-2002
|
||||
Python-Version: 2.3
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
This PEP discusses a way in which Python can support I/O on files
|
||||
which have a newline format that is not the native format on the
|
||||
platform, so that Python on each platform can read and import
|
||||
files with CR (Macintosh), LF (Unix) or CR LF (Windows) line
|
||||
endings.
|
||||
|
||||
It is more and more common to come across files that have an end
|
||||
of line that does not match the standard on the current platform:
|
||||
files downloaded over the net, remotely mounted filesystems on a
|
||||
different platform, Mac OS X with its double standard of Mac and
|
||||
Unix line endings, etc.
|
||||
|
||||
Many tools such as editors and compilers already handle this
gracefully; it would be good if Python did so too.
|
||||
|
||||
|
||||
Specification
|
||||
|
||||
Universal newline support needs to be enabled when Python is
configured.
|
||||
|
||||
In a Python with universal newline support, the feature is
automatically enabled for all import statements and execfile()
calls.
|
||||
|
||||
In a Python with universal newline support, the mode parameter of
open() can also be "t", meaning "open for input as a text file
with universal newline interpretation". Mode "t" cannot be
combined with other mode flags such as "+".
|
||||
|
||||
There is no special support for output to file with a different
|
||||
newline convention.
|
||||
|
||||
A file object that has been opened in universal newline mode gets
|
||||
a new attribute "newlines" which reflects the newline convention
|
||||
used in the file. The value for this attribute is one of None (no
|
||||
newline read yet), "\r", "\n", "\r\n" or "mixed" (multiple
|
||||
different types of newlines seen).
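The input translation and the "newlines" bookkeeping described above
can be sketched in a few lines of Python. This is only an
illustration of the intended semantics, not the reference
implementation (which is written in C); the function name
translate_newlines is hypothetical and operates on a string that has
already been read.

```python
def translate_newlines(data):
    # Hypothetical helper: returns (text, newlines), where every CR, LF
    # and CR LF in `data` has been replaced by a single LF, and
    # `newlines` follows the draft: None, "\r", "\n", "\r\n" or "mixed".
    out = []
    seen = set()
    i = 0
    n = len(data)
    while i < n:
        ch = data[i]
        if ch == "\r":
            if i + 1 < n and data[i + 1] == "\n":
                # CR LF counts as one Windows-style line ending.
                seen.add("\r\n")
                i += 2
            else:
                # A lone CR is a Macintosh-style line ending.
                seen.add("\r")
                i += 1
            out.append("\n")
        elif ch == "\n":
            seen.add("\n")
            out.append("\n")
            i += 1
        else:
            out.append(ch)
            i += 1
    if not seen:
        newlines = None
    elif len(seen) == 1:
        newlines = next(iter(seen))
    else:
        newlines = "mixed"
    return "".join(out), newlines
```

For example, translate_newlines("a\r\nb\n") yields the text "a\nb\n"
together with "mixed", since two different conventions were seen.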
|
||||
|
||||
|
||||
Rationale
|
||||
|
||||
Universal newline support is implemented in C, not in Python.
|
||||
This is done because we want files with a foreign newline
|
||||
convention to be import-able, so a Python Lib directory can be
|
||||
shared over a remote file system connection, or between MacPython
|
||||
and Unix-Python on Mac OS X. For this to be feasible the
|
||||
universal newline convention needs to have a reasonably small
|
||||
impact on performance, which means a Python implementation is not
|
||||
an option as it would bog down all imports. And because of files
|
||||
with multiple newline conventions, which Visual C++ and other
|
||||
Windows tools will happily produce, doing a quick check for the
|
||||
newlines used in a file (handing off the import to C code if a
|
||||
platform-local newline is seen) will not work. Finally, a C
|
||||
implementation also allows tracebacks and such (which open the
|
||||
Python source module) to be handled easily.
|
||||
|
||||
Universal newline support is implemented (for this release) as a
|
||||
compile time option because there is a performance penalty, even
|
||||
though it should be a small one.
|
||||
|
||||
There is no output implementation of universal newlines; Python
programs are expected to handle this by themselves or otherwise
write files with the platform-local convention. The reason for this
is that input is the difficult case, whereas outputting different
newlines to a file is already easy enough in Python.
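As a rough illustration of why output is the easy case, a program
can write any convention it likes by opening the file in binary mode
and appending the terminator itself. The helper name write_lines and
the ASCII encoding are assumptions made for this sketch only.

```python
def write_lines(path, lines, newline="\r\n"):
    # Hypothetical helper: binary mode keeps the platform from doing
    # any newline translation of its own, so exactly `newline` is
    # written after each line. ASCII encoding is an assumption here.
    with open(path, "wb") as f:
        for line in lines:
            f.write(line.encode("ascii") + newline.encode("ascii"))
```

Calling write_lines(path, ["a", "b"], "\r") would produce a file with
Macintosh line endings regardless of the platform it runs on.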
|
||||
|
||||
While universal newlines are automatically enabled for import, they
are not for opening files, where you have to specifically say
open(..., "t"). This is open to debate, but here are a few reasons
for this design:
|
||||
|
||||
- Compatibility. Programs which already do their own
|
||||
interpretation of \r\n in text files would break. Programs
|
||||
which open binary files as text files on Unix would also break
|
||||
(but it could be argued they deserve it :-).
|
||||
|
||||
- Interface clarity. Universal newlines are only supported for
|
||||
input files, not for input/output files, as the semantics would
|
||||
become muddy. Would you write Mac newlines if all reads so far
|
||||
had encountered Mac newlines? But what if you then later read a
|
||||
Unix newline?
|
||||
|
||||
The newlines attribute is included so that programs that really
|
||||
care about the newline convention, such as text editors, can
|
||||
examine what was in a file. They can then save (a copy of) the
|
||||
file with the same newline convention (or, in case of a file with
|
||||
mixed newlines, ask the user what to do, or output in platform
|
||||
convention).
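A text editor might turn the value of the "newlines" attribute into
an output convention along the following lines. This is a sketch
under stated assumptions: pick_output_newline is a hypothetical name,
and falling back to the platform convention is just one of the
options mentioned above (asking the user is the other).

```python
import os

def pick_output_newline(detected, fallback=os.linesep):
    # `detected` is the value of the "newlines" attribute as described
    # in this draft. Reuse it only if it names a single convention.
    if detected in ("\r", "\n", "\r\n"):
        return detected
    # None (no newline read yet) or "mixed": use the fallback, which
    # defaults to the platform-local convention.
    return fallback
```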
|
||||
|
||||
Feedback is explicitly solicited on one item in the reference
|
||||
implementation: whether or not the universal newlines routines
|
||||
should grab the global interpreter lock. Currently they do not,
|
||||
but this could be considered living dangerously, as they may
|
||||
modify fields in a FileObject. But as these routines are
|
||||
replacements for fgets() and fread() as well it may be difficult
|
||||
to decide whether or not the lock is held when the routine is
|
||||
called. Moreover, the only danger is that if two threads read the
|
||||
same FileObject at the same time an extraneous newline may be seen
|
||||
or the "newlines" attribute may inadvertently be set to "mixed". I
|
||||
would argue that if you read the same FileObject in two threads
|
||||
simultaneously you are asking for trouble anyway.
|
||||
|
||||
|
||||
Reference Implementation
|
||||
|
||||
A reference implementation is available in SourceForge patch #476814.
|
||||
|
||||
|
||||
References
|
||||
|
||||
None.
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
fill-column: 70
|
||||
End:
|