- --with-universal-newlines is now the default
- U is the mode character that signals universal newlines in open(), instead of t.
- Clarified why output isn't handled.
- Clarified various questions on unicode, exec, locks, etc.
parent 4834b3a6a8
commit 69316725d1
pep-0278.txt (75 lines changed)
@@ -30,16 +30,17 @@ Abstract

 Specification

-Universal newline support needs to be enabled during the configure
-of Python.
+Universal newline support needs to be enabled by default,
+but can be disabled during the configure of Python.

 In a Python with universal newline support the feature is
-automatically enabled for all import statements and source()
-calls.
+automatically enabled for all import statements and execfile()
+calls. There is no special support for eval() or exec.

 In a Python with universal newline support open() the mode
-parameter can also be "t", meaning "open for input as a text file
-with universal newline interpretation". Mode "t" cannot be
+parameter can also be "U", meaning "open for input as a text file
+with universal newline interpretation". Mode "rU" is also allowed,
+for symmetry with "rb". Mode "U" cannot be
 combined with other mode flags such as "+". Any line ending in the
 input file will be seen as a '\n' in Python, so little other code has
 to change to handle universal newlines.
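As an illustration of the open() behaviour described in this hunk, here is a minimal sketch. It assumes a modern Python 3 interpreter, where universal newline translation is requested with newline=None (the text-mode default) rather than with the "U" mode flag this commit introduces. The sample data and the temporary file are made up for the example.

```python
import os
import tempfile

# Write raw bytes so all three newline conventions reach the disk untouched.
raw = b"unix\nmac\rdos\r\nlast"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Assumption: Python 3, where newline=None asks for universal newline
    # translation on input. This stands in for the PEP's open(path, "U").
    with open(path, "r", newline=None, encoding="ascii") as fp:
        lines = fp.readlines()
finally:
    os.remove(path)

# Every line ending, whatever its origin, is seen as '\n'.
print(lines)
```

Because each terminator arrives as '\n', code that splits or strips lines does not need to know which platform wrote the file.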
@@ -71,16 +72,52 @@ Rationale
 implementation also allows tracebacks and such (which open the
 Python source module) to be handled easily.

-Universal newline support is implemented (for this release) as a
-compile time option because there is a performance penalty, even
-though it should be a small one.
-
 There is no output implementation of universal newlines, Python
 programs are expected to handle this by themselves or write files
 with platform-local convention otherwise. The reason for this is
 that input is the difficult case, outputting different newlines to
-a file is already easy enough in Python. It would also slow down
-all "normal" Python output, even if only a little.
+a file is already easy enough in Python.
+
+Also, an output implementation would be much more difficult than an
+input implementation, surprisingly: a lot of output is done through
+PyXXX_Print() methods, and at this point the file object is not
+available anymore, only a FILE *. So, an output implementation would
+need to somehow go from the FILE* to the file object, because that
+is where the current newline delimiter is stored.
+
+The input implementation has no such problem: there are no cases in
+the Python source tree where files are partially read from C,
+partially from Python, and such cases are expected to be rare in
+extension modules. If such cases exist the only problem is that the
+newlines attribute of the file object is not updated during the
+fread() or fgets() calls that are done directly from C.
+
+A partial output implementation, where strings passed to fp.write()
+would be converted to use fp.newlines as their line terminator but
+all other output would not, is far too surprising, in my view.
+
+Because there is no output support for universal newlines there is
+also no support for a mode "rU+": the surprise factor of the
+previous paragraph would hold to an even stronger degree.
+
+There is no support for universal newlines in strings passed to
+eval() or exec. It is envisioned that such strings always have the
+standard \n line feed, if the strings come from a file that file can
+be read with universal newlines.
+
+I think there are no special issues with unicode. utf-16 shouldn't
+pose any new problems, as such files need to be opened in binary
+mode anyway. Interaction with utf-8 I am not 100% sure about: is it
+possible for a 0x0a or 0x0d byte to occur as part of a multibyte
+escape without the standard meaning of CR or LF? I assume not,
+because if such bytes are allowed it would mean that readline() on
+Unix would terminate the read on a 0x0d (and on MacOS on a 0x0a)
+without a full line being read.
+
+Universal newline files should work fine with iterators and
+xreadlines() as these eventually call the normal file
+readline/readlines methods.
+

 While universal newlines are automatically enabled for import they
 are not for opening, where you have to specifically say open(...,
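The UTF-8 question raised in this hunk can be checked by brute force. The sketch below is not part of the commit; it assumes Python 3 and uses a hypothetical helper name. It encodes every non-ASCII code point and looks for a 0x0a or 0x0d byte inside the resulting multibyte sequence. UTF-8 builds multibyte sequences only from bytes with the high bit set, so no such code point should be found.

```python
def multibyte_sequences_with_newline_bytes():
    """Return code points whose UTF-8 encoding contains a 0x0a or 0x0d byte.

    Hypothetical helper for illustration; only non-ASCII code points are
    examined, since those are the ones encoded as multibyte sequences.
    """
    offenders = []
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded in UTF-8
        encoded = chr(cp).encode("utf-8")
        if 0x0A in encoded or 0x0D in encoded:
            offenders.append(cp)
    return offenders


offenders = multibyte_sequences_with_newline_bytes()
print(len(offenders))
```

An empty result supports the assumption in the text: a byte-level scan for CR and LF cannot split a UTF-8 multibyte character.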
@@ -88,9 +125,13 @@ Rationale
 design:

 - Compatibility. Programs which already do their own
-interpretation of \r\n in text files would break. Programs
-which open binary files as text files on Unix would also break
-(but it could be argued they deserve it :-).
+interpretation of \r\n in text files would break. Examples of such
+programs would be editors which warn you when you open a file with
+a different newline convention. If universal newlines was made the
+default such an editor would silently convert your line endings to
+the local convention on save. Programs which open binary files as
+text files on Unix would also break (but it could be argued they
+deserve it :-).

 - Interface clarity. Universal newlines are only supported for
 input files, not for input/output files, as the semantics would
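The editor scenario in this hunk, combined with the newlines attribute mentioned in the Rationale, could look roughly like the sketch below. It is an assumption-laden illustration rather than anything from the commit: it uses Python 3's io.TextIOWrapper, whose newlines attribute reports the line endings seen so far, and the function name foreign_newlines and its local parameter are hypothetical.

```python
import io
import os


def foreign_newlines(data, local=os.linesep):
    """Return the set of newline conventions in data that differ from local.

    Hypothetical helper: data is the raw file content as bytes, local is the
    platform convention the editor expects.
    """
    # newline=None requests universal newline translation on input.
    fp = io.TextIOWrapper(io.BytesIO(data), encoding="latin-1", newline=None)
    fp.read()  # the newlines attribute is only filled in as data is read
    seen = fp.newlines
    if seen is None:
        return set()  # no line ending encountered at all
    if isinstance(seen, str):
        seen = (seen,)  # a single convention is reported as a plain string
    return set(seen) - {local}


# An editor on a '\n' platform could warn when this set is non-empty.
print(foreign_newlines(b"first\r\nsecond\r\n", local="\n"))
```

This keeps the compatibility concern visible: the text the program sees is already normalized, so the original convention is only recoverable through the newlines attribute.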
@@ -118,6 +159,10 @@ Rationale
 would argue that if you read the same FileObject in two threads
 simultaneously you are asking for trouble anyway.

+Note that no globally accessible pointers are manipulated in the
+fgets() or fread() replacement routines, just some integer-valued
+flags, so the chances of core dumps are zero (he said:-).
+

 Reference Implementation
