- --with-universal-newlines is now the default

- U is the mode character that signals universal newlines in open(), instead of t.
- Clarified why output isn't handled.
- Clarified various questions on unicode, exec, locks, etc.
2002-03-13 22:50:54 +00:00 · 69316725d1
parent 4834b3a6a8
commit 69316725d1
1 changed file with 60 additions and 15 deletions
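
Note on the input behaviour described in this change: the patch says that any line ending in the input is seen as '\n' by Python code, and that the file object records which conventions were encountered in its "newlines" attribute. The sketch below only illustrates that translation in pure Python. It is not the C code from the reference implementation, and the function name translate_newlines is made up for this example.

```python
def translate_newlines(data):
    # Hypothetical helper: map "\r\n" and a lone "\r" to "\n" and
    # collect the newline conventions seen, loosely mirroring what the
    # PEP calls the "newlines" attribute.
    seen = set()
    out = []
    i = 0
    n = len(data)
    while i < n:
        ch = data[i]
        if ch == "\r":
            if i + 1 < n and data[i + 1] == "\n":
                seen.add("\r\n")
                i += 1  # consume the "\n" of the "\r\n" pair
            else:
                seen.add("\r")
            out.append("\n")
        else:
            if ch == "\n":
                seen.add("\n")
            out.append(ch)
        i += 1
    return "".join(out), seen


text, seen = translate_newlines("a\r\nb\rc\n")
```

Here text holds only '\n' line endings, while seen records all three conventions, which is the "mixed" situation the Rationale mentions.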
--- a/pep-0278.txt
+++ b/pep-0278.txt
@@ -30,16 +30,17 @@ Abstract

 Specification

-    Universal newline support needs to be enabled during the configure
-    of Python.
+    Universal newline support is enabled by default,
+    but can be disabled when configuring Python.
    
    In a Python with universal newline support the feature is
-    automatically enabled for all import statements and source()
-    calls.
+    automatically enabled for all import statements and execfile()
+    calls. There is no special support for eval() or exec.
    
    In a Python with universal newline support open() the mode
-    parameter can also be "t", meaning "open for input as a text file
-    with universal newline interpretation".  Mode "t" cannot be
+    parameter can also be "U", meaning "open for input as a text file
+    with universal newline interpretation".  Mode "rU" is also allowed,
+    for symmetry with "rb". Mode "U" cannot be
    combined with other mode flags such as "+". Any line ending in the
    input file will be seen as a '\n' in Python, so little other code has
    to change to handle universal newlines.
@@ -71,16 +72,52 @@ Rationale
    implementation also allows tracebacks and such (which open the
    Python source module) to be handled easily.
    
-    Universal newline support is implemented (for this release) as a
-    compile time option because there is a performance penalty, even
-    though it should be a small one.
-    
    There is no output implementation of universal newlines, Python
    programs are expected to handle this by themselves or write files
    with platform-local convention otherwise.  The reason for this is
    that input is the difficult case, outputting different newlines to
-    a file is already easy enough in Python. It would also slow down
-    all "normal" Python output, even if only a little.
+    a file is already easy enough in Python.
+    
+    Also, an output implementation would be much more difficult than an
+    input implementation, surprisingly: a lot of output is done through
+    PyXXX_Print() methods, and at this point the file object is not
+    available anymore, only a FILE *. So, an output implementation would
+    need to somehow go from the FILE* to the file object, because that
+    is where the current newline delimiter is stored.
+
+    The input implementation has no such problem: there are no cases in
+    the Python source tree where files are partially read from C,
+    partially from Python, and such cases are expected to be rare in
+    extension modules. If such cases exist the only problem is that the
+    newlines attribute of the file object is not updated during the
+    fread() or fgets() calls that are done directly from C.
+
+    A partial output implementation, where strings passed to fp.write()
+    would be converted to use fp.newlines as their line terminator but
+    all other output would not, is far too surprising, in my view.
+
+    Because there is no output support for universal newlines there is
+    also no support for a mode "rU+": the surprise factor of the
+    previous paragraph would hold to an even stronger degree.
+
+    There is no support for universal newlines in strings passed to
+    eval() or exec. It is envisioned that such strings always have the
+    standard \n line feed; if the strings come from a file, that file
+    can be read with universal newlines.
+
+    I think there are no special issues with unicode. utf-16 shouldn't
+    pose any new problems, as such files need to be opened in binary
+    mode anyway. Interaction with utf-8 I am not 100% sure about: is it
+    possible for a 0x0a or 0x0d byte to occur as part of a multibyte
+    escape without the standard meaning of CR or LF? I assume not,
+    because if such bytes are allowed it would mean that readline() on
+    Unix would terminate the read on a 0x0a (and on MacOS on a 0x0d)
+    without a full line being read.
+
+    Universal newline files should work fine with iterators and
+    xreadlines() as these eventually call the normal file
+    readline/readlines methods.
+
    
    While universal newlines are automatically enabled for import they
    are not for opening, where you have to specifically say open(...,
@@ -88,9 +125,13 @@ Rationale
    design:

    - Compatibility.  Programs which already do their own
-      interpretation of \r\n in text files would break.  Programs
-      which open binary files as text files on Unix would also break
-      (but it could be argued they deserve it :-).
+      interpretation of \r\n in text files would break. Examples of such
+      programs would be editors which warn you when you open a file with
+      a different newline convention. If universal newlines were made the
+      default, such an editor would silently convert your line endings to
+      the local convention on save. Programs which open binary files as
+      text files on Unix would also break (but it could be argued they
+      deserve it :-).
      
    - Interface clarity.  Universal newlines are only supported for
      input files, not for input/output files, as the semantics would
@@ -117,6 +158,10 @@ Rationale
    or the "newlines" attribute may inadvertently be set to mixed.  I
    would argue that if you read the same FileObject in two threads
    simultaneously you are asking for trouble anyway.
+    
+    Note that no globally accessible pointers are manipulated in the
+    fgets() or fread() replacement routines, just some integer-valued
+    flags, so the chances of core dumps are zero (he said:-).

    
 Reference Implementation
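
On the UTF-8 question raised in the added Rationale text: in UTF-8 every byte of a multibyte sequence has its high bit set (lead bytes and continuation bytes are all 0x80 or above), so a 0x0a or 0x0d byte can only ever be a real LF or CR. The brute-force check below is a small sketch to confirm this, not part of the PEP or its reference implementation. The function name is made up for this example.

```python
def multibyte_sequences_avoid_ascii():
    # Hypothetical check: encode every non-ASCII code point and verify
    # that no byte of its UTF-8 encoding falls in the ASCII range,
    # which includes 0x0a (LF) and 0x0d (CR).
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates have no UTF-8 encoding
        encoded = chr(cp).encode("utf-8")
        if min(encoded) < 0x80:
            return False
    return True


result = multibyte_sequences_avoid_ascii()
```

The loop covers roughly 1.1 million code points, so it takes a moment to run. A True result supports the patch's assumption that readline() cannot be cut short by a stray byte inside a multibyte character.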