- --with-universal-newlines is now the default

- U is the mode character that signals universal newlines in open(),
  instead of t.
- Clarified why output isn't handled.
- Clarified various questions on unicode, exec, locks, etc.
Jack Jansen 2002-03-13 22:50:54 +00:00
parent 4834b3a6a8
commit 69316725d1
1 changed files with 60 additions and 15 deletions

@@ -30,16 +30,17 @@ Abstract
Specification
Universal newline support needs to be enabled during the configure
of Python.
Universal newline support is enabled by default,
but can be disabled when configuring Python.
In a Python with universal newline support the feature is
automatically enabled for all import statements and source()
calls.
automatically enabled for all import statements and execfile()
calls. There is no special support for eval() or exec.
In a Python with universal newline support, the open() mode
parameter can also be "t", meaning "open for input as a text file
with universal newline interpretation". Mode "t" cannot be
parameter can also be "U", meaning "open for input as a text file
with universal newline interpretation". Mode "rU" is also allowed,
for symmetry with "rb". Mode "U" cannot be
combined with other mode flags such as "+". Any line ending in the
input file will be seen as a '\n' in Python, so little other code has
to change to handle universal newlines.
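As an illustration of the behaviour specified above, here is a minimal
sketch in present-day Python 3. This is an assumption of the example and
not part of this patch: the "U" mode flag was later removed from open()
(in Python 3.11), and text mode with newline=None now performs the same
input translation. The helper names are hypothetical.

```python
import os
import tempfile


def read_lines_universal(path):
    """Read a text file, translating \\r, \\n and \\r\\n line endings to \\n."""
    # newline=None selects universal newline translation on input.
    with open(path, "r", encoding="ascii", newline=None) as fp:
        return fp.readlines()


def demo_mixed_line_endings():
    """Write a file mixing Unix, classic Mac and DOS endings, then read it."""
    raw = b"unix\nmac\rdos\r\nlast"
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(raw)
        return read_lines_universal(path)
    finally:
        os.remove(path)
```

Every line comes back terminated by '\n' regardless of the convention
used in the file, which is why little other code has to change.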
@@ -71,16 +72,52 @@ Rationale
implementation also allows tracebacks and such (which open the
Python source module) to be handled easily.
Universal newline support is implemented (for this release) as a
compile time option because there is a performance penalty, even
though it should be a small one.
There is no output implementation of universal newlines; Python
programs are expected to handle this themselves or else write files
with the platform-local convention. The reason for this is
that input is the difficult case; outputting different newlines to
a file is already easy enough in Python. It would also slow down
all "normal" Python output, even if only a little.
a file is already easy enough in Python.
Also, an output implementation would be much more difficult than an
input implementation, surprisingly: a lot of output is done through
PyXXX_Print() methods, and at this point the file object is not
available anymore, only a FILE *. So, an output implementation would
need to somehow go from the FILE* to the file object, because that
is where the current newline delimiter is stored.
The input implementation has no such problem: there are no cases in
the Python source tree where files are partially read from C,
partially from Python, and such cases are expected to be rare in
extension modules. If such cases exist the only problem is that the
newlines attribute of the file object is not updated during the
fread() or fgets() calls that are done directly from C.
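For readers unfamiliar with the newlines attribute mentioned above, the
following sketch shows how it can be inspected in present-day Python 3,
where text-mode file objects expose it. This is an assumption of the
example; the patch itself concerns the C-level file object of the time.
The function name is hypothetical.

```python
import os
import tempfile


def observed_newlines(raw_bytes):
    """Return the set of newline conventions seen while reading raw_bytes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(raw_bytes)
        with open(path, "r", encoding="ascii", newline=None) as fp:
            fp.read()
            # None, a single string, or a tuple of strings.
            seen = fp.newlines
    finally:
        os.remove(path)
    if seen is None:
        return set()
    if isinstance(seen, str):
        return {seen}
    return set(seen)
```

The attribute only reflects data that has passed through the file
object's own read methods, which matches the limitation described above
for reads done directly from C.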
A partial output implementation, where strings passed to fp.write()
would be converted to use fp.newlines as their line terminator but
all other output would not, is far too surprising, in my view.
Because there is no output support for universal newlines there is
also no support for a mode "rU+": the surprise factor of the
previous paragraph would hold to an even stronger degree.
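The claim that writing a chosen newline convention is already easy can
be illustrated with a short sketch. It uses present-day Python 3, where
newline="" turns off any translation on output; both that choice and
the helper names are assumptions of this example, not part of the patch.

```python
import os
import tempfile


def write_lines(path, lines, newline):
    """Write each line followed by an explicitly chosen terminator."""
    # newline="" disables platform translation, so the terminator we
    # write is exactly what ends up in the file.
    with open(path, "w", encoding="ascii", newline="") as fp:
        for line in lines:
            fp.write(line + newline)


def demo_write(newline):
    """Write two lines with the given terminator and return the raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_lines(path, ["first", "second"], newline)
        with open(path, "rb") as fp:
            return fp.read()
    finally:
        os.remove(path)
```

Here the program, not the file object, decides which terminator is
written, which is the division of labour the rationale argues for.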
There is no support for universal newlines in strings passed to
eval() or exec. It is envisioned that such strings always have the
standard \n line feed; if the strings come from a file, that file can
be read with universal newlines.
I think there are no special issues with unicode. utf-16 shouldn't
pose any new problems, as such files need to be opened in binary
mode anyway. Interaction with utf-8 I am not 100% sure about: is it
possible for a 0x0a or 0x0d byte to occur as part of a multibyte
escape without the standard meaning of CR or LF? I assume not,
because if such bytes are allowed it would mean that readline() on
Unix would terminate the read on a 0x0a (and on MacOS on a 0x0d)
without a full line being read.
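The assumption about utf-8 can be checked directly: in utf-8 every byte
of a multibyte sequence has its high bit set, so 0x0a and 0x0d can only
ever be a real LF or CR. The sketch below, in present-day Python 3 with
hypothetical function names, scans a range of code points for
counterexamples and contrasts the result with utf-16.

```python
def contains_cr_or_lf_byte(encoded):
    """Return True if the byte string contains a 0x0a or 0x0d byte."""
    return 0x0A in encoded or 0x0D in encoded


def utf8_offenders(code_points):
    """List code points whose multibyte utf-8 form contains 0x0a or 0x0d."""
    offenders = []
    for cp in code_points:
        encoded = chr(cp).encode("utf-8")
        # Single-byte encodings are plain ASCII, where 0x0a and 0x0d
        # really are LF and CR, so only multibyte forms are of interest.
        if len(encoded) > 1 and contains_cr_or_lf_byte(encoded):
            offenders.append(cp)
    return offenders
```

Calling utf8_offenders(range(0x80, 0xD800)) finds nothing, whereas
U+010A encoded as little-endian utf-16 does contain a 0x0a byte, which
is why such files need binary mode.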
Universal newline files should work fine with iterators and
xreadlines() as these eventually call the normal file
readline/readlines methods.
While universal newlines are automatically enabled for import they
are not for opening, where you have to specifically say open(...,
@@ -88,9 +125,13 @@ Rationale
design:
- Compatibility. Programs which already do their own
interpretation of \r\n in text files would break. Programs
which open binary files as text files on Unix would also break
(but it could be argued they deserve it :-).
interpretation of \r\n in text files would break. Examples of such
programs would be editors which warn you when you open a file with
a different newline convention. If universal newlines were made the
default, such an editor would silently convert your line endings to
the local convention on save. Programs which open binary files as
text files on Unix would also break (but it could be argued they
deserve it :-).
- Interface clarity. Universal newlines are only supported for
input files, not for input/output files, as the semantics would
@@ -117,6 +158,10 @@ Rationale
or the "newlines" attribute may inadvertently be set to mixed. I
would argue that if you read the same FileObject in two threads
simultaneously you are asking for trouble anyway.
Note that no globally accessible pointers are manipulated in the
fgets() or fread() replacement routines, just some integer-valued
flags, so the chances of core dumps are zero (he said:-).
Reference Implementation