* don't be strict on the version removing the code: 3.4 *or maybe later*
 * rephrase some sentences
 * mention that StreamReaderWriter is similar to io.BufferedRWPair
 * add internal links to Appendix A
 * add links to issues and PEPs
Victor Stinner 2011-07-27 23:45:52 +02:00
parent 5cdf6378e4
commit e1bde7af99
1 changed files with 25 additions and 20 deletions

@@ -19,9 +19,11 @@ StreamReaderWriter. Duplicate code means that bugs should be fixed
 twice and that we may have subtle differences between the two
 implementations.
-The codecs module was introduced in Python 2.0, see the PEP 100. The
-io module was introduced in Python 2.6 and 3.0 (see the PEP 3116), and
-reimplemented in C in Python 2.7 and 3.1.
+The codecs module was introduced in Python 2.0 (see the `PEP 100
+<http://www.python.org/dev/peps/pep-0100/>`_). The io module was
+introduced in Python 2.6 and 3.0 (see the `PEP 3116
+<http://www.python.org/dev/peps/pep-3116/>`_), and reimplemented in C in
+Python 2.7 and 3.1.
 Motivation
@@ -45,7 +47,7 @@ writers provide appropriate StreamReader and StreamWriter
 implementations in addition to the core codec encode() and decode()
 methods. This places a heavy burden on codec authors providing these
 specialised implementations to correctly handle many of the corner
-cases that have now been dealt with by io.TextIOWrapper. While deeper
+cases (see `Appendix A`_) that have now been dealt with by io.TextIOWrapper. While deeper
 integration between the codec and the stream allows for additional
 optimisations in theory, these optimisations have in practice either
 not been carried out and else the associated code duplication means
@@ -70,17 +72,16 @@ StreamReader and StreamWriter issues
 * StreamReader is unable to translate newlines.
 * StreamReaderWriter handles reads using StreamReader and writes
-using StreamWriter. These two classes may be inconsistent. To stay
-consistent, flush() must be called after each write which slows
-down interlaced read-write.
+using StreamWriter (as io.BufferedRWPair). These two classes may be
+inconsistent. To stay consistent, flush() must be called after each
+write which slows down interlaced read-write.
 * StreamWriter doesn't support "line buffering" (flush if the input
 text contains a newline).
-* StreamReader classes of the CJK encodings (e.g. GB18030) don't
-support universal newlines, only UNIX newlines ('\\n').
+* StreamReader classes of the CJK encodings (e.g. GB18030) only
+supports UNIX newlines ('\\n').
 * StreamReader and StreamWriter are stateful codecs but don't expose
 functions to control their state (getstate() or setstate()). Each
-codec has to implement corner cases, see "Issue with stateful
-codecs".
+codec has to handle corner cases, see `Appendix A`_.
 * StreamReader and StreamWriter are very similar to IncrementalReader
 and IncrementalEncoder, some code is duplicated for stateful codecs
 (e.g. UTF-16).
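
The newline point in the hunk above can be illustrated with a small sketch (for illustration only, not part of the patch; the sample bytes are made up). A codecs StreamReader returns ``'\r\n'`` unchanged, whereas io.TextIOWrapper in its default universal-newlines mode translates it to ``'\n'``:

```python
import codecs
import io

data = b"first\r\nsecond\r\n"  # made-up sample using Windows newlines

# codecs: the StreamReader decodes the bytes but leaves newlines untouched.
stream_reader = codecs.getreader("utf-8")(io.BytesIO(data))
codecs_text = stream_reader.read()

# io: TextIOWrapper (newline=None by default) translates '\r\n' to '\n'.
wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
io_text = wrapper.read()

print(repr(codecs_text))
print(repr(io_text))
```

Both readers decode the same bytes; only the newline handling differs.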
@@ -90,7 +91,7 @@ StreamReader and StreamWriter issues
 * No codec implements an optimized method in StreamReader or
 StreamWriter based on the specificities of the codec.
-Other issues in the bug tracker:
+Issues in the bug tracker:
 * `Issue #5445 <http://bugs.python.org/issue5445>`_ (2009-03-08):
 codecs.StreamWriter.writelines problem when passed generator
@@ -120,7 +121,7 @@ TextIOWrapper features
 * TextIOWrapper supports any kind of newline, including translating
 newlines (to UNIX newlines), to read and write.
-* TextIOWrapper reuses incremental encoders and decoders (no
+* TextIOWrapper reuses codecs incremental encoders and decoders (no
 duplication of code).
 * The io module (TextIOWrapper) is faster than the codecs module
 (StreamReader). It is implemented in C, whereas codecs is
@@ -182,7 +183,8 @@ Keep the public API, codecs.open
 ''''''''''''''''''''''''''''''''
 codecs.open() can be replaced by the builtin open() function. open()
-has a similar API but has also more options.
+has a similar API but has also more options. Both functions return
+file-like objects (same API).
 codecs.open() was the only way to open a text file in Unicode mode
 until Python 2.6. Many Python 2 programs uses this function. Removing
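
As an illustration of the replacement discussed in the hunk above (a sketch, not part of the patch; ``roundtrip()`` is a hypothetical helper), text written through codecs.open() can be read back with the builtin open(). The warning filter is an assumption about recent Python versions, which may flag codecs.open() as deprecated:

```python
import codecs
import os
import tempfile
import warnings


def roundtrip(text, encoding="utf-8"):
    """Write text with codecs.open(), then read it back with builtin open()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings():
            # Assumption: newer interpreters may emit a DeprecationWarning here.
            warnings.simplefilter("ignore", DeprecationWarning)
            with codecs.open(path, "w", encoding=encoding) as f:
                f.write(text)
        # The builtin open() returns an io.TextIOWrapper for text modes.
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


result = roundtrip("caf\u00e9\n")
print(repr(result))
```

The same call shape (path, mode, encoding) works for both functions in this simple case, which is what makes the migration mostly mechanical.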
@@ -204,11 +206,12 @@ codecs.open() will be changed to reuse the builtin open() function
 (TextIOWrapper).
 EncodedFile(), StreamRandom, StreamReader, StreamReaderWriter and
-StreamWriter will be removed in Python 3.4.
+StreamWriter will be removed in Python 3.4 (or maybe later).
+.. _Appendix A:
-Issue with stateful codecs
-==========================
+Appendix A: Issues with stateful codecs
+=======================================
 It is difficult to use correctly a stateful codec with a stream. Some
 cases are supported by the codecs module, while io has no more known
@@ -279,7 +282,8 @@ seek(n)
 assert f.read() == '###def'
 The io module supports this usecase, whereas codecs fails because it
-writes a new BOM on the second write (issue #12512).
+writes a new BOM on the second write (`issue #12512
+<http://bugs.python.org/issue12512>`_).
 Append mode
 '''''''''''
@@ -294,7 +298,8 @@ Append mode
 assert f.read() == 'abcdef'
 The io module supports this usecase, whereas codecs fails because it
-writes a new BOM on the second write (issue #12512).
+writes a new BOM on the second write (`issue #12512
+<http://bugs.python.org/issue12512>`_).
 Links
@@ -302,7 +307,7 @@ Links
 * `PEP 100: Python Unicode Integration
 <http://www.python.org/dev/peps/pep-0100/>`_
-* `PEP 3116 <http://www.python.org/dev/peps/pep-3116/>`_
+* `PEP 3116: New I/O <http://www.python.org/dev/peps/pep-3116/>`_
 * `Issue #8796: Deprecate codecs.open()
 <http://bugs.python.org/issue8796>`_
 * `[python-dev] Deprecate codecs.open() and StreamWriter/StreamReader