Grammar fixes in the unicode vs bytes section of PEP 404

Nick Coghlan 2011-11-15 20:32:12 +10:00
parent f2c1df7542
commit 6230377206
1 changed file with 8 additions and 7 deletions

@@ -92,13 +92,14 @@ they play a dual role in Python 2 as both ASCII text and as byte
 sequences. While Python 2 also has a unicode string type, the
 fundamental ambiguity of the core string type, coupled with Python 2's
 default behavior of supporting automatic coercion from 8-bit strings
-to unicodes when the two are combined, often leads to `UnicodeError`s.
-Python 3's standard string type is a unicode, and Python 3 adds a
-bytes type, but critically, no automatic coercion between bytes and
-unicodes is provided (the closest we get are a few text-based APIs that
-assume UTF-8 as the default encoding if no encoding is explicitly stated).
-Thus, the core interpreter, its I/O libraries, module names, etc. are clear
-in their distinction between unicode strings and bytes. Python 3's unicode
+to unicode objects when the two are combined, often leads to
+`UnicodeError`s. Python 3's standard string type is Unicode based, and
+Python 3 adds a dedicated bytes type, but critically, no automatic coercion
+between bytes and unicode strings is provided. The closest the language gets
+to implicit coercion are a few text-based APIs that assume a default
+encoding (usually UTF-8) if no encoding is explicitly stated. Thus, the core
+interpreter, its I/O libraries, module names, etc. are clear in their
+distinction between unicode strings and bytes. Python 3's unicode
 support even extends to the filesystem, so that non-ASCII file names are
 natively supported.
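
For readers of this change, the behavior the revised paragraph describes can be sketched in Python 3 roughly as follows. This is an illustrative example only and is not part of the commit or of the PEP text; the names `text`, `data`, and `try_mix` are hypothetical.

```python
# Python 3: text (str) and binary data (bytes) are separate types.
text = "café"
data = text.encode("utf-8")  # explicit str -> bytes conversion


def try_mix(a, b):
    """Return a + b, or the name of the exception raised (hypothetical helper)."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


# Combining str and bytes is not coerced automatically; it fails.
mixed = try_mix(text, data)

# Decoding explicitly makes the combination well defined.
joined = text + data.decode("utf-8")

print(mixed)
print(joined)
```

The point of the sketch is that the conversion between the two types always names an encoding explicitly, which is the distinction the paragraph says Python 3 enforces.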