PEP 277, Unicode file name support for Windows NT, Neil Hodgson

Barry Warsaw 2002-01-13 00:13:38 +00:00
parent 853d3b59d9
commit 5071ad8ef0
2 changed files with 121 additions and 0 deletions


@@ -87,6 +87,7 @@ Index by Category
  S 274 Dict Comprehensions Warsaw
  S 275 Switching on Multiple Values Lemburg
  S 276 Simple Iterator for ints Althoff
+ S 277 Unicode file name support for Windows NT Hodgson
  Finished PEPs (done, implemented in CVS)
@@ -235,6 +236,7 @@ Numerical Index
  S 274 Dict Comprehensions Warsaw
  S 275 Switching on Multiple Values Lemburg
  S 276 Simple Iterator for ints Althoff
+ S 277 Unicode file name support for Windows NT Hodgson
  SR 666 Reject Foolish Indentation Creighton
@@ -265,6 +267,7 @@ Owners
  Giacometti, Frédéric B. fred@arakne.com
  Goodger, David dgoodger@bigfoot.com
  Griffin, Grant g2@iowegian.com
+ Hodgson, Neil neilh@scintilla.org
  Hudson, Michael mwh@python.net
  Hylton, Jeremy jeremy@zope.com
  Kuchling, Andrew akuchlin@mems-exchange.org

pep-0277.txt (normal file, 118 lines added)

@@ -0,0 +1,118 @@
PEP: 277
Title: Unicode file name support for Windows NT
Version: $Revision$
Last-Modified: $Date$
Author: neilh@scintilla.org (Neil Hodgson)
Status: Draft
Type: Standards Track
Created: 11-Jan-2002
Python-Version: 2.3
Post-History:

Abstract

This PEP discusses supporting access to every file that can exist
on Windows NT by passing Unicode file names directly to the
system's wide-character functions.

Rationale

Python 2.2 on Win32 platforms converts Unicode file names passed
to open and to functions in the os module into the 'mbcs' encoding
before passing the result to the operating system. This usually
succeeds in the common case where the script runs with the locale
set to the same value as when the file was created. Most machines
are set up with one locale that is rarely, if ever, changed. Some
users change locale more often, however, and servers often hold
files saved by users working in different locales. In those cases
a file name can contain characters that the current locale's
encoding cannot represent, so the converted name may not match the
file.
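
The lossy step can be illustrated with a short sketch. It is not the
code Python 2.2 runs: the 'mbcs' codec exists only on Windows, so
cp1252 stands in here for the ANSI code page, and errors="replace" is
an assumption chosen to mimic a conversion that substitutes characters
it cannot represent. The helper name to_ansi_and_back and both file
names are hypothetical.

```python
# Stand-in for the Windows ANSI code page (assumption: the real 'mbcs'
# codec is Windows-only, so cp1252 is used for illustration).
ANSI_CODE_PAGE = "cp1252"


def to_ansi_and_back(name):
    """Round-trip a file name through the code page, replacing any
    character the code page cannot represent with '?'."""
    return name.encode(ANSI_CODE_PAGE, errors="replace").decode(ANSI_CODE_PAGE)


western = "caf\u00e9.txt"                      # representable in cp1252
cyrillic = "\u0444\u0430\u0439\u043b.txt"      # not representable in cp1252

print(to_ansi_and_back(western) == western)    # name survives the round trip
print(to_ansi_and_back(cyrillic))              # Cyrillic letters are lost
```

A name that does not survive the round trip can no longer be used to
find the file it came from, which is the situation the rest of this
PEP aims to avoid.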

On Windows NT and descendant operating systems, including Windows
2000 and Windows XP, wide-character APIs are available that
provide direct access to all file names, including those that are
not representable using the current locale. The purpose of this
proposal is to provide access to these wide-character APIs through
the standard Python file object and posix module, and so provide
access to all files on Windows NT.

Specification

On Windows platforms that provide wide-character file APIs, Python
makes wide-character calls instead of the standard C library and
posix calls whenever Unicode arguments are passed to file APIs.

The Python file object is extended to use a Unicode file name
argument directly rather than converting it. This affects the
file object constructor file(filename[, mode[, bufsize]]) and also
the open function, which is an alias of this constructor. When a
Unicode filename argument is used here, the name attribute of
the file object will be Unicode. The representation of a file
object, repr(f), will display Unicode file names as an escaped
string, in a similar manner to the representation of Unicode
strings.
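
As a rough illustration, the sketch below opens a file with a
non-ASCII name in a temporary directory and inspects its name
attribute. It is written for Python 3, where every str is already
Unicode, rather than for the Python 2.3 target of this PEP, so it
shows the intended behaviour and not the proposed implementation.
ascii() is used as the closest Python 3 analogue of the escaped
representation described above. The file name and variable names are
made up for the example.

```python
import os
import tempfile

# Hypothetical file name containing Cyrillic characters.
unicode_name = "\u0444\u0430\u0439\u043b.txt"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, unicode_name)
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")
        name_attr = f.name   # the name is kept as given, not re-encoded
        escaped = ascii(f)   # repr() with non-ASCII characters escaped

print(type(name_attr).__name__)
print(escaped)
```

Because the name attribute holds the original string, client code can
pass it on to other file APIs without a lossy conversion in between.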

The posix module contains functions that take file or directory
names: chdir, listdir, mkdir, open, remove, rename, rmdir, stat,
and _getfullpathname. These will use Unicode arguments directly
rather than converting them. For the rename function, this
behaviour is triggered when either of the arguments is Unicode;
the other argument is then converted to Unicode using the default
encoding.
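
A minimal sketch of several of these calls working with Unicode names
follows. It uses the Python 3 os module, which exposes the same
functions, and runs inside a temporary directory; it assumes the
underlying file system can store the chosen names. The directory names
are invented for the example.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    old_dir = os.path.join(tmp, "\u043f\u0430\u043f\u043a\u0430")  # Cyrillic name
    new_dir = os.path.join(tmp, "\u65b0\u3057\u3044")              # Japanese name

    os.mkdir(old_dir)                 # create with a Unicode name
    os.rename(old_dir, new_dir)       # both arguments are Unicode
    was_dir = stat.S_ISDIR(os.stat(new_dir).st_mode)
    entries = os.listdir(tmp)         # names come back as Unicode
    os.rmdir(new_dir)
    remaining = os.listdir(tmp)

print(was_dir, entries, remaining)
```

Neither name is representable in a single Western code page, so a
conversion through such an encoding could not express both of them.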
The listdir function currently returns a list of strings. Under
this proposal, it will return a list of Unicode strings when its
path argument is Unicode.
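
The rule that the result type follows the argument type can be seen in
Python 3, which applies the same idea to str and bytes paths. The
sketch below is an analogy under that assumption, not the Python 2.3
behaviour itself; the file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the directory listing is not empty.
    open(os.path.join(tmp, "data.txt"), "w").close()

    text_names = os.listdir(tmp)                # str path -> list of str
    byte_names = os.listdir(os.fsencode(tmp))   # bytes path -> list of bytes

print(text_names)
print(byte_names)
```

Code that needs every name on disk should therefore pass a Unicode
path, so that no entry has to be squeezed through a narrower encoding.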

To allow client code to determine that these features are
implemented, the unicodefilenames function is provided. This
function returns true when both of the following hold: the
underlying system supports file names containing most Unicode
characters, and any valid file name may be passed to open as a
Unicode string.
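
A feature test of this kind might be used as sketched below. This
draft does not say which module unicodefilenames lives in, so the
sketch substitutes os.path.supports_unicode_filenames, an attribute
available in later Python releases; treating it as equivalent to the
function proposed here is an assumption. The helper name
describe_filename_support is hypothetical.

```python
import os.path


def describe_filename_support():
    """Report which file name handling the current platform offers,
    based on os.path.supports_unicode_filenames (a stand-in for the
    unicodefilenames function named in this PEP)."""
    if os.path.supports_unicode_filenames:
        return "arbitrary Unicode file names are accepted"
    return "file names are limited by the file system encoding"


print(describe_filename_support())
```

Client code can branch on such a check once at start-up instead of
catching conversion errors at every file operation.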

Restrictions

On the consumer Windows operating systems, Windows 95, Windows 98,
and Windows ME, there are no wide-character file APIs, so behaviour
is unchanged under this proposal. It may be possible in the
future to extend this proposal to cover these operating systems,
as the VFAT-32 file system they use does support Unicode file
names. Access to those names is difficult, however, so
implementing this would require much work. The "Microsoft Layer
for Unicode" could be a starting point for implementing this.

Python can be compiled with the size of Unicode characters set to
4 bytes rather than 2 by defining PY_UNICODE_TYPE to be a 4-byte
type and Py_UNICODE_SIZE to be 4. As the Windows API does not
accept 4-byte characters, the features described in this proposal
will not work in this mode, so the implementation falls back to the
current 'mbcs' encoding technique.
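
The conditions from the Specification and Restrictions sections can be
summarised as a small decision function. This is only a sketch of the
rules as stated in this PEP: the function name, its parameters, and
the three result labels are hypothetical and do not correspond to
anything in the reference implementation.

```python
def filename_call_strategy(has_wide_file_apis, py_unicode_size, arg_is_unicode):
    """Pick how a file name argument reaches the operating system.

    has_wide_file_apis: True on Windows NT and descendants,
                        False on Windows 95, 98, and ME.
    py_unicode_size:    size in bytes of a Python Unicode character (2 or 4).
    arg_is_unicode:     True when the caller passed a Unicode string.
    """
    if not arg_is_unicode:
        return "narrow"   # ordinary strings use the standard C library calls
    if has_wide_file_apis and py_unicode_size == 2:
        return "wide"     # pass the Unicode name straight to the wide API
    return "mbcs"         # fall back to converting with the 'mbcs' encoding


print(filename_call_strategy(True, 2, True))
print(filename_call_strategy(True, 4, True))
print(filename_call_strategy(False, 2, True))
```

Only the combination of wide-character APIs and 2-byte Unicode
characters selects the new path; every other case keeps the existing
behaviour.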

Reference Implementation

An experimental implementation is available from
http://scintilla.sourceforge.net/winunichanges.zip

References

[1] Microsoft Windows APIs
    http://msdn.microsoft.com/

Copyright

This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: