PEP 277, Unicode file name support for Windows NT, Neil Hodgson
parent 853d3b59d9
commit 5071ad8ef0
@@ -87,6 +87,7 @@ Index by Category
 S 274 Dict Comprehensions Warsaw
 S 275 Switching on Multiple Values Lemburg
 S 276 Simple Iterator for ints Althoff
+S 277 Unicode file name support for Windows NT Hodgson

 Finished PEPs (done, implemented in CVS)

@@ -235,6 +236,7 @@ Numerical Index
 S 274 Dict Comprehensions Warsaw
 S 275 Switching on Multiple Values Lemburg
 S 276 Simple Iterator for ints Althoff
+S 277 Unicode file name support for Windows NT Hodgson
 SR 666 Reject Foolish Indentation Creighton


@@ -265,6 +267,7 @@ Owners
 Giacometti, Frédéric B. fred@arakne.com
 Goodger, David dgoodger@bigfoot.com
 Griffin, Grant g2@iowegian.com
+Hodgson, Neil neilh@scintilla.org
 Hudson, Michael mwh@python.net
 Hylton, Jeremy jeremy@zope.com
 Kuchling, Andrew akuchlin@mems-exchange.org
@@ -0,0 +1,118 @@
PEP: 277
Title: Unicode file name support for Windows NT
Version: $Revision$
Last-Modified: $Date$
Author: neilh@scintilla.org (Neil Hodgson)
Status: Draft
Type: Standards Track
Created: 11-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

This PEP discusses supporting access to all files possible on
Windows NT by passing Unicode file names directly to the system's
wide-character functions.


Rationale

Python 2.2 on Win32 platforms converts Unicode file names passed to
open and to functions in the os module into the 'mbcs' encoding
before passing the result to the operating system.  This usually
succeeds in the common case where the script runs with the locale
set to the same value as when the file was created.  Most machines
are set up with one locale and are rarely, if ever, changed from it.
Some users change locale more often, and servers often hold files
saved by users working in different locales.
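
The failure mode described above can be sketched in modern Python 3.
This is an illustration only, not code from the proposal.  The 'mbcs'
codec exists only on Windows, so the sketch assumes two explicit code
pages, cp1251 (Cyrillic) and cp1252 (Western European), as stand-ins
for the locale's code page.  The file name and the survives helper
are made up for the example.

```python
# Illustration only: explicit code pages stand in for 'mbcs'.
name = "\u0444\u0430\u0439\u043b.txt"   # a made-up Cyrillic file name


def survives(file_name, code_page):
    """Return True if file_name round-trips through code_page unchanged."""
    # Characters the code page cannot represent are replaced with '?'.
    encoded = file_name.encode(code_page, "replace")
    return encoded.decode(code_page) == file_name


# The name survives under the code page it was created with ...
cyrillic_ok = survives(name, "cp1251")
# ... but not under a code page that lacks Cyrillic letters.
western_ok = survives(name, "cp1252")
```

A name that does not survive the conversion no longer matches the
file on disk, which is the case the wide-character APIs avoid.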

On Windows NT and descendant operating systems, including Windows
2000 and Windows XP, wide-character APIs are available that provide
direct access to all file names, including those that are not
representable using the current locale.  The purpose of this
proposal is to provide access to these wide-character APIs through
the standard Python file object and posix module, and so provide
access to all files on Windows NT.


Specification

On Windows platforms that provide wide-character file APIs, when
Unicode arguments are passed to Python's file APIs, wide-character
calls are made instead of the standard C library and posix calls.

The Python file object is extended to use a Unicode file name
argument directly rather than converting it.  This affects the file
object constructor file(filename[, mode[, bufsize]]) and also the
open function, which is an alias of this constructor.  When a
Unicode filename argument is used here, the name attribute of the
file object will be Unicode.  The representation of a file object,
repr(f), will display Unicode file names as an escaped string, in a
similar manner to the representation of Unicode strings.
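
As a rough illustration of the name attribute behaviour, the sketch
below uses modern Python 3, where every str is Unicode and open keeps
the name it was given.  It is not the Python 2.3 implementation.  The
opened_name helper and the file name are hypothetical, and the sketch
assumes the file system in use can store non-ASCII names.

```python
import os
import tempfile


def opened_name(file_name):
    """Create file_name in a scratch directory and return the base name
    reported by the file object's name attribute."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, file_name)
        with open(path, "w") as f:
            # f.name is the path that was passed to open, unconverted.
            return os.path.basename(f.name)


reported = opened_name("caf\u00e9.txt")
```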

The posix module contains functions that take file or directory
names: chdir, listdir, mkdir, open, remove, rename, rmdir, stat, and
_getfullpathname.  These will use Unicode arguments directly rather
than converting them.  For the rename function, this behaviour is
triggered when either of the arguments is Unicode; the other
argument is then converted to Unicode using the default encoding.
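
The rename rule can be sketched as a small helper.  This is written
in modern Python 3, where bytes plays the role of the Python 2 byte
string and str the role of the Unicode string.  The helper name
coerce_rename_args is hypothetical, and the default encoding is
assumed to be ASCII, as it was in Python 2.

```python
def coerce_rename_args(src, dst, default_encoding="ascii"):
    """Hypothetical helper: if either name is Unicode, decode the other
    one to Unicode with the default encoding; otherwise change nothing."""
    if isinstance(src, str) or isinstance(dst, str):
        if isinstance(src, bytes):
            src = src.decode(default_encoding)
        if isinstance(dst, bytes):
            dst = dst.decode(default_encoding)
    return src, dst


mixed = coerce_rename_args(b"old.txt", "n\u00e9w.txt")
both_bytes = coerce_rename_args(b"old.txt", b"new.txt")
```

Under this rule a byte-string name that is not valid in the default
encoding raises an error rather than being silently mangled.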

The listdir function currently returns a list of strings.  Under
this proposal, it will return a list of Unicode strings when its
path argument is Unicode.
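
The same "result type follows argument type" rule is still visible
in modern Python 3, which the sketch below uses as an analogue: a str
path yields str names and a bytes path yields bytes names.  The
scratch directory and file name are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    # Create one empty file so the directory listing is not empty.
    open(os.path.join(scratch, "example.txt"), "w").close()

    # A text (Unicode) path returns text names.
    text_names = os.listdir(scratch)
    # A bytes path returns bytes names.
    byte_names = os.listdir(os.fsencode(scratch))
```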

To allow client code to determine that these features are
implemented, the unicodefilenames function is provided.  This
function returns true when the underlying system supports file
names containing most Unicode characters and any valid file name
may be passed to open as a Unicode string.
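
A feature check along these lines might look like the sketch below.
This draft does not say which module holds unicodefilenames, so the
sketch does not assume one.  Instead it relies on the attribute
os.path.supports_unicode_filenames, which released Python versions
expose for the same purpose.  The wrapper function name is
hypothetical.

```python
import os.path


def unicode_file_names_supported():
    """Hypothetical wrapper: report whether arbitrary Unicode file names
    can be used, falling back to False if the attribute is missing."""
    return bool(getattr(os.path, "supports_unicode_filenames", False))


if unicode_file_names_supported():
    strategy = "pass Unicode names straight through"
else:
    strategy = "encode names before use"
```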


Restrictions

On the consumer Windows operating systems, Windows 95, Windows 98,
and Windows ME, there are no wide-character file APIs, so behaviour
is unchanged under this proposal.  It may be possible in the future
to extend this proposal to cover these operating systems, as the
VFAT-32 file system they use does support Unicode file names.
Access is difficult, however, so implementing this would require
much work.  The "Microsoft Layer for Unicode" could be a starting
point for implementing this.

Python can be compiled with the size of Unicode characters set to 4
bytes rather than 2 by defining PY_UNICODE_TYPE to be a 4-byte type
and Py_UNICODE_SIZE to be 4.  As the Windows API does not accept
4-byte characters, the features described in this proposal will not
work in this mode, so the implementation falls back to the current
'mbcs' encoding technique.
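
Taken together, the two restrictions amount to a simple decision,
sketched below.  The function and its parameter names are
hypothetical and only restate the rule in the text.  The real check
would be made in C at build time and run time.

```python
def file_name_strategy(has_wide_api, unicode_size):
    """Hypothetical restatement of the fallback rule.

    has_wide_api: True on Windows NT and its descendants.
    unicode_size: bytes per Unicode character in this Python build.
    """
    if has_wide_api and unicode_size == 2:
        return "wide"   # pass the Unicode name to the wide-character API
    return "mbcs"       # convert the name with the 'mbcs' encoding


nt_narrow = file_name_strategy(True, 2)
nt_wide_build = file_name_strategy(True, 4)
win9x = file_name_strategy(False, 2)
```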


Reference Implementation

An experimental implementation is available from
http://scintilla.sourceforge.net/winunichanges.zip


References

[1] Microsoft Windows APIs
    http://msdn.microsoft.com/


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: