PEP 277, Unicode file name support for Windows NT, Neil Hodgson
parent 853d3b59d9
commit 5071ad8ef0
@@ -87,6 +87,7 @@ Index by Category
S 274 Dict Comprehensions Warsaw
S 275 Switching on Multiple Values Lemburg
S 276 Simple Iterator for ints Althoff
S 277 Unicode file name support for Windows NT Hodgson

Finished PEPs (done, implemented in CVS)

@@ -235,6 +236,7 @@ Numerical Index
S 274 Dict Comprehensions Warsaw
S 275 Switching on Multiple Values Lemburg
S 276 Simple Iterator for ints Althoff
S 277 Unicode file name support for Windows NT Hodgson
SR 666 Reject Foolish Indentation Creighton

@@ -265,6 +267,7 @@ Owners
Giacometti, Frédéric B. fred@arakne.com
Goodger, David dgoodger@bigfoot.com
Griffin, Grant g2@iowegian.com
Hodgson, Neil neilh@scintilla.org
Hudson, Michael mwh@python.net
Hylton, Jeremy jeremy@zope.com
Kuchling, Andrew akuchlin@mems-exchange.org

@@ -0,0 +1,118 @@
PEP: 277
Title: Unicode file name support for Windows NT
Version: $Revision$
Last-Modified: $Date$
Author: neilh@scintilla.org (Neil Hodgson)
Status: Draft
Type: Standards Track
Created: 11-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

This PEP discusses supporting access to all files possible on
Windows NT by passing Unicode file names directly to the system's
wide-character functions.


Rationale

Python 2.2 on Win32 platforms converts Unicode file names passed
to open and to functions in the os module into the 'mbcs' encoding
before passing the result to the operating system. This is often
successful in the common case where the script is operating with
the locale set to the same value as when the file was created.
Most machines are set up with one locale that is rarely, if ever,
changed. Some users, however, change locale more often, and
servers often hold files saved by users working in different
locales.

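The failure mode described above can be illustrated with a short
sketch. It is not part of the proposal: the 'mbcs' codec exists only
on Windows, so the explicit code pages 'cp932' and 'cp1252' stand in
for two different locale settings, and the helper name representable
is hypothetical. The code is written for Python 3, where every str is
a Unicode string.

```python
# Illustrative only: 'cp932' and 'cp1252' stand in for the Windows-only
# 'mbcs' codec, which encodes to whatever ANSI code page is active.
name = u"\u65e5\u672c\u8a9e.txt"  # a Japanese file name


def representable(filename, codepage):
    """Return True if filename survives a round trip through codepage."""
    try:
        return filename.encode(codepage).decode(codepage) == filename
    except UnicodeError:
        return False


# The name converts cleanly under a Japanese code page ...
ok_japanese = representable(name, "cp932")
# ... but cannot be expressed under a Western European one.
ok_western = representable(name, "cp1252")
```

A file created under the first setting therefore cannot be named
through a converted string under the second, which is the gap the
wide-character APIs close.
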
On Windows NT and descendant operating systems, including Windows
2000 and Windows XP, wide-character APIs are available that
provide direct access to all file names, including those that are
not representable using the current locale. The purpose of this
proposal is to provide access to these wide-character APIs through
the standard Python file object and posix module and so provide
access to all files on Windows NT.


Specification

On Windows platforms which provide wide-character file APIs, when
Unicode arguments are provided to file APIs, wide-character calls
are made instead of the standard C library and posix calls.

The Python file object is extended to use a Unicode file name
argument directly rather than converting it. This affects the
file object constructor file(filename[, mode[, bufsize]]) and also
the open function, which is an alias of this constructor. When a
Unicode filename argument is used here, the name attribute of
the file object will be Unicode. The representation of a file
object, repr(f), will display Unicode file names as an escaped
string in a similar manner to the representation of Unicode
strings.

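A minimal sketch of the file object behaviour follows. It assumes
Python 3, where str is always Unicode and the file() constructor no
longer exists, so only open is shown; the file name is a hypothetical
one chosen for illustration, and a temporary directory keeps the
example self-contained.

```python
import os
import tempfile

# Hypothetical Greek file name, used only for illustration.
filename = u"\u0394\u03bf\u03ba\u03b9\u03bc\u03ae.txt"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, filename)
    # The Unicode name is handed to open without any manual encoding.
    with open(path, "w") as f:
        f.write("hello")
        # The name attribute keeps the Unicode string that was passed in.
        name_is_unicode = isinstance(f.name, str)
        name_matches = (f.name == path)
        # repr(f) includes the file name.
        shown = repr(f)
    exists = os.path.exists(path)
```
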
The posix module contains functions that take file or directory
names: chdir, listdir, mkdir, open, remove, rename, rmdir, stat,
and _getfullpathname. These will use Unicode arguments directly
rather than converting them. For the rename function, this
behaviour is triggered when either of the arguments is Unicode;
the other argument is then converted to Unicode using the default
encoding.

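The functions listed above can be exercised with Unicode names as in
the following sketch. It assumes Python 3 and reaches the functions
through the os module; the directory and file names are hypothetical,
and everything is created under a temporary directory and removed
again.

```python
import os
import tempfile

base = tempfile.mkdtemp()

# Hypothetical Japanese directory and file names.
dirname = os.path.join(base, u"\u6587\u66f8")
old = os.path.join(dirname, u"\u53e4\u3044.txt")
new = os.path.join(dirname, u"\u65b0\u3057\u3044.txt")

os.mkdir(dirname)               # mkdir with a Unicode name
with open(old, "w") as f:
    f.write("data")
os.rename(old, new)             # both arguments are Unicode here
size = os.stat(new).st_size     # stat with a Unicode name
os.remove(new)                  # remove with a Unicode name
os.rmdir(dirname)               # rmdir with a Unicode name
os.rmdir(base)
```
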
The listdir function currently returns a list of strings. Under
this proposal, it will return a list of Unicode strings when its
path argument is Unicode.

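The rule that the result type follows the argument type can be
sketched as below. This assumes Python 3, where the two string types
are str (Unicode) and bytes rather than unicode and str, so it is an
analogue of the proposed behaviour, not a transcript of it. The file
name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create one file with a non-ASCII (hypothetical) name.
    open(os.path.join(tmpdir, u"\u00e9t\u00e9.txt"), "w").close()

    # A Unicode path argument yields Unicode names ...
    as_text = os.listdir(tmpdir)
    # ... and a byte-string path argument yields byte-string names.
    as_bytes = os.listdir(os.fsencode(tmpdir))
```
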
To allow client code to determine that these features are
implemented, the unicodefilenames function is provided. This
function returns true when the underlying system supports file
names containing most Unicode characters and any valid file name
may be passed to open as a Unicode string.


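Client code might use such a check as sketched below. This draft
names the check unicodefilenames; the sketch instead assumes the
os.path.supports_unicode_filenames attribute found in later Python
releases, and the helper choose_name and its arguments are
hypothetical.

```python
import os.path


def choose_name(unicode_name, fallback_name):
    """Pick a file name the platform reports it can handle.

    Hypothetical helper: returns unicode_name when the platform
    reports support for arbitrary Unicode file names, and otherwise
    returns the plainer fallback_name.
    """
    if os.path.supports_unicode_filenames:
        return unicode_name
    return fallback_name


chosen = choose_name(u"\u65e5\u672c\u8a9e.txt", "nihongo.txt")
```

The flag only reports whether arbitrary Unicode names are accepted; it
does not guarantee that a particular name is valid on the file system
in use.

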
Restrictions

On the consumer Windows operating systems, Windows 95, Windows 98,
and Windows ME, there are no wide-character file APIs, so behaviour
is unchanged under this proposal. It may be possible in the
future to extend this proposal to cover these operating systems,
as the VFAT-32 file system used by them does support Unicode file
names. Access is difficult, however, so implementing this would
require much work. The "Microsoft Layer for Unicode" could be a
starting point for implementing this.

Python can be compiled with the size of Unicode characters set to
4 bytes rather than 2 by defining PY_UNICODE_TYPE to be a 4-byte
type and Py_UNICODE_SIZE to be 4. As the Windows API does not
accept 4-byte characters, the features described in this proposal
will not work in this mode, so the implementation falls back to the
current 'mbcs' encoding technique.


Reference Implementation

An experimental implementation is available from
http://scintilla.sourceforge.net/winunichanges.zip


References

[1] Microsoft Windows APIs
http://msdn.microsoft.com/


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: