119 lines
4.3 KiB
Plaintext
119 lines
4.3 KiB
Plaintext
PEP: 277
|
||
Title: Unicode file name support for Windows NT
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: neilh@scintilla.org (Neil Hodgson)
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Created: 11-Jan-2002
|
||
Python-Version: 2.3
|
||
Post-History:
|
||
|
||
|
||
Abstract
|
||
|
||
This PEP discusses supporting access to all files possible on
|
||
Windows NT by passing Unicode file names directly to the system's
|
||
wide-character functions.
|
||
|
||
|
||
Rationale
|
||
|
||
Python 2.2 on Win32 platforms converts Unicode file names passed
|
||
to open and to functions in the os module into the 'mbcs' encoding
|
||
before passing the result to the operating system. This is often
|
||
successful in the common case where the script is operating with
|
||
the locale set to the same value as when the file was created.
|
||
Most machines are set up as one locale and rarely if ever changed
|
||
from this locale. For some users, locale is changed more often
|
||
and on servers there are often files saved by users using
|
||
different locales.
|
||
|
||
On Windows NT and descendent operating systems, including Windows
|
||
2000 and Windows XP, wide-character APIs are available that
|
||
provide direct access to all file names, including those that are
|
||
not representable using the current locale. The purpose of this
|
||
proposal is to provide access to these wide-character APIs through
|
||
the standard Python file object and posix module and so provide
|
||
access to all files on Windows NT.
|
||
|
||
|
||
Specification
|
||
|
||
On Windows platforms which provide wide-character file APIs, when
|
||
Unicode arguments are provided to file APIs, wide-character calls
|
||
are made instead of the standard C library and posix calls.
|
||
|
||
The Python file object is extended to use a Unicode file name
|
||
argument directly rather than converting it. This affects the
|
||
file object constructor file(filename[, mode[, bufsize]]) and also
|
||
the open function which is an alias of this constructor. When a
|
||
Unicode filename argument is used here then the name attribute of
|
||
the file object will be Unicode. The representation of a file
|
||
object, repr(f) will display Unicode file names as an escaped
|
||
string in a similar manner to the representation of Unicode
|
||
strings.
|
||
|
||
The posix module contains functions that take file or directory
|
||
names: chdir, listdir, mkdir, open, remove, rename, rmdir, stat,
|
||
and _getfullpathname. These will use Unicode arguments directly
|
||
rather than converting them. For the rename function, this
|
||
behaviour is triggered when either of the arguments is Unicode and
|
||
the other argument converted to Unicode using the default
|
||
encoding.
|
||
|
||
The listdir function currently returns a list of strings. Under
|
||
this proposal, it will return a list of Unicode strings when its
|
||
path argument is Unicode.
|
||
|
||
|
||
Restrictions
|
||
|
||
On the consumer Windows operating systems, Windows 95, Windows 98,
|
||
and Windows ME, there are no wide-character file APIs so behaviour
|
||
is unchanged under this proposal. It may be possible in the
|
||
future to extend this proposal to cover these operating systems as
|
||
the VFAT-32 file system used by them does support Unicode file
|
||
names but access is difficult and so implementing this would
|
||
require much work. The "Microsoft Layer for Unicode" could be a
|
||
starting point for implementing this.
|
||
|
||
Python can be compiled with the size of Unicode characters set to
|
||
4 bytes rather than 2 by defining PY_UNICODE_TYPE to be a 4 byte
|
||
type and Py_UNICODE_SIZE to be 4. As the Windows API does not
|
||
accept 4 byte characters, the features described in this proposal
|
||
will not work in this mode so the implementation falls back to the
|
||
current 'mbcs' encoding technique. This restriction could be lifted
|
||
in the future by performing extra conversions using
|
||
PyUnicode_AsWideChar but for now that would add too much
|
||
complexity for a very rarely used feature.
|
||
|
||
|
||
Reference Implementation
|
||
|
||
An experimental implementation is available from
|
||
[2] http://scintilla.sourceforge.net/winunichanges.zip
|
||
|
||
[3] An updated version is available at
|
||
http://python.org/sf/594001
|
||
|
||
|
||
References
|
||
|
||
[1] Microsoft Windows APIs
|
||
http://msdn.microsoft.com/
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
fill-column: 70
|
||
End:
|
||
|