PEP: 277
Title: Unicode file name support for Windows NT
Version: $Revision$
Last-Modified: $Date$
Author: neilh@scintilla.org (Neil Hodgson)
Status: Draft
Type: Standards Track
Created: 11-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

This PEP discusses supporting access to all files possible on
Windows NT by passing Unicode file names directly to the system's
wide-character functions.


Rationale

Python 2.2 on Win32 platforms converts Unicode file names passed
to open and to functions in the os module into the 'mbcs' encoding
before passing the result to the operating system.  This is often
successful in the common case where the script is operating with
the locale set to the same value as when the file was created.
Most machines are set up with one locale and rarely if ever change
from it.  For some users, however, the locale changes more often,
and servers often hold files saved by users working in different
locales.
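
The loss described above can be sketched without a Windows machine.
The 'mbcs' codec exists only on Windows, so this example assumes
'cp1252' as a stand-in for the code page behind it, and the helper
name survives_locale_encoding is hypothetical.  It is written for
Python 3, where str is the Unicode type.

```python
# 'cp1252' is an assumed stand-in for the Windows-only 'mbcs' codec.
LOCALE_ENCODING = "cp1252"


def survives_locale_encoding(name, encoding=LOCALE_ENCODING):
    """Return True if name round-trips through the byte encoding."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeEncodeError:
        # The code page has no byte sequence for some character.
        return False


# A Western European name fits in the code page.
print(survives_locale_encoding(u"r\u00e9sum\u00e9.txt"))   # True
# A Japanese name does not, so a converted name could not reach the file.
print(survives_locale_encoding(u"\u65e5\u672c\u8a9e.txt"))  # False
```

A name that fails this check is the kind of file that a script
relying on conversion to the locale encoding cannot open.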

On Windows NT and descendant operating systems, including Windows
2000 and Windows XP, wide-character APIs are available that
provide direct access to all file names, including those that are
not representable using the current locale.  The purpose of this
proposal is to provide access to these wide-character APIs through
the standard Python file object and posix module and so provide
access to all files on Windows NT.


Specification

On Windows platforms that provide wide-character file APIs, when
Unicode arguments are provided to file APIs, wide-character calls
are made instead of the standard C library and posix calls.

The Python file object is extended to use a Unicode file name
argument directly rather than converting it.  This affects the
file object constructor file(filename[, mode[, bufsize]]) and also
the open function, which is an alias of this constructor.  When a
Unicode filename argument is used here, the name attribute of
the file object will be Unicode.  The representation of a file
object, repr(f), will display Unicode file names as an escaped
string in a similar manner to the representation of Unicode
strings.
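
As a rough illustration of the name attribute behaviour, the sketch
below runs on Python 3, where every str is Unicode, so it shows the
end result rather than the proposed Python 2.3 change itself.  The
helper open_and_report is hypothetical, and ascii() is used here
only as a stand-in for the escaped display that the paragraph above
describes for repr.

```python
import os
import tempfile


def open_and_report(directory, filename):
    """Create a file and return the name attribute of its file object."""
    path = os.path.join(directory, filename)
    with open(path, "w") as f:
        return f.name


with tempfile.TemporaryDirectory() as scratch:
    unicode_name = u"\u65e5\u672c\u8a9e.txt"
    reported = open_and_report(scratch, unicode_name)
    # The name attribute is the Unicode string that was passed in.
    round_trips = reported == os.path.join(scratch, unicode_name)
    is_text = isinstance(reported, str)
    # Escaped form of the file name, similar to an escaped repr.
    escaped = ascii(os.path.basename(reported))

print(escaped)
```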

The posix module contains functions that take file or directory
names: chdir, listdir, mkdir, open, remove, rename, rmdir, stat,
and _getfullpathname.  These will use Unicode arguments directly
rather than converting them.  For the rename function, this
behaviour is triggered when either of the arguments is Unicode, and
the other argument is converted to Unicode using the default
encoding.
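
The rule for rename with mixed arguments can be modelled in a few
lines.  This is only a sketch of the rule as stated above, not the
C implementation: the function coerce_rename_args and its "wide" and
"narrow" labels are hypothetical, bytes plays the role of the
Python 2 byte string, and the default encoding is assumed to be
ASCII, as it was in Python 2.

```python
def coerce_rename_args(src, dst, default_encoding="ascii"):
    """Model which API rename would use and what it would receive.

    If either name is Unicode (str), the other is decoded with the
    default encoding and the wide-character call is chosen.
    """
    if isinstance(src, str) or isinstance(dst, str):
        if isinstance(src, bytes):
            src = src.decode(default_encoding)
        if isinstance(dst, bytes):
            dst = dst.decode(default_encoding)
        return ("wide", src, dst)
    # Both names are byte strings: behaviour is unchanged.
    return ("narrow", src, dst)


print(coerce_rename_args(b"old.txt", u"\u65b0.txt")[0])  # wide
print(coerce_rename_args(b"old.txt", b"new.txt")[0])     # narrow
```

Under this model a byte string holding non-ASCII bytes cannot be
mixed with a Unicode name, because decoding it with the default
encoding raises UnicodeDecodeError.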

The listdir function currently returns a list of strings.  Under
this proposal, it will return a list of Unicode strings when its
path argument is Unicode.
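
The same "result type follows argument type" behaviour can be
observed with os.listdir in Python 3, which returns str names for a
str path and bytes names for a bytes path.  The sketch below assumes
Python 3 and a file system that can store the example name; the
helper listing_types is hypothetical.

```python
import os
import tempfile


def listing_types(directory):
    """List a directory twice: by Unicode path and by byte-string path."""
    text_names = os.listdir(directory)
    byte_names = os.listdir(os.fsencode(directory))
    return text_names, byte_names


with tempfile.TemporaryDirectory() as scratch:
    # Create one file whose name is not ASCII.
    open(os.path.join(scratch, u"\u65e5\u672c\u8a9e.txt"), "w").close()
    text_names, byte_names = listing_types(scratch)

print(type(text_names[0]).__name__)  # str
print(type(byte_names[0]).__name__)  # bytes
```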


Restrictions

On the consumer Windows operating systems, Windows 95, Windows 98,
and Windows ME, there are no wide-character file APIs so behaviour
is unchanged under this proposal.  It may be possible in the
future to extend this proposal to cover these operating systems, as
the VFAT-32 file system used by them does support Unicode file
names, but access is difficult and so implementing this would
require much work.  The "Microsoft Layer for Unicode" could be a
starting point for implementing this.

Python can be compiled with the size of Unicode characters set to
4 bytes rather than 2 by defining PY_UNICODE_TYPE to be a 4-byte
type and Py_UNICODE_SIZE to be 4.  As the Windows API does not
accept 4-byte characters, the features described in this proposal
will not work in this mode, so the implementation falls back to the
current 'mbcs' encoding technique.  This restriction could be lifted
in the future by performing extra conversions using
PyUnicode_AsWideChar, but for now that would add too much
complexity for a very rarely used feature.
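
Taken together, the two restrictions amount to a small decision
table.  The following sketch models it in Python; the function
choose_file_api, its parameters, and its return labels are all
hypothetical and do not correspond to names in the implementation.

```python
def choose_file_api(name_is_unicode, has_wide_file_api, py_unicode_size):
    """Model which path a file name takes under this proposal.

    Returns "bytes" when the name is a byte string and is passed
    through unchanged, "wide" when a Unicode name goes straight to
    the wide-character API, and "mbcs" when a Unicode name is first
    converted with the 'mbcs' encoding.
    """
    if not name_is_unicode:
        return "bytes"
    if has_wide_file_api and py_unicode_size == 2:
        return "wide"
    # Windows 95/98/ME, or a build with 4-byte Unicode characters.
    return "mbcs"


print(choose_file_api(True, True, 2))   # wide
print(choose_file_api(True, False, 2))  # mbcs
print(choose_file_api(True, True, 4))   # mbcs
```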


Reference Implementation

An experimental implementation is available from
[2] http://scintilla.sourceforge.net/winunichanges.zip

[3] An updated version is available at
http://python.org/sf/594001


References

[1] Microsoft Windows APIs
http://msdn.microsoft.com/


Copyright

This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: