114 lines
4.0 KiB
ReStructuredText
114 lines
4.0 KiB
ReStructuredText
PEP: 277
|
|
Title: Unicode file name support for Windows NT
|
|
Author: Neil Hodgson <neilh@scintilla.org>
|
|
Status: Final
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 11-Jan-2002
|
|
Python-Version: 2.3
|
|
Post-History:
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This PEP discusses supporting access to all files possible on
|
|
Windows NT by passing Unicode file names directly to the system's
|
|
wide-character functions.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Python 2.2 on Win32 platforms converts Unicode file names passed
|
|
to open and to functions in the ``os`` module into the 'mbcs' encoding
|
|
before passing the result to the operating system. This is often
|
|
successful in the common case where the script is operating with
|
|
the locale set to the same value as when the file was created.
|
|
Most machines are set up as one locale and rarely if ever changed
|
|
from this locale. For some users, locale is changed more often
|
|
and on servers there are often files saved by users using
|
|
different locales.
|
|
|
|
On Windows NT and descendent operating systems, including Windows
|
|
2000 and Windows XP, wide-character APIs are available that
|
|
provide direct access to all file names, including those that are
|
|
not representable using the current locale. The purpose of this
|
|
proposal is to provide access to these wide-character APIs through
|
|
the standard Python file object and posix module and so provide
|
|
access to all files on Windows NT.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
On Windows platforms which provide wide-character file APIs, when
|
|
Unicode arguments are provided to file APIs, wide-character calls
|
|
are made instead of the standard C library and posix calls.
|
|
|
|
The Python file object is extended to use a Unicode file name
|
|
argument directly rather than converting it. This affects the
|
|
file object constructor ``file(filename[, mode[, bufsize]])`` and also
|
|
the ``open`` function which is an alias of this constructor. When a
|
|
Unicode filename argument is used here then the ``name`` attribute of
|
|
the file object will be Unicode. The representation of a file
|
|
object, ``repr(f)`` will display Unicode file names as an escaped
|
|
string in a similar manner to the representation of Unicode
|
|
strings.
|
|
|
|
The ``posix`` module contains functions that take file or directory
|
|
names: ``chdir``, ``listdir``, ``mkdir``, ``open``, ``remove``, ``rename``,
|
|
``rmdir``, ``stat``, and ``_getfullpathname``. These will use Unicode
|
|
arguments directly rather than converting them. For the ``rename`` function, this
|
|
behaviour is triggered when either of the arguments is Unicode and
|
|
the other argument converted to Unicode using the default
|
|
encoding.
|
|
|
|
The ``listdir`` function currently returns a list of strings. Under
|
|
this proposal, it will return a list of Unicode strings when its
|
|
path argument is Unicode.
|
|
|
|
|
|
Restrictions
|
|
============
|
|
|
|
On the consumer Windows operating systems, Windows 95, Windows 98,
|
|
and Windows ME, there are no wide-character file APIs so behaviour
|
|
is unchanged under this proposal. It may be possible in the
|
|
future to extend this proposal to cover these operating systems as
|
|
the VFAT-32 file system used by them does support Unicode file
|
|
names but access is difficult and so implementing this would
|
|
require much work. The "Microsoft Layer for Unicode" could be a
|
|
starting point for implementing this.
|
|
|
|
Python can be compiled with the size of Unicode characters set to
|
|
4 bytes rather than 2 by defining ``PY_UNICODE_TYPE`` to be a 4 byte
|
|
type and ``Py_UNICODE_SIZE`` to be 4. As the Windows API does not
|
|
accept 4 byte characters, the features described in this proposal
|
|
will not work in this mode so the implementation falls back to the
|
|
current 'mbcs' encoding technique. This restriction could be lifted
|
|
in the future by performing extra conversions using
|
|
``PyUnicode_AsWideChar`` but for now that would add too much
|
|
complexity for a very rarely used feature.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
The implementation is available at [2]_.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
[1] Microsoft Windows APIs
|
|
\ https://msdn.microsoft.com/
|
|
|
|
.. [2] https://github.com/python/cpython/issues/37017
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|