Changes from Nick Coghlan:

Clarify PyUnicode_AsUTF8 usage. Rename PyUnicode_Finalize. Store representation form in state.
2011-01-27 21:37:25 +00:00 · d363de45dd
parent 55c04efb71
commit d363de45dd
1 changed file with 18 additions and 10 deletions
--- a/pep-0393.txt
+++ b/pep-0393.txt
@@ -69,13 +69,21 @@ The Unicode object structure is changed to this definition::
 These fields have the following interpretations:

 - length: number of code points in the string (result of sq_length)
- str: shortest-form representation of the unicode string; the lower
-  two bits of the pointer indicate the specific form:
-  01 => 1 byte (Latin-1); 10 => 2 byte (UCS-2); 11 => 4 byte (UCS-4);
-  00 => null pointer
-
+- str: shortest-form representation of the unicode string
  The string is null-terminated (in its respective representation).
- hash, state: same as in Python 3.2
+- hash: same as in Python 3.2
+- state:
+
+  * lowest 2 bits (mask 0x03) - interned-state (SSTATE_*) as in 3.2
+  * next 2 bits (mask 0x0C) - form of str:
+
+    + 00 => reserved
+    + 01 => 1 byte (Latin-1)
+    + 10 => 2 byte (UCS-2)
+    + 11 => 4 byte (UCS-4);
+
+  * next bit (mask 0x10): 1 if str memory follows PyUnicodeObject  
+
 - utf8_length, utf8: UTF-8 representation (null-terminated)
 - wstr_length, wstr: representation in platform's wchar_t
  (null-terminated). If wchar_t is 16-bit, this form may use surrogate
@@ -123,11 +131,11 @@ representation is not yet set for the string.
 PyUnicode_FromUnicode remains supported but is deprecated. If the
 Py_UNICODE pointer is non-null, the str representation is set. If the
 pointer is NULL, a properly-sized wstr representation is allocated,
-which can be modified until PyUnicode_Finalize() is called (explicitly
+which can be modified until PyUnicode_Ready() is called (explicitly
 or implicitly). Resizing a Unicode string remains possible until it
 is finalized.

-PyUnicode_Finalize() converts a string containing only a wstr
+PyUnicode_Ready() converts a string containing only a wstr
 representation into the canonical representation. Unless wstr and str
 can share the memory, the wstr representation is discarded after the
 conversion.
@@ -139,7 +147,7 @@ The canonical representation can be accessed using two macros
 PyUnicode_Kind and PyUnicode_Data. PyUnicode_Kind gives one of the
 value PyUnicode_1BYTE (1), PyUnicode_2BYTE (2), or PyUnicode_4BYTE
 (3). PyUnicode_Data gives the void pointer to the data, masking out
-the pointer kind. All these functions call PyUnicode_Finalize
+the pointer kind. All these functions call PyUnicode_Ready
 in case the canonical representation hasn't been computed yet.

 A new function PyUnicode_AsUTF8 is provided to access the UTF-8
@@ -150,7 +158,7 @@ consume memory until the string object is released, applications
 should use the existing PyUnicode_AsUTF8String where possible
 (which generates a new string object every time). API that implicitly
 converts a string to a char* (such as the ParseTuple functions) will
-use this function to compute a conversion.
+use PyUnicode_AsUTF8 to compute a conversion.

 PyUnicode_AsUnicode is deprecated; it computes the wstr representation
 on first use.
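
Note on the new state layout: the first hunk moves the form of str out of the low bits of the pointer and into the state field. A minimal sketch of how such a field could be decoded is given below, assuming state is a plain unsigned integer. The mask values (0x03, 0x0C, 0x10) come from the diff; every macro, enum, and function name is hypothetical and chosen only for illustration, not taken from the PEP or from CPython.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the numeric masks appear in the diff. */
#define STATE_INTERNED_MASK 0x03u  /* lowest 2 bits: interned-state (SSTATE_*) */
#define STATE_KIND_MASK     0x0Cu  /* next 2 bits: form of str */
#define STATE_KIND_SHIFT    2
#define STATE_FOLLOWS_MASK  0x10u  /* next bit: str memory follows the object */

/* Form values as listed in the diff: 00 reserved, 01, 10, 11. */
enum str_kind {
    KIND_RESERVED = 0,
    KIND_1BYTE    = 1,  /* Latin-1 */
    KIND_2BYTE    = 2,  /* UCS-2 */
    KIND_4BYTE    = 3   /* UCS-4 */
};

/* Extract the interned-state bits. */
static unsigned state_interned(unsigned state)
{
    return state & STATE_INTERNED_MASK;
}

/* Extract the form of str. */
static enum str_kind state_kind(unsigned state)
{
    return (enum str_kind)((state & STATE_KIND_MASK) >> STATE_KIND_SHIFT);
}

/* Nonzero if the str memory follows the object structure. */
static int state_str_follows_object(unsigned state)
{
    return (state & STATE_FOLLOWS_MASK) != 0;
}

/* Bytes per code point for a given form; 0 for the reserved value. */
static size_t kind_char_size(enum str_kind kind)
{
    switch (kind) {
    case KIND_1BYTE: return 1;
    case KIND_2BYTE: return 2;
    case KIND_4BYTE: return 4;
    default:         return 0;
    }
}

/* Pack the three sub-fields into one state value. */
static unsigned make_state(unsigned interned, enum str_kind kind, int follows)
{
    assert(interned <= STATE_INTERNED_MASK);
    assert(kind <= KIND_4BYTE);
    return interned
         | ((unsigned)kind << STATE_KIND_SHIFT)
         | (follows ? STATE_FOLLOWS_MASK : 0u);
}
```

With the form stored this way, the str pointer can be used directly, without masking out tag bits before each access.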