diff --git a/pep-0393.txt b/pep-0393.txt
index f84d13837..145d515e4 100644
--- a/pep-0393.txt
+++ b/pep-0393.txt
@@ -41,7 +41,7 @@ One problem with the approach is support for existing applications
 may be computed. Applications are encouraged to phase out reliance
 on a specific internal representation if possible. As interaction
 with other libraries will often require some sort of internal
-representation, the specification choses UTF-8 as the recommended way
+representation, the specification chooses UTF-8 as the recommended way
 of exposing strings to C code.
 
 For many strings (e.g. ASCII), multiple representations may actually
@@ -69,7 +69,7 @@ The Unicode object structure is changed to this definition::
 These fields have the following interpretations:
 
 - length: number of code points in the string (result of sq_length)
-- str: shortest-form representation of the unicode string
+- str: shortest-form representation of the unicode string. The string is null-terminated (in its respective representation).
 - hash: same as in Python 3.2
 - state:
 
@@ -145,7 +145,7 @@ String Access
 
 The canonical representation can be accessed using two macros
 PyUnicode_Kind and PyUnicode_Data. PyUnicode_Kind gives one of the
-value PyUnicode_1BYTE (1), PyUnicode_2BYTE (2), or PyUnicode_4BYTE
+values PyUnicode_1BYTE (1), PyUnicode_2BYTE (2), or PyUnicode_4BYTE
 (3). PyUnicode_Data gives the void pointer to the data, masking out
 the pointer kind. All these functions call PyUnicode_Ready in case
 the canonical representation hasn't been computed yet.
@@ -156,7 +156,7 @@ _PyUnicode_AsString, which is removed. The function will compute the
 utf8 representation when first called. Since this representation will
 consume memory until the string object is released, applications
 should use the existing PyUnicode_AsUTF8String where possible
-(which generates a new string object every time). API that implicitly
-converts a string to a char* (such as the ParseTuple functions) will
+(which generates a new string object every time). APIs that implicitly
+convert a string to a char* (such as the ParseTuple functions) will
 use PyUnicode_AsUTF8 to compute a conversion.
 
@@ -187,7 +187,7 @@ Discussion
 Several concerns have been raised about the approach presented here:
 
 It makes the implementation more complex. That's true, but considered
-worth given the gains.
+worth it given the benefits.
 
 The Py_Unicode representation is not instantaneously available,
 slowing down applications that request it. While this is also true,
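
A note for reviewers, not part of the patch: the String Access hunk describes the
kind/data access pattern only in prose, so here is a minimal C sketch of it. The
sketch uses the spellings CPython 3.3 ultimately shipped (PyUnicode_KIND,
PyUnicode_DATA, PyUnicode_READ, PyUnicode_GET_LENGTH, PyUnicode_READY), which
differ slightly from the draft names PyUnicode_Kind/PyUnicode_Data/PyUnicode_Ready
used in the PEP text; the helper name count_spaces is invented for illustration.

    #include <Python.h>

    /* Walk the canonical representation one code point at a time,
     * independent of whether the string is stored in 1-, 2- or 4-byte
     * units.  Returns the number of ASCII spaces, or -1 on error. */
    static Py_ssize_t
    count_spaces(PyObject *unicode)
    {
        if (PyUnicode_READY(unicode) < 0)    /* build the canonical form if needed */
            return -1;

        int kind = PyUnicode_KIND(unicode);  /* 1-, 2- or 4-byte units */
        const void *data = PyUnicode_DATA(unicode);
        Py_ssize_t length = PyUnicode_GET_LENGTH(unicode);
        Py_ssize_t spaces = 0;

        for (Py_ssize_t i = 0; i < length; i++) {
            if (PyUnicode_READ(kind, data, i) == ' ')
                spaces++;
        }
        return spaces;
    }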
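
Also illustrative only: the trade-off the hunk at old line 156 describes, between
the UTF-8 buffer cached by PyUnicode_AsUTF8 and the fresh object returned by
PyUnicode_AsUTF8String, assuming a Python 3.3+ build. The helper name print_utf8
is invented, and error handling is reduced to NULL checks.

    #include <Python.h>              /* also provides <stdio.h> */

    /* Contrast the two UTF-8 accessors discussed in the patch context. */
    static void
    print_utf8(PyObject *unicode)
    {
        /* Borrowed pointer into a UTF-8 buffer that is cached on the
         * string object and stays allocated until that object dies. */
        const char *cached = PyUnicode_AsUTF8(unicode);
        if (cached != NULL)
            printf("cached: %s\n", cached);

        /* A new bytes object on every call; nothing extra is pinned to
         * the unicode object, which is why the PEP text recommends this
         * form where possible. */
        PyObject *bytes = PyUnicode_AsUTF8String(unicode);
        if (bytes != NULL) {
            printf("fresh:  %s\n", PyBytes_AS_STRING(bytes));
            Py_DECREF(bytes);
        }
    }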