| Nicolas Lehuen 2005-04-30, 7:45 am |
| [Woops, forgot to put the list in the recipients.]
I think in this case the default conversion used is UTF8. Ideally, a
developer returning Unicode strings from functions should have a way
to decide in what encoding (UTF-8, iso-latin-1, etc.) the string
should be returned to the client.
One possible way to do that would be to parse the Content-Type header,
i.e. if the developer set the Content-Type header to "text/html;
charset=iso-8859-1", then we know the developer expects the result to
be encoded in iso-8859-1, so we can do result =
object.encode('iso-8859-1').
Here is some tentative code for this :
    re_charset = re.compile(r"charset\s*=\s*([^\s;]+)")

    def publish_object(req, object):
        if callable(object):
            req.form = util.FieldStorage(req, keep_blank_values=1)
            return publish_object(req, util.apply_fs_data(object, req.form, req=req))
        elif hasattr(object, '__iter__'):
            result = False
            for item in object:
                result |= publish_object(req, item)
            return result
        else:
            if object is None:
                return False
            elif isinstance(object, UnicodeType):
                # We try to detect the character encoding
                # from the Content-Type header
                if req._content_type_set:
                    charset = re_charset.search(req.content_type)
                    if charset:
                        charset = charset.group(1)
                    else:
                        charset = 'UTF8'
                        req.content_type += '; charset=UTF8'
                else:
                    charset = 'UTF8'
                result = object.encode(charset)
            else:
                result = str(object)
            [...]
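
To try the charset detection on its own, outside mod_python, here is a minimal standalone sketch. It is written for Python 3, where str is already Unicode, so there is no UnicodeType check. The helper name encode_for_content_type and the DEFAULT_CHARSET constant are hypothetical and are not part of mod_python. Stripping quotes around the charset value is an extra assumption that the code above does not make.

```python
import re

# Same pattern as in the code above: capture the value after "charset=".
RE_CHARSET = re.compile(r"charset\s*=\s*([^\s;]+)")

# Assumption: fall back to UTF-8 when no charset is declared.
DEFAULT_CHARSET = "UTF-8"


def encode_for_content_type(text, content_type=None):
    """Return (body_bytes, content_type) for a text result.

    Hypothetical helper for illustration only.
    """
    if content_type:
        match = RE_CHARSET.search(content_type)
        if match:
            # Assumption: tolerate a quoted value such as charset="utf-8".
            charset = match.group(1).strip('"')
        else:
            # No charset declared: pick the default and advertise it.
            charset = DEFAULT_CHARSET
            content_type += "; charset=" + DEFAULT_CHARSET
    else:
        # No Content-Type set at all: encode with the default, leave it unset.
        charset = DEFAULT_CHARSET
    return text.encode(charset), content_type


body, ctype = encode_for_content_type("caf\u00e9", "text/html; charset=iso-8859-1")
print(body, ctype)  # body is b'caf\xe9', the header is unchanged
```

Note that text.encode() raises LookupError for an unknown charset name and UnicodeEncodeError when a character cannot be represented in the requested charset, so a real handler would have to decide what to do in both cases.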
Regards,
Nicolas
On 4/30/05, Graham Dumpleton <grahamd@dscpl.com.au> wrote:
>
> On 30/04/2005, at 6:37 PM, Nicolas Lehuen wrote:
>
> What do you see as the issue that required an explicit check for
> UnicodeType and avoidance of converting it with str()?
>
> As the code is above, req.write() will be called with the Unicode
> object. This will work provided that the Unicode string can be
> converted into a normal string using the default encoding. I.e., in the
> underlying C code PyArg_ParseTuple will use "s", meaning:
>
> "s" (string or Unicode object) [char *]
> Convert a Python string or Unicode object to a C pointer to a character
> string. You must not provide storage for the string itself; a pointer
> to an existing string is stored into the character pointer variable
> whose address you pass. The C string is null-terminated. The Python
> string must not contain embedded null bytes; if it does, a TypeError
> exception is raised. Unicode objects are converted to C strings using
> the default encoding. If this conversion fails, an UnicodeError is
> raised.
>
> I think though that applying str() in the Python code to the Unicode
> string probably yields the same result. I.e., str(u'123') results in
> the encode() method of the Unicode string object being called.
>
> S.encode([encoding[,errors]]) -> string
>
> Return an encoded string version of S. Default encoding is the current
> default string encoding. errors may be given to set a different error
> handling scheme. Default is 'strict' meaning that encoding errors raise
> a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
> 'xmlcharrefreplace' as well as any other name registered with
> codecs.register_error that can handle UnicodeEncodeErrors.
>
> In other words, I don't believe there is any difference between
> converting it using str() before the call to req.write() and passing
> the Unicode string directly to req.write(). Thus, an explicit check for
> UnicodeType is probably not required.
>
> Graham
>
>
|