Apache Mod-Python - mod_python, unicode, utf-8, latin1


Author mod_python, unicode, utf-8, latin1
Earle Ady

2006-08-11, 7:12 pm

Aloha!

I've done some searching online regarding character encoding and
UTF-8 support within mod_python, but haven't been able to get the
proper functionality out of mod_python.

Here's the situation: I have changed my site.py in Python 2.4.3 to
use "utf-8" as the default encoding. I have a database with correct
unicode representations in it. I execute routines from the
interpreter and get correct unicode objects out of the database.
When I run these exact routines from inside of a PSP page, the
unicode object comes back as though its UTF-8 bytes had been latin1
decoded. Please note from the examples below that I am using
identical MySQLdb connection settings.

I am still a bit unclear as to where exactly this is happening inside
of mod_python, and any advice toward a solution would be greatly
appreciated. It's pretty critical that a developer can provide UTF-8
support in order for mod_python to gain traction in enterprise
applications.

If this is a user error on my part, I'd greatly appreciate being
pointed to a proper solution.

Best,
earle.

------ THIS WORKS FROM WITHIN THE INTERPRETER:
(conn, cursor) = util.DBConnect(MySQLdb.cursors.DictCursor)

cursor.execute("SELECT * from unicode_test")
items = cursor.fetchall()

for item in items:
    print item,

# RESULTS: correct unicode:
# (earle@www-1 14:55 266) Python utest.py
# {'data': u'\u9577\u5ca1', 'id': 35L}
# {'data': u'\u9577\u5ca1', 'id': 36L}


------- THIS DOES NOT WORK FROM .PSP, it produces a latin1 decoded
unicode object of the correct unicode (see below):

<%
req.content_type = 'text/html;charset=UTF-8;';
%>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<%@ include file="include/webglobals.psps" %>
<%
(conn, cursor) = util.DBConnect(MySQLdb.cursors.DictCursor)
req.write("MYSQL CONNECTION CHARSET: ")
req.write(conn.character_set_name())
req.write("<p/>")
req.write("SYS.DEFAULTENCODING: ")
req.write(sys.getdefaultencoding())
req.write("<p/>")

res = cursor.execute("SELECT * from unicode_test")
items = cursor.fetchall()

for i in items:
    #
    req.write("DATA: ")
    req.write(i['data'])
    req.write(", item: ")
%>
<%= i %>
<%
    req.write(", BYTES: ")
    req.write(i['data'].encode('unicode_escape'))

    req.write("<p/>")
    #
# end: items

req.write("SHOULD LOOK LIKE THIS: %s" % ( u'\u9577\u5ca1', ))
%>
</body>
</html>

---- RESULTS:

MYSQL CONNECTION CHARSET: utf8

SYS.DEFAULTENCODING: utf-8

DATA: é•·å²¡, item: {'data': u'\xe9\x95\xb7\xe5\xb2\xa1', 'id': 35L} , BYTES: \xe9\x95\xb7\xe5\xb2\xa1

DATA: é•·å²¡, item: {'data': u'\xe9\x95\xb7\xe5\xb2\xa1', 'id': 36L} , BYTES: \xe9\x95\xb7\xe5\xb2\xa1

SHOULD LOOK LIKE THIS: 長岡


------- Notice if I latin1 decode the -correct- unicode object, I
get the exact unicode object that is appearing inside of the PSP:

u'\xe9\x95\xb7\xe5\xb2\xa1'
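
For anyone reproducing this, the round trip described above can be sketched in a few lines. This is an illustration in Python 3 syntax rather than the Python 2.4 code from the thread, and the names `correct`, `garbled`, and `repaired` are made up for the example. It assumes the only fault is that UTF-8 bytes were decoded as latin1 somewhere along the way.

```python
# The text the database is expected to return (U+9577 U+5CA1).
correct = "\u9577\u5ca1"

# Simulate the suspected fault: UTF-8 bytes interpreted as latin1.
utf8_bytes = correct.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")

# Each UTF-8 byte has become its own code point.
print(ascii(garbled))

# Reverse the fault: recover the original bytes, then decode as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == correct)
```

If the repaired value matches the expected text, that would suggest the stored data is intact and the extra decode step happens somewhere between the database connection and the page output.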






Graham Dumpleton

2006-08-11, 7:12 pm

For future reference, a general question like this is better posted to the
mod_python user mailing list and not the developer mailing list, as it isn't
related to internal development of mod_python. There are also a lot more
people on the user mailing list with much more diverse knowledge, and
thus you might get a quicker/better answer on the user mailing list.

Anyway, let's see if anyone comes up with anything on the developer
mailing list, but if you don't get an answer in a day or so, you might
instead post it to the more general user mailing list.

The user mailing list is the one mentioned on the mod_python home page.

BTW, changing the default character encoding in Python's site.py is, I
believe, not generally seen as a good idea. It would also help in future if
you specify which version of mod_python you are using and, in the case of
PSP, whether you are triggering PSP directly with mod_python.psp as the
handler or whether you are manually using PSP objects from a
mod_python.publisher handler.

Except for those comments, I am not a Unicode person, so I don't know the
ins and outs of using Unicode with mod_python.

Graham
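
On the site.py point: one commonly suggested alternative to changing the interpreter-wide default is to encode explicitly at the point where text leaves the program. The following is only a sketch in Python 3 syntax; the helper name `to_utf8` is hypothetical and is not part of mod_python.

```python
def to_utf8(value):
    """Return UTF-8 bytes for text; pass bytes through unchanged."""
    if isinstance(value, bytes):
        return value
    # str() lets non-text values such as integers be written too.
    return str(value).encode("utf-8")


payload = to_utf8("\u9577\u5ca1")
print(payload)
```

Keeping the conversion in one visible place means the output encoding does not depend on how a particular interpreter or embedded environment happens to be configured.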




Earle Ady

2006-08-11, 7:12 pm

Graham,

Thanks, I didn't realize I had actually sent this one to the python-dev
list by mistake. I will go ahead and get it over to the right list now.

I am also running mod_python 3.2.9 with two minor modifications (MySQL
Sessions, and a try: except: around a split in util.py to fix a bug with
Flash 8 and file uploads, which we must support). Additionally, we're
using mod_python.psp as the handler.

Mahalo,
earle.




