mod_python.publisher : proposal for a few implementation changes
Web Server forum

Nicolas Lehuen
04-30-05 12:45 PM

[Woops, forgot to put the list in the recipients.]

I think in this case the default conversion used is UTF8. Ideally, a
developer returning Unicode strings from functions should have a way
to decide in what encoding (UTF-8, iso-latin-1, etc.) the string
should be returned to the client.
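
To make the stakes concrete: the same Unicode text becomes different bytes depending on the encoding chosen. A client that decodes with a different charset than the one used to encode will show garbled text. This is a minimal sketch written for Python 3 (text is `str`, encoded output is `bytes`), not for the Python 2 used in this thread. The sample string is arbitrary.

```python
# Same text, two encodings (Python 3 sketch; sample string is arbitrary).
text = "caf\u00e9"  # "cafe" with an e-acute

utf8_bytes = text.encode("utf-8")         # e-acute becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("iso-8859-1")  # e-acute becomes one byte: 0xE9

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# If UTF-8 bytes are sent but the client is told the charset is
# iso-8859-1, each UTF-8 byte is read as a separate character.
misread = utf8_bytes.decode("iso-8859-1")
print(ascii(misread))  # 'caf\xc3\xa9' (five characters, not four)
```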

One possible way to do that would be to parse the Content-Type
header. For example, if the developer set the content type to
"text/html; charset=iso-8859-1", then we know the developer expects
the result to be encoded in iso-8859-1, so we can do
result = object.encode('iso-8859-1').

Here is some tentative code for this:

import re
from types import UnicodeType  # Python 2 only
from mod_python import util

re_charset = re.compile(r"charset\s*=\s*([^\s;]+)")

def publish_object(req, object):
    if callable(object):
        req.form = util.FieldStorage(req, keep_blank_values=1)
        return publish_object(req, util.apply_fs_data(object, req.form, req=req))
    elif hasattr(object, '__iter__'):
        # Publish each item in turn; report whether anything was written.
        result = False
        for item in object:
            result |= publish_object(req, item)
        return result
    else:
        if object is None:
            return False
        elif isinstance(object, UnicodeType):
            # Try to detect the character encoding
            # from the Content-Type header.
            if req._content_type_set:
                charset = re_charset.search(req.content_type)
                if charset:
                    charset = charset.group(1)
                else:
                    # No charset given: default to UTF-8 and declare it.
                    charset = 'UTF-8'
                    req.content_type += '; charset=UTF-8'
            else:
                charset = 'UTF-8'

            result = object.encode(charset)
        else:
            result = str(object)

[...]
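
The charset-selection branch of the tentative code can be tried outside Apache. This is a minimal sketch that uses the same regular expression. `choose_charset` is a hypothetical helper, not part of mod_python. It takes the current Content-Type value and a flag saying whether the handler set it. It returns the codec to encode with and the header value to send. The sketch uses the registered name `UTF-8` as the default, which Python also accepts as a codec name.

```python
import re

# Same pattern as the proposal: capture the value after "charset=",
# stopping at whitespace or ";".
RE_CHARSET = re.compile(r"charset\s*=\s*([^\s;]+)")

DEFAULT_CHARSET = "UTF-8"


def choose_charset(content_type, content_type_set):
    """Hypothetical helper mirroring the Unicode branch of publish_object.

    Returns (charset, content_type): the codec used to encode the
    Unicode result, and the possibly amended Content-Type value.
    """
    if content_type_set:
        match = RE_CHARSET.search(content_type)
        if match:
            # The handler declared a charset: honour it.
            return match.group(1), content_type
        # Content type set without a charset: use the default and declare it.
        return DEFAULT_CHARSET, content_type + "; charset=" + DEFAULT_CHARSET
    # Handler set nothing: default charset, header left unchanged.
    return DEFAULT_CHARSET, content_type


charset, ctype = choose_charset("text/html; charset=iso-8859-1", True)
body = "caf\u00e9".encode(charset)
print(charset, ctype, body)
# iso-8859-1 text/html; charset=iso-8859-1 b'caf\xe9'
```

One limitation: the pattern does not strip quotes. A header such as `charset="utf-8"` would therefore hand `"utf-8"`, quotes included, to `encode()`, which would raise a LookupError.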

Regards,
Nicolas

On 4/30/05, Graham Dumpleton <grahamd@dscpl.com.au> wrote:
>
> On 30/04/2005, at 6:37 PM, Nicolas Lehuen wrote: 
>
> What do you see as the issue that requires an explicit check for
> UnicodeType and the avoidance of converting it with str()?
>
> As the code is above, req.write() will be called with the Unicode
> object. This will work provided that the Unicode string can be
> converted into a normal string using the default encoding. I.e., in
> the underlying C code, PyArg_ParseTuple will use "s", meaning:
>
> "s" (string or Unicode object) [char *]
>    Convert a Python string or Unicode object to a C pointer to a
>    character string. You must not provide storage for the string
>    itself; a pointer to an existing string is stored into the
>    character pointer variable whose address you pass. The C string
>    is null-terminated. The Python string must not contain embedded
>    null bytes; if it does, a TypeError exception is raised. Unicode
>    objects are converted to C strings using the default encoding. If
>    this conversion fails, a UnicodeError is raised.
>
> I think, though, that applying str() in the Python code to the
> Unicode string probably yields the same result. I.e., str(u'123')
> results in the encode() method of the Unicode string object being
> called.
>
> S.encode([encoding[,errors]]) -> string
>
> Return an encoded string version of S. Default encoding is the current
> default string encoding. errors may be given to set a different error
> handling scheme. Default is 'strict' meaning that encoding errors raise
> a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
> 'xmlcharrefreplace' as well as any other name registered with
> codecs.register_error that can handle UnicodeEncodeErrors.
>
> In other words, I don't believe there is any difference between
> converting it using str() before the call to req.write() and passing
> the Unicode string directly to req.write(). Thus, an explicit check
> for UnicodeType is probably not required.
>
> Graham
>
>
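
Graham's point is that both paths end in the same encode step, which uses the interpreter's default encoding. In an unmodified Python 2 that default is ASCII. The sketch below reproduces that step explicitly in Python 3. It assumes `"ascii"` as a stand-in for the Python 2 default, though a site could have changed it. It also shows the `errors` schemes listed in the quoted docstring. `implicit_convert` is a hypothetical name used only for illustration.

```python
# Emulates Python 2's implicit unicode -> str conversion, which used the
# default encoding (normally "ascii") with errors="strict".
DEFAULT_ENCODING = "ascii"  # assumption: unmodified Python 2 default


def implicit_convert(text, errors="strict"):
    """Hypothetical stand-in for str(u) / the "s" format conversion."""
    return text.encode(DEFAULT_ENCODING, errors)


print(implicit_convert("123"))  # b'123': pure ASCII converts fine

try:
    implicit_convert("caf\u00e9")
except UnicodeEncodeError as exc:
    # Either path (str() or req.write()) would hit this same failure.
    print("strict:", exc.reason)  # strict: ordinal not in range(128)

for scheme in ("ignore", "replace", "xmlcharrefreplace"):
    print(scheme, implicit_convert("caf\u00e9", scheme))
# ignore b'caf'
# replace b'caf?'
# xmlcharrefreplace b'caf&#233;'
```

Either way, the output depends on the default encoding. Only an explicit `encode()` with a chosen charset changes the result.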

All times are GMT.