Debian Developers - python's gettext.gettext broken, use gettext.lgettext

Junichi Uekawa

2005-08-07, 8:49 pm

Hi,

While I was hacking at debconf, I noticed that
Python's gettext function returns strings in the
original encoding of the message catalog, which will
appear as garbage on the screen.

With Python 2.4, lgettext was added, which seems to do
the right thing and return the string encoded in the
current CODESET.
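
To make the symptom concrete, here is a small Python 3 illustration (not the debconf code). It assumes a catalog stored as UTF-8 and a terminal running in an ISO-8859-1 locale; the message "café" is made up. Passing the catalog's bytes straight through garbles the output, while decoding and re-encoding in the terminal's codeset, roughly what lgettext arranges, displays correctly.

```python
# Hypothetical translated message, stored as UTF-8 in a .mo catalog.
catalog_bytes = "café".encode("utf-8")  # b"caf\xc3\xa9"

# An ISO-8859-1 terminal maps each byte to one character, so the two-byte
# UTF-8 sequence for "é" is displayed as two unrelated characters.
garbled = catalog_bytes.decode("iso-8859-1")  # "cafÃ©"

# Recoding into the terminal's codeset fixes it: decode with the catalog's
# charset, then encode with the locale's codeset.
recoded = catalog_bytes.decode("utf-8").encode("iso-8859-1")  # b"caf\xe9"
shown = recoded.decode("iso-8859-1")  # "café"
```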

#318578 (linda) and #318581 (apt-listchanges) are workarounds
for 2.3; they look slightly large, but workable.


import gettext
import locale

gettext_encoding = locale.getpreferredencoding()
my_ugettext = gettext.translation('apt-listchanges').ugettext
def lgettext(msgid):
    # Python 2.3: translate to unicode, then encode in the locale's charset
    return my_ugettext(msgid).encode(gettext_encoding)
_ = lgettext



Correct me if I'm missing something, since Python is not
my best language.


regards,
junichi

--
Junichi Uekawa, Debian Developer http://www.netfort.gr.jp/~dancer/
183A 70FC 4732 1B87 57A5 CE82 D837 7D4E E81E 55C1


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Joe Wreschnig

2005-08-08, 2:51 am

On Mon, 2005-08-08 at 07:45 +0900, Junichi Uekawa wrote:
> Hi,
>
> While I was hacking at debconf, I noticed that
> Python's gettext function returns strings in the
> original encoding of the message catalog, which will
> appear as garbage on the screen.


The best way to do gettext in Python is to do:

gettext.install(textdomain, unicode=True)

Which installs ugettext as '_' function into the __builtin__ namespace.
That makes _ return Python 'unicode' objects, which is what programs
should be using internally anyway.
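
Python 3 dropped the unicode parameter, because gettext() always returns str there. A minimal sketch of the builtins approach: "hypothetical-app" is a made-up domain, and since no catalog exists for it, install() falls back to NullTranslations, so _() returns messages unchanged.

```python
import builtins
import gettext

# "hypothetical-app" is a made-up domain; with no catalog installed,
# install() uses NullTranslations and _() returns its argument as-is.
gettext.install("hypothetical-app")

# install() binds the translation's gettext method as builtins._, so any
# module in the program can call _() without importing anything.
installed = builtins._
greeting = _("Hello, world")
```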

This is harder if you're trying to localize a module since then you
don't want to screw with __builtin__; you should use a local _
assignment instead (http://www.python.org/doc/current/lib/node329.html).
It's basically what you wrote.
--
Joe Wreschnig <piman@debian.org>

Martin v. Löwis

2005-08-08, 8:30 am

Joe Wreschnig wrote:
> Which installs ugettext as '_' function into the __builtin__ namespace.
> That makes _ return Python 'unicode' objects, which is what programs
> should be using internally anyway.
>
> This is harder if you're trying to localize a module since then you
> don't want to screw with __builtin__


It is also useless for the issues at hand: since linda and
apt-listchanges apparently use local strings, giving them Unicode
strings would break them. So Junichi's change looks right to me.
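
A hedged Python 3 analogue of this breakage (neither linda nor apt-listchanges ran on Python 3; the strings are made up): code built around byte strings cannot simply be handed text objects. In Python 2 the mix triggered an implicit ASCII decode that failed once the byte string held non-ASCII data; Python 3 rejects the mix outright.

```python
# Byte string, as in code that works with locale-encoded str throughout.
header = b"Status: "
# Text object, as returned by a Unicode-returning _() (made-up message).
message = "übersetzt"

try:
    line = header + message  # Python 3 refuses to mix bytes and str
except TypeError:
    # The caller must choose an encoding explicitly. UTF-8 is fixed here for
    # the example; real code would use the locale's codeset.
    line = header + message.encode("utf-8")
```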

Regards,
Martin


Steve Kowalik

2005-08-08, 8:30 am

On Mon, 08 Aug 2005 10:24:50 +0200, Martin v. Löwis uttered
> It is also useless for the issues at hand: since linda and
> apt-listchanges apparently use local strings, giving them Unicode
> strings would break them. So Junichi's change looks right to me.
>

Standing up for Linda, I am more than willing to fix her usage of
gettext, and I am currently investigating using Joe's suggestion, to
see what that gives me.

Anyway, in Python, unicode string objects behave the same as normal
string objects, so to my mind, the breakage should be minimal.

Cheers,
--
Steve
"English is about as pure as a cribhouse whore. We don't just borrow
words; on occasion, English has pursued other languages down alleyways
to beat them unconscious and rifle their pockets for new vocabulary."
- James D. Nicoll, resident of rec.arts.sf.written

Junichi Uekawa

2005-08-08, 5:54 pm


Hi,

> Standing up for Linda, I am more than willing to fix her usage of
> gettext, and I am currently investigating using Joe's suggestion, to
> see what that gives me.
>
> Anyway, in Python, unicode string objects behave the same as normal
> string objects, so to my mind, the breakage should be minimal.


What is broken in linda currently, and what will still be
broken if linda follows Joe's suggestion, is that linda
doesn't follow the current CODESET.

You'd expect ISO-8859-1 output on stdout when the locale specifies
ISO-8859-1, and UTF-8 output when it specifies UTF-8.

'ugettext' is Python's own variant of gettext, which only
returns UTF-8, so you would have to call it like:

print _("some string").encode(locale.nl_langinfo(locale.CODESET))

as opposed to

print _("some string")
(if _ is bound to lgettext).
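
A small sketch of the explicit approach, assuming Python 3 and a binary output stream. write_localized is a hypothetical helper, not part of gettext or linda, and io.BytesIO stands in for the terminal so the resulting bytes can be inspected. It shows that the same translated text has to become different bytes depending on the locale's codeset.

```python
import io

def write_localized(text, stream, codeset):
    """Hypothetical helper: encode text in codeset and write it to a binary stream."""
    stream.write(text.encode(codeset) + b"\n")

# The same translated text yields different bytes per locale codeset.
latin1_out = io.BytesIO()
utf8_out = io.BytesIO()
write_localized("Grüße", latin1_out, "iso-8859-1")
write_localized("Grüße", utf8_out, "utf-8")

# With a real terminal one would ask for the current codeset after
# setlocale(LC_ALL, "") has taken effect (Unix only), e.g.:
#   locale.setlocale(locale.LC_ALL, "")
#   write_localized(_("some string"), sys.stdout.buffer,
#                   locale.nl_langinfo(locale.CODESET))
```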


regards,
junichi
--
Junichi Uekawa, Debian Developer http://www.netfort.gr.jp/~dancer/
183A 70FC 4732 1B87 57A5 CE82 D837 7D4E E81E 55C1


Steve Kowalik

2005-08-09, 7:51 am

On Mon, 08 Aug 2005 12:06:48 -0500, Joe Wreschnig uttered
> No, it doesn't return UTF-8; it returns unicode objects. They're
> automatically recoded when you try to print them (based on the same
> function lgettext uses, locale.getpreferredencoding()). As Steve said,
> unicode objects are basically like str objects, so code changes should
> be minimal. I'll take a look at Linda/Lintian soon to see what needs to
> be done, but I suspect it'll be trivial.
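
Joe's point can be illustrated in Python 3 (a sketch, not linda's code): a text stream configured with an encoding recodes str objects as they are written, much as Python 2 recoded unicode objects printed to a terminal. The ISO-8859-1 stream over io.BytesIO stands in for a Latin-1 terminal. The sketch also shows the limit of automatic recoding: characters the codeset cannot represent raise UnicodeEncodeError under the default strict error handler.

```python
import io

# io.BytesIO stands in for a terminal running in an ISO-8859-1 locale.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="iso-8859-1", newline="\n")

# The str is recoded to ISO-8859-1 bytes as it is written.
print("Grüße", file=out)
out.flush()
latin1_bytes = raw.getvalue()

# Characters outside the target codeset cannot be recoded; with the
# default errors="strict" the write raises UnicodeEncodeError.
try:
    print("日本語", file=out)
    out.flush()
except UnicodeEncodeError:
    unencodable = True
else:
    unencodable = False
```

One caveat worth checking: in Python 2, sys.stdout.encoding was usually None when output was redirected to a pipe or file, and printing non-ASCII unicode then failed with UnicodeEncodeError.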


I have already had a look, and have already ripped out
gettext.gettext and switched to gettext.install.

If you'd like to look at the pre-release, it is available at:

http://wedontsleep.org/~steven/lind..._0.3.17_all.deb

Cheers,
--
Steve
"You have a fear of nothingness, or in laymen's terms, a fear of ...
nothingness"
- EMH, USS Voyager


Copyright 2003 - 2008 webservertalk.com