Home > Archive > Unix Programming > January 2004 > Displaying latin-1 characters
Displaying latin-1 characters
| Hi, I am parsing hex-encoded Latin-1 characters, and when I print them
to the terminal I see different results.
my $str = " ABCD%79%7A%7b%7c%7d%7e%80%81%82%83%84%85"
        . "%86%87%88%89%8a%8b%8c%8d%8e%8f%90%91%92"
        . "%93%94%95%96%97%98%99%9a%9b%9c%9d%9e%9f"
        . "%A0%A1%a2 xyz";
$str =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/eg;  # Change hex to latin-1
print "$str\n";
On one Sun box (SunOS 5.8 Generic_108528-13 sun4u sparc
SUNW,Sun-Fire) in SecureCRT (version 3.2.1 (32-bit)) I get
the following:
---
ABCDyz{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘'""•–—˜™š›œžŸ ¡¢ xyz
--
and on another Sun box (5.8 Generic_108528-14 sun4u sparc
SUNW,Sun-Fire-880) in SecureCRT (same version as above) I get the
following:
---
ABCDyz{|} ¡¢ xyz
---
In the second display the characters are not shown, but they are present
in the output. The SecureCRT settings match for both boxes.
Any help is appreciated.
Tamas
| |
| Pascal Bourguignon 2004-01-29, 3:34 pm |
| tamashee@yahoo.com (Tamas) writes:
quote:
> Hi, I am parsing hex encoded latin-1 characters and on printing them
> to terminal i see different results.
>
> my $str = " ABCD%79%7A%7b%7c%7d%7e%80%81%82%83%84%85"
>         . "%86%87%88%89%8a%8b%8c%8d%8e%8f%90%91%92"
>         . "%93%94%95%96%97%98%99%9a%9b%9c%9d%9e%9f"
>         . "%A0%A1%a2 xyz";
> $str =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/eg;  # Change hex to latin-1
> print "$str\n";
>
> The output on one Sun box (SunOS 5.8 Generic_108528-13 sun4u sparc
> SUNW,Sun-Fire) on secureCRT (SecureCRT version 3.2.1 (32-bit)) i get
> the following,
> ---
> ABCDyz{|}~þÿ € ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘'""þÿ • – — ˜ ™ š › œ ž Ÿ þÿ ¡ ¢ xyz
> --
>
> and on another Sun box (5.8 Generic_108528-14 sun4u sparc
> SUNW,Sun-Fire-880) on secureCRT (same version as above) i get the
> following,
> ---
> ABCDyz{|} þÿ ¡ ¢ xyz
> ---
> The characters are not displayed but they exists in the second
> display. The settings on both boxes for secureCRT match.
>
> Any help is appreciated.
> Tamas
ISO-Latin-1 does not allocate the codes between 128 and 159 inclusive.
The first ISO-Latin-1 character that is not in ASCII is the non
breakable space, code 160 (A0). Therefore the output you're seeing on
both terminals seems perfectly correct to me (þÿ ¡ ¢).
--
__Pascal_Bourguignon__ http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
| |
| Lorinczy Zsigmond / Domonyik Mariann 2004-01-30, 3:36 am |
| Tamas wrote:
quote:
> Hi, I am parsing hex encoded latin-1 characters and on printing them
> to terminal i see different results.
>
> my $str = " ABCD%79%7A%7b%7c%7d%7e%80%81%82%83%84%85"
>         . "%86%87%88%89%8a%8b%8c%8d%8e%8f%90%91%92"
>         . "%93%94%95%96%97%98%99%9a%9b%9c%9d%9e%9f"
>         . "%A0%A1%a2 xyz";
> $str =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/eg;  # Change hex to latin-1
> print "$str\n";
>
> The output on one Sun box (SunOS 5.8 Generic_108528-13 sun4u sparc
> SUNW,Sun-Fire) on secureCRT (SecureCRT version 3.2.1 (32-bit)) i get
> the following,
> ---
> ABCDyz{|}~€‚ƒ„…†‡ˆ‰©‹¦«®¬‘'""•–—˜™¹›¶»¾¼ ¡¢ xyz
> --
>
> and on another Sun box (5.8 Generic_108528-14 sun4u sparc
> SUNW,Sun-Fire-880) on secureCRT (same version as above) i get the
> following,
> ---
> ABCDyz{|} ¡¢ xyz
> ---
> The characters are not displayed but they exists in the second
> display. The settings on both boxes for secureCRT match.
Characters between 0x80 and 0x9f are not printable...
http://www.unicode.org/charts/PDF/U0080.pdf Page 8
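One way to see what the script is really sending, regardless of what the terminal chooses to draw, is to make the bytes in the 0x80 to 0x9f range visible. The following is only a minimal sketch in the same Perl as the original post; the helper name show_c1 is hypothetical and not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): percent-decode a string the
# same way the original script does, then replace every byte in the
# 0x80-0x9f range with a visible <XX> marker so that it no longer
# depends on how the terminal renders it.
sub show_c1 {
    my ($str) = @_;
    $str =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/eg;            # hex to bytes
    $str =~ s/([\x80-\x9f])/sprintf("<%02X>", ord($1))/eg;  # mark 0x80-0x9f
    return $str;
}

print show_c1("ABCD%7e%80%9f"), "\n";   # prints ABCD~<80><9F>
```

If both Sun boxes print the same markers, the script output is identical on both and the difference lies in how each terminal session interprets those bytes.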
| |
|
| Pascal Bourguignon <spam@thalassa.informatimago.com> wrote in message news:<871xpiqbzb.fsf@thalassa.informatimago.com>...
quote:
> tamashee@yahoo.com (Tamas) writes:
>
>
> ISO-Latin-1 does not allocate the codes between 128 and 159 inclusive.
> The first ISO-Latin-1 character that is not in ASCII is the non
> breakable space, code 160 (A0). Therefore the output you're seeing on
> both terminals seems perfectly correct to me (þÿ
Oops! My original message may not have displayed correctly
on your terminal (all the Latin-1 chars). The first display I
mentioned above shows all the Latin-1 chars, both in the SSH
window and in IE 6.0, where I am currently posting this:
ABCDyz{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘'""•–—˜™š›œžŸ_¡¢ xyz
I mean that it displayed all (~50 in the above line) chars, including 128 to
159. For example, 128 is displayed as the EURO sign (€). The other
chars likewise appear in one SSH window but not in the other (the second
display in the original posting above). If I cut and paste the output from
either SSH window into MS Windows Notepad, I see all the characters.
Another point: though the original Latin-1 did not define 128 to 159, HTML
entities use these code points to display the above characters. A
quick ref:
http://www.bbsinc.com/symbol.html
So the questions remain:
(1) There might be a setting or something else that I need to change to
display them. If so, what are those settings? If not, what else should I be
looking into?
(2) With HTML entities using code points from 128 to 159 along with
the other Latin-1 chars, what does it mean to handle them in applications?
For example, if one parses the HTML entities and stores them, they
have to be stored according to the locale setting, right? On both Sun boxes,
locale -a lists:
$> locale -a
POSIX
common
en_US.UTF-8
C
iso_8859_1
$>
Thanks,
Tamas
| |
| Stephane CHAZELAS 2004-01-30, 9:35 am |
| 2004-01-30, 10:48(-08), Tamas:
[...]quote:
> i mean, displayed all (~50 in the above line) chars including 128 to
> 159. For example: 128 is displayed EURO sign (€).
[...]
There's no "euro" character in the iso-8859-1 (latin1) charset;
iso-8859-15 (latin9) was created especially for that.
The character 128 is not defined in latin1 either (nor in
latin9, where euro is char 164).
In the windows-1252 charset, character 128 is the euro sign, so
that's probably the charset you are using. Ask a Windows
newsgroup how to convert a copy-paste selection from one
charset to another automatically.
--
Stéphane ["Stephane.Chazelas" at "free.fr"]
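The difference between the three charsets can be checked from Perl itself. This is a rough sketch that assumes Perl 5.8 or later with the core Encode module; the helper name codepoint_of is hypothetical. It shows which Unicode code point a single byte maps to under each charset, and how windows-1252 bytes could be re-encoded as UTF-8 for a terminal that expects UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the thread): decode one byte under the
# named charset and return the resulting Unicode code point.
sub codepoint_of {
    my ($charset, $octet) = @_;
    return ord(decode($charset, $octet));
}

printf "cp1252      0x80 -> U+%04X\n", codepoint_of('cp1252',      "\x80");
printf "iso-8859-1  0x80 -> U+%04X\n", codepoint_of('iso-8859-1',  "\x80");
printf "iso-8859-15 0xA4 -> U+%04X\n", codepoint_of('iso-8859-15', "\xA4");

# Re-encode a windows-1252 byte string as UTF-8 octets.
my $cp1252_bytes = "\x80";
my $utf8_bytes   = encode('UTF-8', decode('cp1252', $cp1252_bytes));
printf "UTF-8 octets: %s\n", join(' ', map { sprintf '%02X', ord } split //, $utf8_bytes);
```

Byte 0x80 becomes U+20AC (the euro sign) only under windows-1252; under iso-8859-1 it stays U+0080, a control code, which fits the blank output seen in the second terminal.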