wcsftime output encoding
| Roger Leigh 2004-11-26, 7:50 am |
| -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
The program listed below demonstrates the use of wcsftime() and
std::time_put<wchar_t>, which is a C++ wrapper around it. (I know this
isn't C, but the "problem" lies in the C library implementation of
wcsftime().) I'm not sure whether this is a platform-dependent feature
or part of the C standard.
I've compiled with GCC 3.4.3 on GNU/Linux, and run in an en_GB UTF-8
locale. The output looks like this:
$ ./date3
asctime: Fri Nov 26 13:26:48 2004
strftime: Fri 26 Nov 2004 13:26:48 GMT
wcsftime: Fri 26 Nov 2004 13:26:48 GMT
std::time_put<char>: Fri 26 Nov 2004 13:26:48 GMT
std::time_put<wchar_t>: Fri 26 Nov 2004 13:26:48 GMT
Everything worked. It also works if I run in a different locale (all
locales use UTF-8 as their codeset):
$ LANG=de_DE LC_ALL=de_DE ./date3
asctime: Fri Nov 26 13:28:03 2004
strftime: Fr 26 Nov 2004 13:28:03 GMT
wcsftime: Fr 26 Nov 2004 13:28:03 GMT
std::time_put<char>: Fr 26 Nov 2004 13:28:03 GMT
std::time_put<wchar_t>: Fr 26 Nov 2004 13:28:03 GMT
$ LANG=pt_BR LC_ALL=pt_BR ./date3
asctime: Fri Nov 26 13:29:18 2004
strftime: Sex 26 Nov 2004 13:29:18 GMT
wcsftime: Sex 26 Nov 2004 13:29:18 GMT
std::time_put<char>: Sex 26 Nov 2004 13:29:18 GMT
std::time_put<wchar_t>: Sex 26 Nov 2004 13:29:18 GMT
However, if I use a locale where the output includes non-ASCII
characters, I get this:
asctime: Fri Nov 26 13:30:08 2004
strftime: Птн 26 Ноя 2004 13:30:08
wcsftime: ^_B= 26 ^]>O 2004 13:30:08
std::time_put<char>: Птн 26 Ноя 2004 13:30:08
std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08
In this case the "narrow" and "wide" outputs differ. The "narrow"
output is valid UTF-8, whereas the "wide" output is something
different entirely. What encoding does wcsftime() use when outputting
characters outside the ASCII range? UCS-4? Something
implementation-defined? I expected that both would result in readable
output; is this assumption incorrect?
My question is basically this: what is wcsftime() actually doing, and
how should I get printable output from the wide string it fills for
me?
Many thanks,
Roger
#include <iostream>
#include <locale>
#include <string>
#include <ctime>
#include <cwchar>

int main()
{
  // Set up locale stuff...
  std::locale::global(std::locale(""));
  std::cout.imbue(std::locale());
  std::wcout.imbue(std::locale());

  // Get current time
  time_t simpletime = time(0);

  // Break down time.
  std::tm brokentime;
  localtime_r(&simpletime, &brokentime);

  // Normalise.
  mktime(&brokentime);

  std::cout << "asctime: " << asctime(&brokentime);

  // Print with strftime(3)
  char buffer[40];
  std::strftime(&buffer[0], 40, "%c", &brokentime);
  std::cout << "strftime: " << &buffer[0] << '\n';

  wchar_t wbuffer[40];
  std::wcsftime(&wbuffer[0], 40, L"%c", &brokentime);
  std::wcout << L"wcsftime: " << &wbuffer[0] << L'\n';

  // Try again, but use proper locale facets...
  // The pattern is passed as a [begin, end) pointer range; end() itself
  // must not be dereferenced, so build the range from data() and size().
  const std::time_put<char>& tp =
    std::use_facet<std::time_put<char> >(std::cout.getloc());
  std::string pattern("std::time_put<char>: %c\n");
  tp.put(std::cout, std::cout, std::cout.fill(),
         &brokentime, pattern.data(), pattern.data() + pattern.size());

  // And again, but using wchar_t...
  const std::time_put<wchar_t>& wtp =
    std::use_facet<std::time_put<wchar_t> >(std::wcout.getloc());
  std::wstring wpattern(L"std::time_put<wchar_t>: %c\n");
  wtp.put(std::wcout, std::wcout, std::wcout.fill(),
          &brokentime, wpattern.data(), wpattern.data() + wpattern.size());

  return 0;
}
- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>
iD8DBQFBpz0qVcFcaSW/ uEgRAjGMAKCusoGdSOupZEllYLA5eCh65pL6awCf
cnpu
sdoS5qoYLjBiULIarVOD5bE=
=BHQO
-----END PGP SIGNATURE-----
| |
| Roger Leigh 2004-11-26, 5:50 pm |
| -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Roger Leigh <${roger}@invalid.whinlatter.uklinux.net.invalid> writes:
>
> However, if I use a locale where the output includes non-ASCII
> characters, I get this:
>
> asctime: Fri Nov 26 13:30:08 2004
> strftime: Птн 26 Ноя 2004 13:30:08
> wcsftime: ^_B= 26 ^]>O 2004 13:30:08
> std::time_put<char>: Птн 26 Ноя 2004 13:30:08
> std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08
This occurs because I've mixed calls to std::cout and std::wcout. If
I only use one or the other, things work perfectly (I get valid UTF-8
in both cases).
I wrote a plain C testcase (below) that uses fprintf/fwprintf, and
this also works fine, but not if I mix them for the same FILE stream.
What is the reason for not allowing narrow and wide I/O to the same
stream?
Regards,
Roger
#define _GNU_SOURCE
#include <stdio.h>
#include <locale.h>
#include <time.h>
#include <wchar.h>

int main(void)
{
  // Set up locale stuff...
  setlocale(LC_ALL, "");

  // Get current time
  time_t simpletime = time(0);

  // Break down time.
  struct tm brokentime;
  localtime_r(&simpletime, &brokentime);

  // Normalise.
  mktime(&brokentime);

  fprintf (stdout, "asctime: %s", asctime(&brokentime));

  // Print with strftime(3)
  char buffer[40];
  strftime(&buffer[0], 40, "%c", &brokentime);

  fprintf (stdout, "strftime: %s\n", &buffer[0]);

  wchar_t wbuffer[40];
  wcsftime(&wbuffer[0], 40, L"%c", &brokentime);

  fwide (stderr, 1);
  fwprintf(stderr, L"wcsftime: %ls\n", &wbuffer[0]);

  return 0;
}
- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>
iD8DBQFBp7J2VcFcaSW/ uEgRAgxnAKCmj5TOtbeBvVaw1WpEvxeejyNIoACe
IFsU
ufebBdtactU0jyCFf1NF/ac=
=rB04
-----END PGP SIGNATURE-----
| |
| Jack Klein 2004-11-26, 8:46 pm |
| On Fri, 26 Nov 2004 22:47:34 +0000, Roger Leigh
<${roger}@invalid.whinlatter.uklinux.net.invalid> wrote in
comp.lang.c:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Roger Leigh <${roger}@invalid.whinlatter.uklinux.net.invalid> writes:
>
>
> This occurs because I've mixed calls to std::cout and std::wcout. If
Please stop posting C++ details to comp.lang.c. The fact that C++
claims to include some of the C standard library is a C++ issue. The
C standard and this newsgroup disclaim all responsibility for how C++
library functions that happen to have the same name as C library
functions behave in a C++ program. Or how anything at all behaves in
a C++ program.
As for your assertion that your problem only occurs when you output
'non-ASCII' characters, or whether your output is UTF-8 or not, be
aware that neither language specifies the encoding of wide characters.
This is completely compiler and operating system specific, and not a
language issue at all.
> I only use one or the other, things work perfectly (I get valid UTF-8
> in both cases).
>
> I wrote a plain C testcase (below) that uses fprintf/fwprintf, and
> this also works fine, but not if I mix them for the same FILE stream.
> What is the reason for not allowing narrow and wide I/O to the same
> stream?
>
> Regards,
> Roger
>
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <locale.h>
> #include <time.h>
> #include <wchar.h>
>
> int main(void)
> {
> // Set up locale stuff...
> setlocale(LC_ALL, "");
>
> // Get current time
> time_t simpletime = time(0);
>
> // Break down time.
> struct tm brokentime;
> localtime_r(&simpletime, &brokentime);
^^^^^^^^^^^
This is not a function in either the C or C++ standard library,
neither language states anything at all about what it might or might
not do.
>
> // Normalise.
> mktime(&brokentime);
>
> fprintf (stdout, "asctime: %s", asctime(&brokentime));
Here stdout becomes a byte-oriented stream by the act of calling a
character input/output function.
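
A minimal sketch of how that orientation can be observed, assuming a hosted C99 implementation. Calling fwide() with a mode of 0 only queries the stream and does not change it. The helper name stream_orientation is made up for illustration and is not part of any library.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: report a stream's orientation without changing it.
 * Returns 1 for wide-oriented, -1 for byte-oriented, 0 if not yet set. */
static int stream_orientation(FILE *fp)
{
    int r = fwide(fp, 0);        /* mode 0: query only */
    return (r > 0) - (r < 0);    /* normalise to -1, 0 or 1 */
}
```

Called on a stream before and after its first fprintf() or fwprintf(), it shows the orientation being fixed by that first operation.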
> // Print with strftime(3)
> char buffer[40];
> strftime(&buffer[0], 40, "%c", &brokentime);
>
> fprintf (stdout, "strftime: %s\n", &buffer[0]);
>
> wchar_t wbuffer[40];
> wcsftime(&wbuffer[0], 40, L"%c", &brokentime);
>
> fwide (stderr, 1);
The fwide() attempts to set the orientation of a stream. There is no
guarantee in the C standard library that it will succeed. Like most C
standard library functions, it returns a value indicating its result,
in this case the orientation, if any, of the stream after the call.
You are neglecting the returned value, yet it might have some bearing
on your issue.
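
A short sketch of checking that return value, under the assumption that the caller only cares whether the stream ended up wide-oriented. The helper name try_make_wide is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: ask for wide orientation and report whether the
 * stream really is wide-oriented afterwards (1) or not (0).
 * fwide() returns > 0 for wide, < 0 for byte, 0 for no orientation. */
static int try_make_wide(FILE *fp)
{
    return fwide(fp, 1) > 0;
}
```

A caller could test try_make_wide(stderr) before the fwprintf() call and fall back to narrow output when it reports 0.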
> fwprintf(stderr, L"wcsftime: %ls\n", &wbuffer[0]);
>
> return 0;
> }
Above you said "this code works fine, but not if you mix them" for the
same stream. This code performs byte and wide output to the same
stream. Does it work or not?
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~.../FAQ-acllc.html
| |
| Roger Leigh 2004-11-27, 7:46 am |
| -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jack Klein <jackklein@spamcop.net> writes:
> On Fri, 26 Nov 2004 22:47:34 +0000, Roger Leigh
> <${roger}@invalid.whinlatter.uklinux.net.invalid> wrote in
> comp.lang.c:
> Please stop posting C++ details to comp.lang.c. The fact that C++
> claims to include some of the C standard library is a C++ issue. The
> C standard and this newsgroup disclaim all responsibility for how C++
> library functions that happen to have the same name as C library
> functions behave in a C++ program. Or how anything at all behaves in
> a C++ program.
My question was never about C++, it was solely about wcsftime(). C++
std::time_put<> wraps strftime() and wcsftime() in the C library
directly, and so it's not strictly a C++ issue either. Where would be
the correct place to ask, or does everyone disclaim responsibility for
interoperability?
> As for your assertion that your problem only occurs when you output
> 'non-ASCII' characters, or whether your output is UTF-8 or not, be
> aware that neither language specifies the encoding of wide characters,
> this is completely compiler and operating system specific, and not a
> language issue at all.
I'm aware of that, but I had hoped for a more constructive response,
for example what the standard says wcsftime() should output, and if
there was some portable method for determining this (if I'm writing
portable code, I won't know what this will be). Since wchar_t may be
used to store characters of any encoding of the programmer's choice, I
did expect it to be documented somewhere. It actually appears to be
UCS-4 in this case, but I obviously can't rely on that if I need to do
any character manipulation.
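
One partial answer to the portability question, offered as a sketch rather than a complete solution: C99 specifies the macro __STDC_ISO_10646__, which an implementation defines when wchar_t values are ISO/IEC 10646 code points. The helper name wchar_is_iso10646 is invented for illustration.

```c
#include <wchar.h>

/* Hypothetical helper: returns 1 if the implementation promises that
 * wchar_t holds ISO/IEC 10646 (Unicode) code points, 0 if it makes no
 * such promise. The check happens at compile time. */
static int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

When it returns 0 the encoding is still implementation-defined, so character manipulation would have to go through the standard wide-character functions rather than assume code point values.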
This non-mixing is apparently specified in the C standard, but I don't
have access to a copy to verify this. The C++ restrictions come about
because they apparently defer to the C standard.
> ^^^^^^^^^^^
>
> This is not a function in either the C or C++ standard library,
> neither language states anything at all about what it might or might
> not do.
It's a thread-safe localtime() equivalent, which has a nicer
interface. Replace with
struct tm *brokentime = localtime(&simpletime);
if you prefer.
> The fwide() attempts to set the orientation of a stream. There is no
> guarantee in the C standard library that it will succeed. Like most C
> standard library functions, it returns a value indicating its result,
> in this case the orientation, if any, of the stream after the call.
>
> You are neglecting the returned value, yet it might have some bearing
> on your issue.
That's very true, but in this case it's guaranteed to succeed, since
*stderr* has no orientation at this point.
>
> Above you said "this code works fine, but not if you mix them" for the
> same stream. This code performs byte and wide output to the same
> stream. Does it work or not?
I use stdout as a narrow stream, and stderr as a wide stream (i.e. no
mixing at all). It works perfectly (the wide UCS-4 is transcoded to
UTF-8 for output). If I use stdout for both, I fail to get output
(because fwide() fails, as you would expect, and nothing wide is
printed).
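
If everything has to go through a single byte-oriented stream, one option (a sketch, not the only approach) is to convert the wide buffer to the locale's multibyte encoding first and print the result with the narrow functions. The helper name wide_to_multibyte is hypothetical. It uses wcsrtombs() with a null destination to measure the required length before converting.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert ws to the current locale's multibyte
 * encoding. Returns 0 on success, -1 if a character cannot be converted
 * or the result plus its terminator does not fit in outsize bytes. */
static int wide_to_multibyte(const wchar_t *ws, char *out, size_t outsize)
{
    const wchar_t *src = ws;
    mbstate_t st;
    size_t need;

    memset(&st, 0, sizeof st);
    need = wcsrtombs(NULL, &src, 0, &st);   /* measure only */
    if (need == (size_t)-1)
        return -1;                          /* unconvertible character */
    if (need + 1 > outsize)
        return -1;                          /* no room for the terminator */

    src = ws;
    memset(&st, 0, sizeof st);
    wcsrtombs(out, &src, outsize, &st);     /* now guaranteed to fit */
    return 0;
}
```

The converted buffer can then be passed to fprintf(stdout, "%s\n", out) alongside the other narrow output.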
Thanks,
Roger
- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>
iD8DBQFBqGs1VcFcaSW/ uEgRAsaMAJwOh+YTiTRnnoAMAilmZGrygW0WewCf
ZQvT
6M0DO/6tCg+PsNRpI6r+SAo=
=qEhw
-----END PGP SIGNATURE-----
| |
| CBFalconer 2004-11-28, 5:49 pm |
| Roger Leigh wrote:
> Jack Klein <jackklein@spamcop.net> writes:
>
>
> My question was never about C++, it was solely about wcsftime().
> C++ std::time_put<> wraps strftime() and wcsftime() in the C
> library directly, and so it's not strictly a C++ issue either.
> Where would be the correct place to ask, or does everyone disclaim
> responsibility for interoperability?
The C standard (N869) says the following:

  7.24.5.1 The wcsftime function

  Synopsis

  [#1]
          #include <time.h>
          #include <wchar.h>
          size_t wcsftime(wchar_t * restrict s,
                          size_t maxsize,
                          const wchar_t * restrict format,
                          const struct tm * restrict timeptr);

  Description

  [#2] The wcsftime function is equivalent to the strftime
  function, except that:

    -- The argument s points to the initial element of an
       array of wide characters into which the generated
       output is to be placed.

    -- The argument maxsize indicates the limiting number of
       wide characters.

    -- The argument format is a wide string and the conversion
       specifiers are replaced by corresponding sequences of
       wide characters.

    -- The return value indicates the number of wide
       characters.

  Returns

  [#3] If the total number of resulting wide characters
  including the terminating null wide character is not more
  than maxsize, the wcsftime function returns the number of
  wide characters placed into the array pointed to by s not
  including the terminating null wide character. Otherwise,
  zero is returned and the contents of the array are
  indeterminate.
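
The return-value rule in [#3] can be sketched as follows. The helper name format_date and the "%Y-%m-%d" format are choices made for this illustration only.

```c
#include <time.h>
#include <wchar.h>

/* Hypothetical helper: format tm as YYYY-MM-DD into buf.
 * Returns the number of wide characters written (excluding the
 * terminator), or 0 if the result did not fit in maxsize. */
static size_t format_date(wchar_t *buf, size_t maxsize, const struct tm *tm)
{
    size_t n = wcsftime(buf, maxsize, L"%Y-%m-%d", tm);
    if (n == 0 && maxsize > 0)
        buf[0] = L'\0';   /* contents are indeterminate on failure,
                             so leave an empty string behind instead */
    return n;
}
```

Because a return of zero leaves the array indeterminate, a caller should test the result before printing the buffer.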
Similarly, you can look up the description of strftime referenced
above. All of this has nothing whatsoever to do with C++, and
cross-posting to C.L.C++ is completely off topic there. Follow-ups
set accordingly.
.... snip ...
>
> This non-mixing is apparently specified in the C standard, but I
> don't have access to a copy to verify this. The C++ restrictions
> come about because they apparently defer to the C standard.
Nonsense. Everybody has free access to the final draft N869. Just
google for it. You can also try the links in my sig block below.
Please also get rid of the following nonsense, which is totally
useless and annoying in newsgroups.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.5 (GNU/Linux)
> Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>
>
> iD8DBQFBqGs1VcFcaSW/ uEgRAsaMAJwOh+YTiTRnnoAMAilmZGrygW0WewCf
ZQvT
> 6M0DO/6tCg+PsNRpI6r+SAo=
> =qEhw
> -----END PGP SIGNATURE-----
--
Some useful references:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://www.eskimo.com/~scs/C-faq/top.html>
<http://benpfaff.org/writings/clc/off-topic.html>
<http://anubis.dkuug.dk/jtc1/sc22/wg14/www/docs/n869/> (C99)
<http://www.dinkumware.com/refxc.html> C-library