wcsftime output encoding

Roger Leigh

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The program listed below demonstrates the use of wcsftime() and
std::time_put<wchar_t> which is a C++ wrapper around it. (I know this
isn't C; but the "problem" lies in the C library implementation of
wcsftime()). I'm not sure if this is a platform-dependent feature or
part of the C standard.

I've compiled with GCC 3.4.3 on GNU/Linux, and run in an en_GB UTF-8
locale. The output looks like this:

$ ./date3
asctime: Fri Nov 26 13:26:48 2004
strftime: Fri 26 Nov 2004 13:26:48 GMT
wcsftime: Fri 26 Nov 2004 13:26:48 GMT
std::time_put<char>: Fri 26 Nov 2004 13:26:48 GMT
std::time_put<wchar_t>: Fri 26 Nov 2004 13:26:48 GMT

Everything worked. It also works if I run in a different locale (all
locales use UTF-8 as their codeset):

$ LANG=de_DE LC_ALL=de_DE ./date3
asctime: Fri Nov 26 13:28:03 2004
strftime: Fr 26 Nov 2004 13:28:03 GMT
wcsftime: Fr 26 Nov 2004 13:28:03 GMT
std::time_put<char>: Fr 26 Nov 2004 13:28:03 GMT
std::time_put<wchar_t>: Fr 26 Nov 2004 13:28:03 GMT

$ LANG=pt_BR LC_ALL=pt_BR ./date3
asctime: Fri Nov 26 13:29:18 2004
strftime: Sex 26 Nov 2004 13:29:18 GMT
wcsftime: Sex 26 Nov 2004 13:29:18 GMT
std::time_put<char>: Sex 26 Nov 2004 13:29:18 GMT
std::time_put<wchar_t>: Sex 26 Nov 2004 13:29:18 GMT

However, if I use a locale where the output includes non-ASCII
characters, I get this:

asctime: Fri Nov 26 13:30:08 2004
strftime: Птн 26 Ноя 2004 13:30:08
wcsftime: ^_B= 26 ^]>O 2004 13:30:08
std::time_put<char>: Птн 26 Ноя 2004 13:30:08
std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08

In this case the "narrow" and "wide" outputs differ. The "narrow"
output is valid UTF-8, whereas the "wide" output is something
different entirely. What encoding does wcsftime() use when outputting
characters outside the ASCII range? UCS-4? Something
implementation-defined? I expected that both would result in readable
output; is this assumption incorrect?

My question is basically this: what is wcsftime() actually doing, and
how should I get printable output from the wide string it fills for
me?


Many thanks,
Roger


#include <iostream>
#include <locale>
#include <string>
#include <ctime>
#include <cwchar>

int main()
{
    // Set up locale stuff...
    std::locale::global(std::locale(""));
    std::cout.imbue(std::locale());
    std::wcout.imbue(std::locale());

    // Get current time
    std::time_t simpletime = std::time(0);

    // Break down time (localtime_r is POSIX, not ISO C or C++).
    std::tm brokentime;
    localtime_r(&simpletime, &brokentime);

    // Normalise.
    std::mktime(&brokentime);

    std::cout << "asctime: " << std::asctime(&brokentime);

    // Print with strftime(3)
    char buffer[40];
    std::strftime(&buffer[0], 40, "%c", &brokentime);

    std::cout << "strftime: " << &buffer[0] << '\n';

    wchar_t wbuffer[40];
    std::wcsftime(&wbuffer[0], 40, L"%c", &brokentime);
    std::wcout << L"wcsftime: " << &wbuffer[0] << L'\n';

    // Try again, but use proper locale facets...
    const std::time_put<char>& tp =
        std::use_facet<std::time_put<char> >(std::cout.getloc());

    // Pass [data, data + size) rather than dereferencing end().
    std::string pattern("std::time_put<char>: %c\n");
    tp.put(std::cout, std::cout, std::cout.fill(),
           &brokentime, pattern.data(), pattern.data() + pattern.size());

    // And again, but using wchar_t...
    const std::time_put<wchar_t>& wtp =
        std::use_facet<std::time_put<wchar_t> >(std::wcout.getloc());

    std::wstring wpattern(L"std::time_put<wchar_t>: %c\n");
    wtp.put(std::wcout, std::wcout, std::wcout.fill(),
            &brokentime, wpattern.data(), wpattern.data() + wpattern.size());

    return 0;
}


- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBpz0qVcFcaSW/uEgRAjGMAKCusoGdSOupZEllYLA5eCh65pL6awCfcnpu
sdoS5qoYLjBiULIarVOD5bE=
=BHQO
-----END PGP SIGNATURE-----
 
Roger Leigh

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Roger Leigh said:
However, if I use a locale where the output includes non-ASCII
characters, I get this:

asctime: Fri Nov 26 13:30:08 2004
strftime: Птн 26 Ноя 2004 13:30:08
wcsftime: ^_B= 26 ^]>O 2004 13:30:08
std::time_put<char>: Птн 26 Ноя 2004 13:30:08
std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08

This occurs because I've mixed calls to std::cout and std::wcout. If
I only use one or the other, things work perfectly (I get valid UTF-8
in both cases).

I wrote a plain C testcase (below) that uses fprintf/fwprintf, and
this also works fine, but not if I mix them for the same FILE stream.
What is the reason for not allowing narrow and wide I/O to the same
stream?

Regards,
Roger


#define _GNU_SOURCE
#include <stdio.h>
#include <locale.h>
#include <time.h>
#include <wchar.h>

int main(void)
{
    // Set up locale stuff...
    setlocale(LC_ALL, "");

    // Get current time
    time_t simpletime = time(0);

    // Break down time.
    struct tm brokentime;
    localtime_r(&simpletime, &brokentime);

    // Normalise.
    mktime(&brokentime);

    fprintf(stdout, "asctime: %s", asctime(&brokentime));

    // Print with strftime(3)
    char buffer[40];
    strftime(&buffer[0], 40, "%c", &brokentime);

    fprintf(stdout, "strftime: %s\n", &buffer[0]);

    wchar_t wbuffer[40];
    wcsftime(&wbuffer[0], 40, L"%c", &brokentime);

    fwide(stderr, 1);
    fwprintf(stderr, L"wcsftime: %ls\n", &wbuffer[0]);

    return 0;
}

- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBp7J2VcFcaSW/uEgRAgxnAKCmj5TOtbeBvVaw1WpEvxeejyNIoACeIFsU
ufebBdtactU0jyCFf1NF/ac=
=rB04
-----END PGP SIGNATURE-----
 
Jack Klein

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Roger Leigh said:
However, if I use a locale where the output includes non-ASCII
characters, I get this:

asctime: Fri Nov 26 13:30:08 2004
strftime: ??? 26 ??? 2004 13:30:08
wcsftime: ^_B= 26 ^]>O 2004 13:30:08
std::time_put<char>: ??? 26 ??? 2004 13:30:08
std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08

This occurs because I've mixed calls to std::cout and std::wcout. If

Please stop posting C++ details to comp.lang.c. The fact that C++
claims to include some of the C standard library is a C++ issue. The
C standard and this newsgroup disclaim all responsibility for how C++
library functions that happen to have the same name as C library
functions behave in a C++ program. Or how anything at all behaves in
a C++ program.

As for your assertion that your problem only occurs when you output
'non-ASCII' characters, or whether your output is UTF-8 or not, be
aware that neither language specifies the encoding of wide characters;
this is completely compiler- and operating-system-specific, and not a
language issue at all.

I only use one or the other, things work perfectly (I get valid UTF-8
in both cases).

I wrote a plain C testcase (below) that uses fprintf/fwprintf, and
this also works fine, but not if I mix them for the same FILE stream.
What is the reason for not allowing narrow and wide I/O to the same
stream?

Regards,
Roger


#define _GNU_SOURCE
#include <stdio.h>
#include <locale.h>
#include <time.h>
#include <wchar.h>

int main(void)
{
// Set up locale stuff...
setlocale(LC_ALL, "");

// Get current time
time_t simpletime = time(0);

// Break down time.
struct tm brokentime;
localtime_r(&simpletime, &brokentime);
^^^^^^^^^^^

This is not a function in either the C or C++ standard library;
neither language states anything at all about what it might or might
not do.

// Normalise.
mktime(&brokentime);

fprintf (stdout, "asctime: %s", asctime(&brokentime));

Here stdout becomes a byte-oriented stream by the act of calling a
character input/output function.

// Print with strftime(3)
char buffer[40];
strftime(&buffer[0], 40, "%c", &brokentime);

fprintf (stdout, "strftime: %s\n", &buffer[0]);

wchar_t wbuffer[40];
wcsftime(&wbuffer[0], 40, L"%c", &brokentime);

fwide (stderr, 1);

The fwide() attempts to set the orientation of a stream. There is no
guarantee in the C standard library that it will succeed. Like most C
standard library functions, it returns a value indicating its result,
in this case the orientation, if any, of the stream after the call.

You are neglecting the returned value, yet it might have some bearing
on your issue.

fwprintf(stderr, L"wcsftime: %ls\n", &wbuffer[0]);

return 0;
}

Above you said "this code works fine, but not if you mix them" for the
same stream. This code performs byte and wide output to the same
stream. Does it work or not?
 
Roger Leigh

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Please stop posting C++ details to comp.lang.c. The fact that C++
claims to include some of the C standard library is a C++ issue. The
C standard and this newsgroup disclaim all responsibility for how C++
library functions that happen to have the same name as C library
functions behave in a C++ program. Or how anything at all behaves in
a C++ program.

My question was never about C++; it was solely about wcsftime(). C++
std::time_put<> wraps strftime() and wcsftime() in the C library
directly, and so it's not strictly a C++ issue either. Where would be
the correct place to ask, or does everyone disclaim responsibility for
interoperability?

As for your assertion that your problem only occurs when you output
'non-ASCII' characters, or whether your output is UTF-8 or not, be
aware that neither language specifies the encoding of wide characters;
this is completely compiler- and operating-system-specific, and not a
language issue at all.

I'm aware of that, but I had hoped for a more constructive response,
for example what the standard says wcsftime() should output, and if
there was some portable method for determining this (if I'm writing
portable code, I won't know what this will be). Since wchar_t may be
used to store characters of any encoding of the programmer's choice, I
did expect it to be documented somewhere. It actually appears to be
UCS-4 in this case, but I obviously can't rely on that if I need to do
any character manipulation.

This non-mixing is apparently specified in the C standard, but I don't
have access to a copy to verify this. The C++ restrictions come about
because they apparently defer to the C standard.
localtime_r(&simpletime, &brokentime);
^^^^^^^^^^^

This is not a function in either the C or C++ standard library,
neither language states anything at all about what it might or might
not do.

It's a thread-safe localtime() equivalent, which has a nicer
interface. Replace with

struct tm *brokentime = localtime(&simpletime);

if you prefer.
The fwide() attempts to set the orientation of a stream. There is no
guarantee in the C standard library that it will succeed. Like most C
standard library functions, it returns a value indicating its result,
in this case the orientation, if any, of the stream after the call.

You are neglecting the returned value, yet it might have some bearing
on your issue.

That's very true, but in this case it's guaranteed to succeed, since
*stderr* has no orientation at this point.
fwprintf(stderr, L"wcsftime: %ls\n", &wbuffer[0]);

return 0;
}

Above you said "this code works fine, but not if you mix them" for the
same stream. This code performs byte and wide output to the same
stream. Does it work or not?

I use stdout as a narrow stream, and stderr as a wide stream (i.e. no
mixing at all). It works perfectly (the wide UCS-4 is transcoded to
UTF-8 for output). If I use stdout for both, I fail to get output
(because fwide() fails, as you would expect, and nothing wide is
printed).


Thanks,
Roger

- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBqGs1VcFcaSW/uEgRAsaMAJwOh+YTiTRnnoAMAilmZGrygW0WewCfZQvT
6M0DO/6tCg+PsNRpI6r+SAo=
=qEhw
-----END PGP SIGNATURE-----
 
Roger Leigh

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jack Klein said:
However, if I use a locale where the output includes non-ASCII
characters, I get this:

asctime: Fri Nov 26 13:30:08 2004
strftime: ??? 26 ??? 2004 13:30:08
wcsftime: ^_B= 26 ^]>O 2004 13:30:08
std::time_put<char>: ??? 26 ??? 2004 13:30:08
std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08

This occurs because I've mixed calls to std::cout and std::wcout. If

Please stop posting C++ details to comp.lang.c. The fact that C++
claims to include some of the C standard library is a C++ issue.

OK, here's a C++-free C example, that should be 100% Standard C:

#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");

    const char *narrow = "Test Unicode (narrow): ïàý ÐоÑ!\n";
    fprintf(stdout, "%s\n", narrow);

    /* Dump the narrow string byte by byte. */
    fprintf(stdout, "Narrow bytes:\n");
    for (size_t i = 0; i < strlen(narrow); ++i)
        fprintf(stdout, "%3zu: %02X\n", i,
                (unsigned int) *((const unsigned char *) narrow + i));

    if (fwide(stderr, 1) <= 0)
        fprintf(stdout, "Failed to set stderr to wide orientation\n");

    const wchar_t *wide = L"Test Unicode (wide): ïàý ÐоÑ!\n";
    fwprintf(stderr, L"\n%ls\n", wide);

    /* Dump the in-memory representation of the wide string. */
    fprintf(stdout, "Wide bytes:\n");
    for (size_t i = 0; i < wcslen(wide) * sizeof(wchar_t); ++i)
        fprintf(stdout, "%3zu: %02X\n", i,
                (unsigned int) *((const unsigned char *) wide + i));

    return 0;
}

On my system, this exists on disc as UTF-8 encoded text:

$ file unicode.c
unicode.c: UTF-8 Unicode C program text

When I compile this on a GNU/Linux system with GCC 3.4 in C99 mode,
the narrow string exists in the compiled binary as UTF-8-encoded
bytes, while the wide string exists as UCS-4-encoded bytes. These
both appear to be output as UTF-8 when using a locale with a UTF-8
codeset.

My question is: is this use of non-ASCII source code either standard or
portable? How portable would this code be to non-GNU systems and/or
compilers?

If a system uses other encodings for narrow and wide characters, are
there any macros/constants defined to determine these at compile time
or runtime?


Many thanks,
Roger

- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBqHjRVcFcaSW/uEgRAjn1AKDNuAZgNrFZ2+Xw3QKwZm0yC1GECgCeO0Dh
IC6jle+B4ELhH2idiFJIbE0=
=RGac
-----END PGP SIGNATURE-----
 
Mark McIntyre

My question is: is this use of non-ASCII source code either standard or
portable?

The standard defines a translation environment, in which the source code
lives as "units". The units must use the source character set. This
consists of characters, and a character is defined in 3.7.1 as a bit
representation that fits in a single byte. However, nothing appears to
restrict the source character set to single-byte characters; indeed,
3.7.2 says that a multibyte character can be part of the source character
set too. Wide characters are not mentioned.

How portable would this code be to non-GNU systems and/or
compilers?

The standard doesn't say. I believe that it would be the responsibility of
any process which moved it from one system to another, to ensure it was
adequately translated for the new platform. Compare this to moving text
files from unix to windows/dos to mac - different ways of storing the
"unit" typically require it to be converted before it can be used on a
different platform.
If a system uses other encodings for narrow and wide characters, are
there any macros/constants defined to determine these at compile time
or runtime?

If there are, they're off-topic here, as ISO C doesn't require them.
 
Roger Leigh

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mark McIntyre said:
On Sat, 27 Nov 2004 12:53:58 +0000, in comp.lang.c , Roger Leigh


The standard doesn't say. I believe that it would be the responsibility of
any process which moved it from one system to another, to ensure it was
adequately translated for the new platform. Compare this to moving text
files from unix to windows/dos to mac - different ways of storing the
"unit" typically require it to be converted before it can be used on a
different platform.

OK, I can cope with that. At worst, it will need recoding with iconv
or similar.

If there are, they're off-topic here, as ISO C doesn't require them.

What I meant by this is this:

const char *narrow = "foo";
const wchar_t *wide = L"bar";

printf("%ls\n", wide);
wprintf(L"%s\n", narrow);

In this example, I've printed a wide string to a narrow stream and
vice versa. The strings are transparently recoded to the other form,
so the C implementation must know at some level what encoding
represents each form. What I want to know is: what are the wide and
narrow forms for a given implementation?

I found one constant:
/* wchar_t uses ISO 10646-1 (2nd ed., published 2000-09-15) / Unicode 3.1. */
#define __STDC_ISO_10646__ 200009L

Are there any others that might be defined?


Thanks!
Roger

- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBqhGUVcFcaSW/uEgRAq3XAJ4xqu1Mfr9j+wchzJjDygegGrFXRACguB6S
T3L7K0Z3yA7vfv59yv3kdRg=
=fytC
-----END PGP SIGNATURE-----
 
CBFalconer

Roger said:
My question was never about C++, it was solely about wcsftime().
C++ std::time_put<> wraps strftime() and wcsftime() in the C
library directly, and so it's not strictly a C++ issue either.
Where would be the correct place to ask, or does everyone absolve
responsibility for interoperability?

The C standard (N869) says the following:

7.24.5.1 The wcsftime function

Synopsis

[#1]
#include <time.h>
#include <wchar.h>
size_t wcsftime(wchar_t * restrict s,
size_t maxsize,
const wchar_t * restrict format,
const struct tm * restrict timeptr);

Description

[#2] The wcsftime function is equivalent to the strftime
function, except that:

-- The argument s points to the initial element of an
array of wide characters into which the generated
output is to be placed.

-- The argument maxsize indicates the limiting number of
wide characters.

-- The argument format is a wide string and the conversion
specifiers are replaced by corresponding sequences of
wide characters.

-- The return value indicates the number of wide
characters.

Returns

[#3] If the total number of resulting wide characters
including the terminating null wide character is not more
than maxsize, the wcsftime function returns the number of
wide characters placed into the array pointed to by s not
including the terminating null wide character. Otherwise,
zero is returned and the contents of the array are
indeterminate.

Similarly, you can look up the description of strftime referenced
above. All of this has nothing whatsoever to do with C++, and
cross-posting to C.L.C++ is completely off topic there. Follow-ups
set accordingly.

.... snip ...
This non-mixing is apparently specified in the C standard, but I
don't have access to a copy to verify this. The C++ restrictions
come about because they apparently defer to the C standard.

Nonsense. Everybody has free access to the final draft N869. Just
google for it. You can also try the links in my sig block below.

Please also get rid of the following nonsense, which is totally
useless and annoying in newsgroups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBqGs1VcFcaSW/uEgRAsaMAJwOh+YTiTRnnoAMAilmZGrygW0WewCfZQvT
6M0DO/6tCg+PsNRpI6r+SAo=
=qEhw
-----END PGP SIGNATURE-----


--
Some useful references:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://www.eskimo.com/~scs/C-faq/top.html>
<http://benpfaff.org/writings/clc/off-topic.html>
<http://anubis.dkuug.dk/jtc1/sc22/wg14/www/docs/n869/> (C99)
<http://www.dinkumware.com/refxc.html> C-library
 
Charlie Gordon

Roger Leigh said:
#include <iostream>
#include <locale>
#include <ctime>
#include <cwchar>

int main()
{
// Set up locale stuff...
std::locale::global(std::locale(""));
std::cout.imbue(std::locale());
std::wcout.imbue(std::locale());

// Get current time
time_t simpletime = time(0);

// Break down time.
std::tm brokentime;
localtime_r(&simpletime, &brokentime);

// Normalise.
mktime(&brokentime);

std::cout << "asctime: " << asctime(&brokentime);

// Print with strftime(3)
char buffer[40];
std::strftime(&buffer[0], 40, "%c", &brokentime);

std::cout << "strftime: " << &buffer[0] << '\n';

wchar_t wbuffer[40];
std::wcsftime(&wbuffer[0], 40, L"%c", &brokentime);
std::wcout << L"wcsftime: " << &wbuffer[0] << L'\n';

// Try again, but use proper locale facets...
const std::time_put<char>& tp =
std::use_facet<std::time_put<char> >(std::cout.getloc());

std::string pattern("std::time_put<char>: %c\n");
tp.put(std::cout, std::cout, std::cout.fill(),
&brokentime, &*pattern.begin(), &*pattern.end());

// And again, but using wchar_t...
const std::time_put<wchar_t>& wtp =
std::use_facet<std::time_put<wchar_t> >(std::wcout.getloc());

std::wstring wpattern(L"std::time_put<wchar_t>: %c\n");
wtp.put(std::wcout, std::wcout, std::wcout.fill(),
&brokentime, &*wpattern.begin(), &*wpattern.end());

return 0;
}

For those who still thought C++ was close to C, look above.
Such nonsense makes me puke.
I have seen Perl scripts more readable than this.
Please keep comp.lang.c free of such pollution !

Thank you for re-posting using C.
 
Kevin Bracey

Roger Leigh said:
What I meant by this is this:
const char *narrow = "foo";
const wchar_t *wide = L"bar";
printf("%ls\n", wide);
wprintf(L"%s\n", narrow);
In this example, I've printed a wide string to a narrow stream and
vice versa. The strings are transparently recoded to the other form,
so the C implementation must know at some level what encoding
represents each form.
Indeed.

What I want to know is: what are the wide and narrow forms for a given
implementation?

You'll have to check with your implementation's documentation. The C standard
unfortunately (from a programmer's point of view) specifies very little in
this area. It just puts in a framework on which an implementation can build
its facilities.

Personally, I find it all of rather dubious utility - the same "standard" C
functions might exist on all sorts of platforms, but exactly what encodings
any of them are going to use/support is unknown, making any practical code
using the functions effectively non-portable.
I found one constant:
/* wchar_t uses ISO 10646-1 (2nd ed., published 2000-09-15) / Unicode 3.1. */
#define __STDC_ISO_10646__ 200009L

Well, if that's defined then wchar_t contains Unicode/ISO 10646 code points.
That's a starting block. Then on a "reasonable" implementation, wprintf would
be translating from wide (hopefully 32-bit) Unicode to your system encoding.

On other platforms the wchar_t encoding may vary with locale - it may just be
a "wide" form of the current multibyte encoding, thus the recoding you
mention above would be very simple. If wchar_t is always Unicode, on the
other hand, then printf must contain iconv-like functionality.

I believe that setlocale() should do some of the configuration work,
depending on how your implementation handles it. If I recall the standard
correctly, then the locale in force at the time of fopen() is remembered
inside the FILE object. So if the locale determines encodings, then you're
set.
 
Roger Leigh

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

CBFalconer said:
The C standard (N869) says the following:

7.24.5.1 The wcsftime function
[...]

OK, that matched my understanding, but still didn't satisfy my
question. Reading the N869 draft may help me rephrase it, however.
- From the draft (5.2.1), it states there is

1) a source character set
2) an execution character set

(5.1.1.2 clause 5): the source character set members in character
constants and string literals are converted to the corresponding
member of the execution character set.

Are wide and narrow string literals two separate execution character
sets (or encodings thereof)? [GCC allows one to specify both.]

Is there any standard way of determining what the execution character
set is at compile-time or run-time? This is required if you need to
do fancier stuff such as i18n. If not, are there any other ways?

Is the output of wcsftime encoded in the [wide] execution character
set? Or the [widened] current locale codeset? Or is this completely
unspecified?

Similarly, is the input to scanf/wscanf converted to the narrow/wide
execution charset respectively, or does it remain in the original
input encoding?

Sorry if this seems dumb, but I'd like to know exactly where the
limits of the C standard are here, and which parts are
implementation-defined. If anything, it's confused me even more: I've
now got 4 charsets to contend with: source, exec, wide exec and
locale-specific, and I don't know which functions use which.

Nonsense. Everybody has free access to the final draft N869. Just
google for it. You can also try the links in my sig block below.

Wow, I wasn't aware of that. That will come in very useful, thanks.


Thanks,
Roger
Please also get rid of the following nonsense, which is totally
useless and annoying in newsgroups.
[PGP-signature]

I'm sorry you don't like it, but nowadays I sign everything as a
matter of routine.

- --
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBq3jtVcFcaSW/uEgRAhcdAKDurI345dP23zCOr+anfkPzLdAPQQCfVyAd
5qVbqvPKzHiqaMTa2/H1DyU=
=1mXF
-----END PGP SIGNATURE-----
 
Chris Croughton

The program listed below demonstrates the use of wcsftime() and
std::time_put<wchar_t> which is a C++ wrapper around it. (I know this
isn't C; but the "problem" lies in the C library implementation of
wcsftime()). I'm not sure if this is a platform-dependent feature or
part of the C standard.

How about rewriting it in C?
However, if I use a locale where the output includes non-ASCII
characters, I get this:

asctime: Fri Nov 26 13:30:08 2004
strftime: ??? 26 ??? 2004 13:30:08
wcsftime: ^_B= 26 ^]>O 2004 13:30:08
std::time_put<char>: ??? 26 ??? 2004 13:30:08
std::time_put<wchar_t>: ^_B= 26 ^]>O 2004 13:30:08

In this case the "narrow" and "wide" outputs differ. The "narrow"
output is valid UTF-8, whereas the "wide" output is something
different entirely. What encoding does wcsftime() use when outputting
characters outside the ASCII range? UCS-4? Something
implementation-defined? I expected that both would result in readable
output; is this assumption incorrect?

Why should they? You've said that you want wide output. Have a look at
the definition of UTF-8 (the UTF-8 and Unicode FAQ should help, see
http://www.cl.cam.ac.uk/~mgk25/unicode.html).
My question is basically this: what is wcsftime() actually doing, and
how should I get printable output from the wide string it fills for
me?

All output is printable (unless it contains NUL or control characters,
which UTF-8 won't by definition -- I suspect that the things which look
like control characters in your output actually have the high bit set).
Whether you can read it depends on whether you have a display program
for that locale.

If you want a readable locale, ASCII and Latin1 (ISO-8859-1) are often
readable. Chinese locales are generally not readable unless you have
appropriate software loaded.

Chris C
 
