mbtowc question

Neil Booth

What is the behaviour of mbtowc following an attempt to convert an
invalid character sequence? My belief is that, if the encoding
is state-independent, then mbtowc should continue to work if given
a valid sequence in a subsequent call, and that if the encoding
is state-dependent, to have defined behaviour we need to reset
state to the initial state by passing a NULL pointer.

So the libc on my machine behaves as below. Is this non-conforming
like I tend to believe? If not, mbtowc would be pretty useless
in practice IMHO.

Neil.

#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Valid 2-byte shift-JIS character, not valid UTF-8 sequence. */
const char sjis[] = "\x95\x5c";
/* Valid UTF-8, of course. */
const char space[] = " ";

int main (void)
{
    wchar_t wc;

    setlocale (LC_CTYPE, "ja_JP.UTF-8");

    /* Assert it is not state-dependent. */
    assert (mbtowc (&wc, 0, 1) == 0);

    /* Assert my charset beliefs. */
    assert (mbtowc (&wc, space, sizeof space) == 1);
    assert (mbtowc (&wc, sjis, sizeof sjis) == -1);

    /* Redundant assertion that we're not state-dependent, but
       just in case some state needs resetting. */
    assert (mbtowc (&wc, 0, 1) == 0);

    /* This assertion fails - is this a bug? */
    assert (mbtowc (&wc, space, sizeof space) == 1);

    return 0;
}

$ ./a.out
assertion "mbtowc (&wc, space, sizeof space) == 1" failed: file
"/tmp/test.c", line 28, function "main"
Abort trap
$
 
Bart van Ingen Schenau

Neil said:
What is the behaviour of mbtowc following an attempt to convert an
invalid character sequence? My belief is that, if the encoding
is state-independent, then mbtowc should continue to work if given
a valid sequence in a subsequent call, and that if the encoding
is state-dependent, to have defined behaviour we need to reset
state to the initial state by passing a NULL pointer.

So the libc on my machine behaves as below. Is this non-conforming
like I tend to believe? If not, mbtowc would be pretty useless
in practice IMHO.

This looks like a bug in the standard C library that you have.
I tried the program on my system (Debian unstable) and it ran without a
hitch. Certainly no assertion failures.

As you are resetting the state prior to the second invocation of
mbtowc(space), it should work even if you _did_ have a state-dependent
locale.
I think it is time for a bug-report towards the maintainers of your
standard C library.
Bart v Ingen Schenau
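
For later readers, here is a minimal sketch of the reset-and-continue pattern discussed in this thread. It is only an illustration under stated assumptions: the helper name count_chars_skipping_invalid and the policy of skipping exactly one byte after an error are hypothetical choices, not something either poster proposed. The only library behaviour it relies on is that calling mbtowc with a null string pointer puts the conversion back into the initial shift state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): count the characters in the
   multibyte string s under the current LC_CTYPE locale.  After each
   invalid sequence, reset the conversion state, skip one byte, and
   carry on.  The number of skipped bytes is stored in *invalid. */
size_t count_chars_skipping_invalid(const char *s, size_t *invalid)
{
    size_t remaining;
    size_t count = 0;
    wchar_t wc;

    assert(s != NULL && invalid != NULL);

    remaining = strlen(s);
    *invalid = 0;

    /* A null string pointer resets mbtowc to the initial shift state. */
    (void) mbtowc(NULL, NULL, 0);

    while (remaining > 0) {
        int len = mbtowc(&wc, s, remaining);

        if (len < 0) {
            /* Invalid sequence: reset the state, then skip one byte.
               Skipping a single byte is an assumed recovery policy. */
            (void) mbtowc(NULL, NULL, 0);
            ++*invalid;
            len = 1;
        } else if (len == 0) {
            /* Null character; not expected, since remaining == strlen(s). */
            break;
        } else {
            ++count;
        }

        s += len;
        remaining -= (size_t) len;
    }
    return count;
}
```

With a conforming library in a UTF-8 locale, passing " \x95\x5c " should report one invalid byte (the \x95) and three characters (space, backslash, space). On a library with the behaviour Neil describes, the calls after the first error would presumably keep failing, so this is also a quick way to check for it.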
 
