Simon Morgan
Hi,
The following code is meant to validate a string of multibyte characters
by using mbcheck() to call mblen() on each character in the string passed
to it. The problem is that it isn't working as I expect. I've included in
the comments what I think mbcheck() should return for each string,
given my understanding of how the multibyte system works.
#include <stdio.h>
#include <stdlib.h>

int mbcheck(const char *);

int main(void) {
    char *a[] = {
        "\x05\x87\x80\x36\xed\xaa", /* 0 */
        "\x20\xe4\x50\x88\x3f", /* -1 */
        "\xde\xad\xbe\xef", /* -1 */
        "\x8a\x60\x92\x74\x41" /* 0 */
    };
    int i;

    for (i = 0; i < sizeof(a) / sizeof(a[0]); i++) {
        printf("%d\n", mbcheck(a[i]));
        puts("--");
    }
    return 0;
}

int mbcheck(const char *s) {
    int n;

    for (mblen(NULL, 0); ; s += n) {
        printf("checking %#.8x\n", *s);
        if ((n = mblen(s, MB_CUR_MAX)) <= 0)
            return n;
        printf("%d\n", n);
    }
}
Does mblen() rely on a locale being set? Reading the man page it doesn't
look like it. This code is for an exercise in the book "C Programming: A
Modern Approach". The strings are supposedly Shift-JIS encoded kanji and I
have no idea which locale that relates to if there is one.
Also could somebody please explain to me what's with all the hexadecimal
f's in the output? As you've probably realised I'm still learning C but
seeing as s points to a char shouldn't printf() only be reading 1 byte and
padding the output with 0?
Many thanks.