isprint() equivalent for ISO-8859-15?

Ian Chard

Hi,

I need to be able to tell if a character in the ISO-8859-15 codeset
(i.e. 8-bit ASCII, incorporating things like accented 'a' or euro currency
symbol) is printable. Ordinarily I'd use isprint(), but obviously this is
only going to work for 7-bit "true ASCII" characters.

I've thought of running the program in a different locale, but I want to
catch *all* printable characters, not only those used by a specific
locale. It'd be nice for the code to be as portable as possible, too.

I could just write a macro to range-check the character (it's not like the
standard's going to change), but I'd prefer a cleaner way if possible!

All help gratefully appreciated.

Cheers
- Ian
 
Martin Dickopp

Ian Chard said:
> I need to be able to tell if a character in the ISO-8859-15 codeset
> (i.e. 8-bit ASCII, incorporating things like accented 'a' or euro currency
> symbol) is printable.

#define IS_ISO_8859_15(c) \
(((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
|| ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))

or the function equivalent if a macro is not acceptable.
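
For reference, the function form might look like the sketch below. The name is_print_iso_8859_15 is made up for illustration. Like the macro, it assumes the caller passes a single char or unsigned char value; anything larger is reduced by the conversion to unsigned char.

```c
#include <stdbool.h>

/* Hypothetical function form of the macro above: true if c, read as an
   ISO-8859-15 code, lies in one of the two graphic ranges
   (0x20-0x7E or 0xA0-0xFF). */
bool is_print_iso_8859_15(unsigned char c)
{
    return (c >= 0x20 && c < 0x7F) || (c >= 0xA0 && c <= 0xFF);
}
```

Unlike the macro, the function evaluates its argument only once, so an argument with side effects is safe.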

> Ordinarily I'd use isprint(), but obviously this is only going to work
> for 7-bit "true ASCII" characters.

Actually, it is locale dependent how isprint behaves.

> I've thought of running the program in a different locale, but I want to
> catch *all* printable characters, not only those used by a specific
> locale. It'd be nice for the code to be as portable as possible, too.

If you restrict yourself to a specific character encoding (ISO-8859-15),
the code will obviously not run correctly on systems with incompatible
character encodings. Therefore, it is by definition not portable.

If you want the most portable solution, use `isprint'.

> I could just write a macro to range-check the character (it's not like the
> standard's going to change), but I'd prefer a cleaner way if possible!

If you want to check for a printable character in a portable way, use
`isprint'. If you want to check for a character printable in ISO-8859-15
encoding, use something like the macro above.

Martin
 
Alex

Ian Chard said:
> I need to be able to tell if a character in the ISO-8859-15 codeset [snip]
> I've thought of running the program in a different locale, but I want to
> catch *all* printable characters, not only those used by a specific
> locale.

A printable character in one locale may not be in another. In other words,
the concept of a printable character is inherently locale-specific.

You seem to be asking for the logical "or" of isprint() return values for
all possible locales - this does not make sense at all. Perhaps I
misunderstand what you are asking.

Alex
 
Dan Pop

Martin Dickopp said:
> #define IS_ISO_8859_15(c) \
> (((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
> || ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))

The casts to unsigned char are wrong, drop them. They make
IS_ISO_8859_15(288) return true on any implementation with
UCHAR_MAX == 255.
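
To make the objection concrete: with an 8-bit unsigned char, the cast turns 288 into 288 - 256 = 32 (0x20, the space character), which falls inside the first range. A cast-free version might look like the sketch below; the function name is invented for illustration, and it assumes the caller passes an int holding an unsigned char value, as with isprint.

```c
#include <stdbool.h>

/* The macro as posted: the cast reduces c modulo UCHAR_MAX + 1 before
   the range checks are made. */
#define IS_ISO_8859_15(c) \
    (((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
     || ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))

/* Hypothetical cast-free variant: c is compared as an int, so values
   outside 0x00-0xFF (such as 288, or a negative EOF) are rejected
   instead of being wrapped into range. */
bool is_iso_8859_15_nocast(int c)
{
    return (c >= 0x20 && c < 0x7F) || (c >= 0xA0 && c <= 0xFF);
}
```

The trade-off is that a caller holding a plain char must convert it to unsigned char at the call site, because a signed char holding 0xA4 is negative and would be rejected.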

Dan
 
Micah Cowan

> The casts to unsigned char are wrong, drop them. They make
> IS_ISO_8859_15(288) return true on any implementation with
> UCHAR_MAX == 255.

Why does that make them wrong? I would expect the above macro to
be documented similar to "expects a char [or unsigned char] as
argument", in which case a caller giving an argument of 288
should /expect/ undefined results in the situation you've
described.
 
Dik T. Winter

> Dan Pop writes:
> >
> > The casts to unsigned char are wrong, drop them. They make
> > IS_ISO_8859_15(288) return true on any implementation with
> > UCHAR_MAX == 255.
>
> Why does that make them wrong? I would expect the above macro to
> be documented similar to "expects a char [or unsigned char] as
> argument",

That is *not* the wording for "isprint" and friends. There the argument is
an int with a value that is representable as unsigned char, or is EOF.
IS_ISO_8859_15(EOF) most likely will yield true.

> in which case a caller giving an argument of 288
> should /expect/ undefined results in the situation you've
> described.

Note that "isascii" is a common extension which expects an arbitrary
integer. Your macro more resembles isprint. And wouldn't it be easier
to just use "setlocale"?
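
A rough sketch of the setlocale route follows. The locale name "en_US.ISO-8859-15" is an assumption: locale names are implementation-defined, and that locale may not be installed on a given system. Both function names are invented for illustration.

```c
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>

/* Try to select an ISO-8859-15 locale for character classification.
   The locale name is an assumption and varies between systems.
   Returns false if the locale is unavailable, in which case the
   previous locale stays in effect. */
bool use_latin9_ctype(void)
{
    return setlocale(LC_CTYPE, "en_US.ISO-8859-15") != NULL;
}

/* isprint() takes an int holding an unsigned char value (or EOF), so a
   plain char is converted first; passing a negative char directly
   would be undefined behavior. */
bool is_printable_char(char ch)
{
    return isprint((unsigned char)ch) != 0;
}
```

If use_latin9_ctype() returns false, is_printable_char() still works but classifies according to whatever locale was already active, which in the default "C" locale typically means 7-bit ASCII only.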
 
Dan Pop

Micah Cowan said:
> Dan Pop said:
> > The casts to unsigned char are wrong, drop them. They make
> > IS_ISO_8859_15(288) return true on any implementation with
> > UCHAR_MAX == 255.
>
> Why does that make them wrong? I would expect the above macro to
> be documented similar to "expects a char [or unsigned char] as
> argument", in which case a caller giving an argument of 288
> should /expect/ undefined results in the situation you've
> described.

Since I haven't seen the documentation of IS_ISO_8859_15, I would expect
it to do the job advertised by its name.

Dan
 
Micah Cowan

Dik T. Winter said:
> > Dan Pop writes:
> > > > #define IS_ISO_8859_15(c) \
> > > > (((unsigned char)(c) >= 0x20 && (unsigned char)(c) < 0x7F) \
> > > > || ((unsigned char)(c) >= 0xA0 && (unsigned char)(c) <= 0xFF))
> > >
> > > The casts to unsigned char are wrong, drop them. They make
> > > IS_ISO_8859_15(288) return true on any implementation with
> > > UCHAR_MAX == 255.
> >
> > Why does that make them wrong? I would expect the above macro to
> > be documented similar to "expects a char [or unsigned char] as
> > argument",
>
> That is *not* the wording for "isprint" and friends. There the argument is
> an int with a value that is representable as unsigned char, or is EOF.
> IS_ISO_8859_15(EOF) most likely will yield true.

I didn't say it was. The above is not isprint(). I do agree that
it should probably act the same way as isprint()--but it wouldn't
have to. Handling 288 correctly, though (assuming it's out of
range for an unsigned char) is not something one should expect of
isprint(), either.

> Note that "isascii" is a common extension which expects an arbitrary
> integer. Your macro more resembles isprint. And wouldn't it be easier
> to just use "setlocale"?

Not my macro. And I agree about setlocale().
 
Micah Cowan

Dan Pop said:
> > Why does that make them wrong? I would expect the above macro to
> > be documented similar to "expects a char [or unsigned char] as
> > argument", in which case a caller giving an argument of 288
> > should /expect/ undefined results in the situation you've
> > described.
>
> Since I haven't seen the documentation of IS_ISO_8859_15, I would expect
> it to do the job advertised by its name.

Fair enough: perhaps this should serve as a reminder to all that,
no matter how small their example, they should always specify
exactly *how* it is meant to be used, even if they think it should be
obvious from reading.
 
those who know me have no need of my name

in comp.lang.c i read:
> I need to be able to tell if a character in the ISO-8859-15 codeset
> (i.e. 8-bit ASCII, incorporating things like accented 'a' or euro currency
> symbol) is printable. Ordinarily I'd use isprint(), but obviously this is
> only going to work for 7-bit "true ASCII" characters.

isprint will work fine if the locale is set appropriately.

> I've thought of running the program in a different locale, but I want to
> catch *all* printable characters, not only those used by a specific
> locale.

on most systems a uni/single-byte character sequence cannot compose all
possible printable characters, char having a range well below that
necessary for even the small number of unique characters in the iso-8859-x
repertoires. for this to be possible you need multi-byte character
sequences.

yet if your stream contains mbcs then isprint is totally inappropriate, you
need to convert to a wchar_t or wint_t and use iswprint, and the conversion
demands that the locale be set appropriately.
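
The conversion step described above could be sketched roughly as follows, using the standard mbrtowc and iswprint. The function name count_printable_mb is invented for illustration, and the sketch assumes the program has already called setlocale(LC_CTYPE, "") so that the decoding matches the user's environment.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Count the printable characters in a multibyte string, decoding it
   according to the current locale's encoding. Returns (size_t)-1 if
   the string contains an invalid or incomplete sequence. */
size_t count_printable_mb(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* initial conversion state */
    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated input */
        if (n == 0)
            break;                      /* decoded a null character */
        if (iswprint((wint_t)wc))
            count++;
        s += n;
        remaining -= n;
    }
    return count;
}
```

Without a prior setlocale call the program runs in the "C" locale, where typically only single-byte characters decode and bytes above 0x7F may be rejected as invalid.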
 
