[half OT] About the not-in-common range of signed and unsigned char

Francesco S. Carta

Hi there,
when I create some less-than-trivial console program that involves some
kind of pseudo-graphic interface, I resort to using the glyphs that lie
in the range [-128, -1] - the plain "char" type is signed on my
implementation.

You know, all those single/double borders, corners, crosses,
pseudo-shadow (dithered) boxes and so on.

Since those characters mess up the encoding of my files, I cannot put
them straight into the source code as char literals; I have to hard-code
their numeric values.

I noticed that, at least on my implementation, it doesn't make any
difference if I assign a negative value to an unsigned char - the
expected glyph shows up correctly - hence I think I wouldn't have to
worry if the same code were run on an implementation where char is
unsigned.
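
For instance, this is a minimal sketch of what I mean - the value -56
is just one of the glyphs I use (it wraps to 200, which is the CP437
double-line bottom-left corner on my console):

    #include <iostream>

    int main()
    {
        // -56 is the signed-char spelling of code 200 (0xC8);
        // assigning it to an unsigned char wraps it modulo 256,
        // and the corner glyph shows up as expected.
        unsigned char glyph = -56;
        std::cout << glyph << '\n';
    }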

My questions:

- what assumptions (if any) can I make about the presence of those
out-of-common-range characters and their (correct) correspondence with
the codes I use to hard-code?

- assuming it is possible, how can I ensure that my program displays
the correct "graphics" regardless of the platform / implementation it is
compiled on?

Note: resorting to an external library that "does the stuff for me" is
not an option here; I'm asking in order to learn, not just to solve an
issue.

Thank you for your attention.
 
Francesco S. Carta

> > [...]
> > - what assumptions (if any) can I make about the presence of those
> > out-of-common-range characters and their (correct) correspondence
> > with the codes I use to hard-code?
>
> You need to ask this in the newsgroup for your OS and/or your terminal,
> because those things are hardware- and platform-specific. Those
> characters are not part of the basic character set; C++ knows nothing
> about them.
>
> > - assuming it is possible, how can I ensure that my program displays
> > the correct "graphics" regardless of the platform / implementation
> > it is compiled on?
>
> There is no way.
>
> > Note: resorting to an external library that "does the stuff for me"
> > is not an option here; I'm asking in order to learn, not just to
> > solve an issue.
>
> <shrug> Whatever.

I'm sorry if my post disturbed you: I explicitly marked it as "[half
OT]" and I posted it here for a reason, which should be evident.

Nonetheless, thank you for your reply, Victor - that's just what I was
looking for: the confirmation that I cannot portably resort to those
graphics, so that I'll avoid struggling for something that isn't
achievable - this is "learning", for me.
 
Francesco S. Carta

> On 7/13/2010 7:01 PM, Francesco S. Carta wrote:
> > [...]
> > > <shrug> Whatever.
> >
> > I'm sorry if my post disturbed you: I explicitly marked it as "[half
> > OT]" and I posted it here for a reason, which should be evident.
>
> It didn't disturb me. I am sorry you thought I did (why did you think
> that?).

Your last line above ("<shrug> Whatever.") made me think that the whole
post disturbed or at least annoyed you. I'm glad to discover that I
misinterpreted your post :)
> And the only reason evident to me is that you asked a valid
> question on C++. What other reason would one need?

That was a "combined" reply, relative to my misinterpretation of your
post /and/ to the fact that you pointed me to another group. The reason
for posting it here is exactly the one you noted: it's about C++ - even
though it was likely to be a platform-specific issue - "half OT", as I
said ;-)

> Well, you seemed to post when you already knew the answer (although I
> can still be mistaken). You either need to use somebody else's library
> (which will represent an abstraction layer for you, and behind the
> scenes its code is platform-specific, regardless of what language it is
> implemented in) or implement that functionality yourself, essentially
> reproducing the same library.

Technically no, I didn't "know" the answer, I just suspected it, hence I
asked for confirmation (although I didn't express my question as such).

Although it is true that I could have just relied on my understanding of
the standard, I was also hoping to get a "real life" reply along the
lines of "on Windows and Linux you're pretty much safe assuming those
characters [are|aren't] available and [have|haven't] the same values,
I've tried [this] and [that], and [that other] gave me problems, YMMV,
do some tests".

[ besides: the threads here happen to see people dropping in with
not-strictly-related comments which are precious, at times, because they
lead me to investigate new things - posting stuff like this is (also)
another chance to see those kinds of "lateral" follow-ups ]

Thank you for your clarification and for the further details.
 
Jonathan Lee

> - what assumptions (if any) can I make about the presence of those
> out-of-common-range characters and their (correct) correspondence
> with the codes I use to hard-code?

Signed-to-unsigned conversion is well-defined in [conv.integral]. If
you're storing these numbers in (signed) chars as negatives, they'll
predictably be converted to unsigned char. You should be okay so long
as CHAR_BIT is appropriate.

For example, suppose you have signed char c = -41, and are going to
cast this to char. If char is signed, no problem. If char is unsigned,
then the result is (1 << CHAR_BIT) - 41. Suppose CHAR_BIT is 8: then
the result is 215. If CHAR_BIT is 9, you'll get 471. The former will
probably be the same character in whatever extended ASCII as -41; the
latter, probably not. So I guess you could have an #if to watch this.

Of course, there are different versions of extended ASCII, and even
non-ASCII, so -41 isn't really guaranteed to be anything in particular.
But you can know the result of converting to unsigned, whereas the
result of converting an out-of-range value from unsigned to signed is
implementation-defined. I guess that's my point.
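
Something like this, say (a sketch; it prints 215 where CHAR_BIT is 8):

    #include <climits>
    #include <iostream>

    #if CHAR_BIT != 8
    #error "these glyph codes assume 8-bit chars"
    #endif

    int main()
    {
        signed char c = -41;
        // [conv.integral]: conversion to unsigned char is modulo
        // 2^CHAR_BIT, so -41 becomes (1 << CHAR_BIT) - 41, i.e. 215.
        unsigned char u = c;
        std::cout << static_cast<int>(u) << '\n';
    }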

> - assuming it is possible, how can I ensure that my program displays
> the correct "graphics" regardless of the platform / implementation it
> is compiled on?

If those characters were guaranteed to be present in _some_ order,
it might be conceivable. But they're not. How could you display
"filled-in square" on a platform that doesn't have such a character?

--Jonathan
 
Francesco S. Carta

> Signed-to-unsigned conversion is well-defined in [conv.integral]. If
> you're storing these numbers in (signed) chars as negatives, they'll
> predictably be converted to unsigned char. You should be okay so long
> as CHAR_BIT is appropriate.
> [...]

I didn't consider that CHAR_BIT problem at all; thank you for pointing
it out, Jonathan.

I think I'd work around this by checking if the normal char is signed or
not, and filling the appropriate table with the appropriate values - so
that I'll avoid signed/unsigned conversions completely.
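
Something along these lines, I mean (a rough sketch; 0xC9 is just the
CP437 double-line top-left corner, picked for illustration):

    #include <limits>

    // Keep the codes as plain ints and build each table entry once,
    // according to the signedness of plain char, so that no
    // signed/unsigned conversion happens anywhere else.
    char make_glyph(int code)
    {
        if (std::numeric_limits<char>::is_signed && code > 127)
            return static_cast<char>(code - 256);   // same bit pattern,
                                                    // assuming CHAR_BIT == 8
        return static_cast<char>(code);
    }

    const char top_left = make_glyph(0xC9);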

> If those characters were guaranteed to be present in _some_ order,
> it might be conceivable. But they're not. How could you display
> "filled-in square" on a platform that doesn't have such a character?

I think I've discovered my true point - I'm interested in a subset of:

http://en.wikipedia.org/wiki/Code_page_437

which, as it seems, "is still the primary font in the core of any EGA
and VGA compatible graphic card".

If I decide to spend some effort in making some portable program that
uses them, I'd have to find a way to activate that code page or
something comparable, as explained in:

http://en.wikipedia.org/wiki/Box_drawing_characters

and resort to acceptable replacements (such as \, /, |, - and +) in case
none of the above is available.
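
Roughly like this, say (a sketch - detecting which set is available
would be the platform-specific part):

    struct BoxChars {
        char horizontal, vertical, corner, cross;
    };

    // CP437 single-line glyphs vs. a plain ASCII fallback. Char
    // literals above 127 are formally implementation-defined, though
    // common compilers simply keep the bit pattern.
    const BoxChars cp437 = { '\xC4', '\xB3', '\xDA', '\xC5' };
    const BoxChars ascii = { '-',    '|',    '+',    '+'    };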

In this way the program could be considered "portable" enough - at least
for me ;-)

Thanks a lot for your attention.
 
James Kanze

> > - what assumptions (if any) can I make about the presence
> > of those out-of-common-range characters and their (correct)
> > correspondence with the codes I use to hard-code?
>
> Signed-to-unsigned conversion is well-defined in [conv.integral]. If
> you're storing these numbers in (signed) chars as negatives, they'll
> predictably be converted to unsigned char. You should be okay so long
> as CHAR_BIT is appropriate.

He needs a CHAR_BIT of at least 8, which is guaranteed.

In practice, I'd use the positive (actually defined) values, and
not some negative mapping, even if char is signed.

I'd use 0xD7, rather than -41. Formally, the conversion of this
value to char, if chars are signed, is implementation-defined,
but practically, doing anything but preserving the bit pattern
would break so much code that it isn't going to happen.
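
I.e., something like:

    // Formally implementation-defined when char is signed, but in
    // practice the bit pattern 0xD7 is preserved.
    char c = static_cast<char>(0xD7);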

Formally, of course, there's no such thing as "extended
ASCII":). There are just other code sets, which happen to
correspond exactly to ASCII for the range 0-127.
> I think I've discovered my true point - I'm interested in a subset of:
>
> http://en.wikipedia.org/wiki/Code_page_437
>
> which, as it seems, "is still the primary font in the core of any EGA
> and VGA compatible graphic card".

I don't think so, but I've not actually programmed anything at
that low a level for many, many years.

Not that it matters, since you probably can't access the graphic
card directly.
> If I decide to spend some effort in making some portable program that
> uses them, I'd have to find a way to activate that code page or
> something comparable, as explained in:
>
> http://en.wikipedia.org/wiki/Box_drawing_characters
>
> and resort to acceptable replacements (such as \, /, |, - and +) in
> case none of the above is available.

Most machines don't have "code pages"; they're an MS-DOS
invention. Most modern systems *do* support Unicode, however
(under Windows, it's code page 65001 if you're using UTF-8
encoding). You might have more luck with portability if you
used Unicode characters in the range 0x2500-0x257F.
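
For instance (assuming a terminal actually set up for UTF-8 output;
the escapes are the UTF-8 encodings of U+250C, U+2500 and U+2510):

    #include <iostream>

    int main()
    {
        // Draws the top edge of a single-line box using Unicode
        // box-drawing characters emitted as raw UTF-8 bytes.
        std::cout << "\xE2\x94\x8C" "\xE2\x94\x80" "\xE2\x94\x90" << '\n';
    }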

> In this way the program could be considered "portable" enough - at
> least for me ;-)

It's only portable to Windows.
 
Francesco S. Carta

> Most machines don't have "code pages"; they're an MS-DOS
> invention. Most modern systems *do* support Unicode, however
> (under Windows, it's code page 65001 if you're using UTF-8
> encoding). You might have more luck with portability if you
> used Unicode characters in the range 0x2500-0x257F.

Heck, that's one of those (in)famous Columbus' eggs... thanks for the
further details, James. I will resort to using Unicode characters;
that's a way better bet.
 
