A character with a negative value

Eric Sosman

[...]

Sorry, I thought you were explaining Martin Wells' claim "Can't happen
in a conforming implementation." It wasn't clear from context that you
were pointing out the limitations that made his claim inaccurate.

It was Chris Dollin's claim, in reference to
Martin Wells' example.
 
Old Wolf

Martin's scenario seems a bit fanciful, but as far
as I can tell it is permitted by the Standard. I see
no requirement that toupper((unsigned char)ch) and
tolower((unsigned char)ch) must have the same sign.

toupper and tolower are defined to return non-negative
int values. Martin's original suggestion of tolower(17)
returning -8 is not possible.
 
Eric Sosman

Old Wolf wrote On 11/01/07 17:56,:
toupper and tolower are defined to return non-negative
int values. Martin's original suggestion of tolower(17)
returning -8 is not possible.

Sorry; you're right. I should have said there's no
requirement that

(char)toupper((unsigned char)ch)
and
(char)tolower((unsigned char)ch)

have the same sign.
 
Army1987

Plain char may be signed or unsigned. Typical ranges could be:

CHAR_MIN == -128, CHAR_MAX == 127

CHAR_MIN == 0, CHAR_MAX == 255

The Standard says that the behaviour is undefined if we pass an
argument to the "to*" functions whose value is outside the range of 0
through UCHAR_MAX. This most certainly should have been CHAR_MIN
through CHAR_MAX.

getc() & co. return an int. putc() & co. return an int. I hardly
ever use a char to store a single character. If you hold all
characters (except those in strings) as an unsigned character
stored in an int, you won't have that problem.
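
A minimal sketch of that approach, assuming the input arrives through getc(). The function name count_upper is made up for illustration. Because getc() returns either EOF or a value already in the range of unsigned char, the int can be handed to isupper() as it is.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts the uppercase letters read from a stream.
   The character is kept in an int, exactly as getc() returns it. */
long count_upper(FILE *fp)
{
    long n = 0;
    int ch;                      /* int, not char: it must be able to hold EOF */

    while ((ch = getc(fp)) != EOF) {
        if (isupper(ch))         /* ch is in 0..UCHAR_MAX here, so no cast */
            ++n;
    }
    return n;
}
```

Called on a stream containing "Hello World, ABC" in the "C" locale, this should count the five letters H, W, A, B and C.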
 
Ben Pfaff

Army1987 said:
getc() & co. return an int. putc() & co. return an int. I hardly
ever use a char to store a single character. If you hold all
characters (except those in strings) as an unsigned character
stored in an int, you won't have that problem.

Aren't most characters in fact stored in strings? That is the
case in my own programs.
 
James Kuyper

Eric said:
[...]

Sorry, I thought you were explaining Martin Wells' claim "Can't happen
in a conforming implementation." It wasn't clear from context that you
were pointing out the limitations that made his claim inaccurate.

It was Chris Dollin's claim, in reference to
Martin Wells' example.

I apologize for the misattribution! I must have gotten lost somewhere
while tracing things back.
 
Chris Dollin

He just said "alphabetical". He didn't say English, ...

That's true, but if he'd meant characters other than the ones that we
/know/ a C program handles he would have said so, yes?

(One of my later messages made that restriction specific, just in case
that was the issue Martin was addressing.)
 
James Kuyper

Chris said:
That's true, but if he'd meant characters other than the ones that we
/know/ a C program handles he would have said so, yes?

No, I can't say that I'd have expected him to say so explicitly. The
context was a question about the legal argument range of the to*()
functions, and that range is very definitely intended to be sufficient
to handle the extended execution character set, not just the basic
execution character set. The to*() functions have locale-dependent
behavior, and have to be usable in all supported locales, not just the
"C" locale.
 
Chris Dollin

James said:
No, I can't say that I'd have expected him to say so explicitly. The
context was a question about the legal argument range of the to*()
functions, and that range is very definitely intended to be sufficient
to handle the extended execution character set, not just the basic
execution character set. The to*() functions have locale-dependent
behavior, and have to be usable in all supported locales, not just the
"C" locale.

OK, I can accept that I may have been over-reading his "alphabetical".
(Thinks: how to avoid making that mistake in the future?)
 
Eric Sosman

Chris said:
That's true, but if he'd meant characters other than the ones that we
/know/ a C program handles he would have said so, yes?

The characters we /know/ a C program handles have codes
in the range CHAR_MIN through CHAR_MAX. And that's the problem:
Even if you're operating in the "C" locale where all alphabetic
characters are positive, you still may encounter negative values
in a string. Many letters and signs outside the ASCII set will
have negative codes on some machines, and your program may have
its own private Götterdämmerung if it hands those negative values
to <ctype.h> functions, even if operating in the "C" locale. The
undefinedness of the behavior is locale-independent.

It's onerous and awkward, but it's important to cast `char'
to `unsigned char' when calling a <ctype.h> function. (But note
that the cast is *incorrect* when the argument is an `int' as
returned by getc(), etc.)
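
A sketch of both cases, assuming two hypothetical helpers named upper_char and upper_from_stream: a plain char taken from a string is converted to unsigned char before the call, while an int obtained from getc() is passed unchanged.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper for characters that come out of a string. */
char upper_char(char c)
{
    /* Convert to unsigned char first so the argument is never negative. */
    return (char)toupper((unsigned char)c);
}

/* Hypothetical helper for values that come from getc() and friends. */
int upper_from_stream(FILE *fp)
{
    int ch = getc(fp);   /* EOF, or a value in 0..UCHAR_MAX */

    /* No cast here: a cast would turn EOF into an ordinary character code. */
    return toupper(ch);
}
```

The second helper relies on toupper() accepting EOF and returning it unchanged, so the caller can still test the result against EOF.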
 
Chris Dollin

Eric said:
The characters we /know/ a C program handles have codes
in the range CHAR_MIN through CHAR_MAX. And that's the problem:
Even if you're operating in the "C" locale where all alphabetic
characters are positive, you still may encounter negative values
in a string. Many letters and signs outside the ASCII set will
have negative codes on some machines, and your program may have
its own private Götterdämmerung if it hands those negative values
to <ctype.h> functions, even if operating in the "C" locale. The
undefinedness of the behavior is locale-independent.

It's onerous and awkward, but it's important to cast `char'
to `unsigned char' when calling a <ctype.h> function.

This I do not -- and did not -- deny.
 
Martin Wells

Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
while (*p++ = toupper(*p));
}

Sure, toupper will have unexpected results if the character value is
negative; but, if the implementation has signed char for plain char,
and has negative numbers for some of its characters, isn't it the
implementation's problem to make sure it works properly?

Assuming that the cast to unsigned char is necessary, what will happen
to values which are negative? Will they become corrupt?

Martin
 
Richard Heathfield

Martin Wells said:
Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
while (*p++ = toupper(*p));
}

Yes. The lack of a cast exposes the call to unnecessary risk, and I'm
fairly sure you didn't mean to use ++ in quite such a cavalier manner. I'd
be happier to see it written like this:

void AllUpper(char *p)
{
    while (*p)
    {
        *p = toupper((unsigned char)*p);
        ++p;
    }
}

Sure, toupper will have unexpected results if the character value is
negative; but, if the implementation has signed char for plain char,
and has negative numbers for some of its characters, isn't it the
implementation's problem to make sure it works properly?

Nope. The implementation's problem is to supply a toupper that works
according to spec on inputs that are either EOF or representable as an
unsigned char. If you hand it negative values, a crash is by no means
ruled out. For example, consider an implementation that defines EOF as -1,
and implements toupper as follows:

#define toupper(x) (__convtoupper[(x) + 1])

On such an implementation, toupper(-42), say, could easily crash the
program or even the machine.
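
To make the arithmetic concrete, here is a sketch of the index such a table-driven macro would compute. The names SKETCH_EOF, TABLE_SIZE, table_index and index_in_bounds are invented for illustration, and EOF is assumed to be -1 as in the example above. The sketch only computes the index; it deliberately does not perform the out-of-bounds read.

```c
#include <limits.h>

/* Assumed for this sketch: EOF is -1, so slot 0 of the table is for EOF
   and slots 1..UCHAR_MAX+1 are for the unsigned char values. */
#define SKETCH_EOF (-1)
#define TABLE_SIZE ((long)UCHAR_MAX + 2)

/* The index the hypothetical macro would use: (x) + 1. */
long table_index(int x)
{
    return (long)x + 1;
}

/* Nonzero if that index lies inside the table. */
int index_in_bounds(int x)
{
    long i = table_index(x);
    return i >= 0 && i < TABLE_SIZE;
}
```

Under these assumptions EOF and every value from 0 through UCHAR_MAX map to a valid slot, while -42 maps to index -41, which is in front of the table.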
 
Eric Sosman

Martin Wells wrote On 11/02/07 12:32,:
Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
while (*p++ = toupper(*p));
}

Yes, if `char' is signed.

Sure, toupper will have unexpected results if the character value is
negative; but, if the implementation has signed char for plain char,
and has negative numbers for some of its characters, isn't it the
implementation's problem to make sure it works properly?

The implementation must work properly when used properly.
When used improperly (as above), all bets are off.

Note that a `char' with the value -1 is likely to be
mistaken for EOF. (EOF is the only legal negative argument.)

Assuming that the cast to unsigned char is necessary, what will happen
to values which are negative? Will they become corrupt?

Demons will fly from their serifs. Undefined behavior
is "undefined."

In typical implementations where `char' is not too wide,
the <ctype.h> functions use their argument values to index
predefined arrays. Feed such an implementation an out-of-
range argument, and it will try to use that argument as an
array index, with outcomes similar to those you get when you
wander outside your own arrays. You might get a garbage
result like toupper('µ') == '7', or you might get something
like a SIGSEGV or GPF. Some implementations may try to be
helpful by making a spare copy of half the array, but they're
still likely to misbehave with 'ÿ' (Unicode U+00FF, easily
confused with EOF on systems with 8-bit signed characters).
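
A small sketch of that confusion. It uses signed char explicitly so the result does not depend on whether plain char is signed, and the helper names are made up for illustration.

```c
#include <limits.h>

/* What a <ctype.h> function would receive without the cast:
   the signed char is promoted to int and keeps its negative value. */
int as_passed_without_cast(signed char c)
{
    return c;
}

/* What it receives with the cast: always a value in 0..UCHAR_MAX. */
int as_passed_with_cast(signed char c)
{
    return (unsigned char)c;
}
```

On a system where 'ÿ' is stored as -1 in a signed char and EOF is -1, the first function yields a value indistinguishable from EOF, while the second yields UCHAR_MAX, which can never equal EOF because EOF is negative.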
 
Chris Dollin

Martin said:
Let's say we want to write a C89 fully-portable algorithm for making
every character in a string uppercase (where applicable). Is there
anything wrong with the following?

void AllUpper(char *p)
{
while (*p++ = toupper(*p));
}

Doesn't that fall afoul of undefined behaviour? `p` is assigned in
the expression, /and/ its value is accessed for reasons other than
to determine its new value.
 
pete

Chris said:
Doesn't that fall afoul of undefined behaviour?

Yes.

`p` is assigned in
the expression, /and/ its value is accessed for reasons other than
to determine its new value.

N869
6.5 Expressions
[#2] Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be
stored.

60) This paragraph renders undefined statement expressions
such as
    i = ++i + 1;
    a[i++] = i;
while allowing
    i = i + 1;
    a[i] = i;
 
