character byte str[i] treated as signed, I need unsigned

Susan Rice · Nov 2, 2006

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of signed?

Ian Collins · Nov 2, 2006

Susan said:
I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of signed?

By defining (or if all else fails) casting them as unsigned. You
haven't shown the definition of str1 or str2.

Walter Roberson · Nov 2, 2006

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of signed?

Anything down at the assembly level is out of scope for this newsgroup,
which does not deal with implementation specifics.

Fortunately, you do not need to go down to that level. Try just

return( (unsigned char)str1[i] - (unsigned char)str2[i] );
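
A minimal sketch of that cast in context. The original post does not show any declarations, so this assumes str1 and str2 are plain char strings and i is an index into both; the function name compare_at is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: compares the bytes at index i of two strings
 * as unsigned values, whatever the signedness of plain char. */
int compare_at(const char *str1, const char *str2, size_t i)
{
    /* Each operand is converted to unsigned char (0..UCHAR_MAX) and then
     * promoted to int for the subtraction (on implementations where int
     * is wider than char), so the result can still be negative, but a
     * byte such as 0xE9 now ranks above 'z'. */
    return (unsigned char)str1[i] - (unsigned char)str2[i];
}
```

With this, compare_at("\xE9", "z", 0) is positive even where plain char is signed, whereas the uncast subtraction would be negative there.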

Peter Nilsson · Nov 3, 2006

Susan said:
I'm comparing characters via

return(str1[i] - str2[i]);

How are str1, str2 and i declared? What's the rest of the function?
How is the result meant to be used?

and I'm having problems with 8-bit characters

Why do you care how many bits in a character there are?

being treated as signed instead of unsigned integers.

You're telling us what you _think_ the problem is, rather than
explaining the problem itself, e.g. "i inputted this, the output
I got was this, the output I wanted was this, here is my code
and what it is meant to do."

[In other words, don't tell us the sign difference is your problem,
tell us _why_ it's a problem.]

You should know that knee-jerk "this'll fix it" responses may not be
addressing other important issues of your code. For instance, your
methodology is not guaranteed to yield alphabetical ordering.

[In other words, your minimalist presentation may mean you only get
a superficial (and possibly broken) solution to your problem, whilst
deeper issues with your code are left uncorrected.]

The disassembly is using

movsx eax,byte ptr[edx]

Learning C by examining the disassembly is the WORST thing you can
do. When you change architectures you may find that there's an awful
lot of assumptions on your part that you'll have to unlearn.

Old Wolf · Nov 3, 2006

Walter said:
Susan Rice said:

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers.

return( (unsigned char)str1[i] - (unsigned char)str2[i] );

A note to the OP: this will still return a negative value for
cases such as '1' - '2'. If the intent is to return a positive
value mod 256 in all cases then write:

return (unsigned int)(str1[i] - str2[i]) % 256;

(note that the return statement does not need brackets around
its expression).
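
A small sketch of that expression wrapped in a function, to make the wrap-around concrete. The name diff_mod_256 and the two char parameters are assumptions for illustration only.

```c
/* Hypothetical helper: difference of two character values reduced
 * to the range 0..255. */
unsigned int diff_mod_256(char c1, char c2)
{
    /* The subtraction is done in int. Converting a negative int to
     * unsigned int wraps modulo UINT_MAX + 1, which is a power of two
     * no smaller than 65536, so % 256 keeps exactly the low 8 bits. */
    return (unsigned int)(c1 - c2) % 256;
}
```

For example, diff_mod_256('1', '2') gives 255 rather than -1, and diff_mod_256('2', '1') gives 1.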

Susan Rice · Nov 3, 2006

Here's the real problem I was unaware of, as explained by
Kernighan & Ritchie (whom you probably know as K&R):

"There is one subtle point about the conversion of characters to
integers. The language does not specify whether variables of type
char are signed or unsigned quantities. When a char is converted
to an int, can it ever produce a negative result? The answer varies
from machine to machine, reflecting differences in architecture.
On some machines a char whose leftmost bit is 1 will be converted
to a negative integer ("sign extension"). On others, a char is
promoted to an int by adding zeros at the left end, and thus is
always positive."
--Kernighan & Ritchie: "The C Programming Language"
(K&R, the inventors of the language.)
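
The point in that passage can be checked on a given implementation. The sketch below is illustrative only; the function names plain_char_is_signed and byte_value are hypothetical, while CHAR_MIN comes from the standard header limits.h.

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation,
 * 0 otherwise. CHAR_MIN is 0 when plain char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Converts a byte stored in a plain char to an int in 0..UCHAR_MAX,
 * independent of whether plain char is signed (assuming int is wider
 * than char, as on typical implementations). */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

On a compiler that emits movsx for plain char loads, plain_char_is_signed() would be expected to return 1, and byte_value gives the zero-extended value either way.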

Nils O. Selåsdal · Nov 3, 2006

Susan said:
I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of signed?

By declaring your array to hold unsigned chars.
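
A sketch of what that could look like, assuming the data can be held in unsigned char buffers of known length. The function name compare_bytes and its parameters are hypothetical, since the original declarations were never posted.

```c
#include <stddef.h>

/* Hypothetical comparison routine over buffers declared as unsigned char.
 * Returns a negative, zero, or positive value like memcmp. */
int compare_bytes(const unsigned char *str1, const unsigned char *str2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (str1[i] != str2[i]) {
            /* No cast needed: the element type is already unsigned. */
            return str1[i] < str2[i] ? -1 : 1;
        }
    }
    return 0;
}
```

The standard memcmp and strcmp functions already interpret the bytes they compare as unsigned char, so they may be an alternative to a hand-written loop.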

Frederick Gotham · Nov 3, 2006

Susan Rice:

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of
signed?

I can't say for sure without knowing exactly what you're trying to do (e.g.
do you want roll-around, etc.), but here's something simple:

return (char unsigned)( (unsigned)str1[i] - str2[i] );

Somebody else offered something akin to the following:

return (char unsigned)str1[i] - (char unsigned)str2[i];

, but the casts are redundant, as both operands will be promoted to either
"signed int" or "unsigned int" before the subtraction takes place.

Of course, I don't know what you're trying to do, but at first glance, it
looks like you're going the wrong way about it (e.g. why are you using
plain char in the first place?)

CBFalconer · Nov 3, 2006

Susan said:
Susan said:

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as
signed instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead
of signed?

Here's the real problem I was unaware of, as explained by
Kernighan & Ritchie (whom you probably know as K&R):

"There is one subtle point about the conversion of characters to
integers. The language does not specify whether variables of type
char are signed or unsigned quantities. When a char is converted
to an int, can it ever produce a negative result? The answer varies
from machine to machine, reflecting differences in architecture.
On some machines a char whose leftmost bit is 1 will be converted
to a negative integer ("sign extension"). On others, a char is
promoted to an int by adding zeros at the left end, and thus is
always positive."
--Kernighan & Ritchie: "The C Programming Language"
(K&R, the inventors of the language.)

Please don't top-post. Your answer belongs after (or intermixed
with) the material you quote, after snipping portions irrelevant to
your reply. I fixed this one.

As others have said, simply use unsigned chars.

Walter Roberson · Nov 3, 2006

Frederick Gotham said:
I can't say for sure without knowing exactly what you're trying to do (e.g.
do you want roll-around, etc.), but here's something simple:

return (char unsigned)( (unsigned)str1[i] - str2[i] );

Somebody else offered something akin to the following:

return (char unsigned)str1[i] - (char unsigned)str2[i];

, but the casts are redundant, as both operands will be promoted to either
"signed int" or "unsigned int" before the subtraction takes place.

No, I used (unsigned char) not (char unsigned).

You are being inconsistent in your reasoning for using (char unsigned).
Your stated reasons have to do with your usage of Irish, which
(you have said) puts the most important information first. In this
case, the part that is most important is not the size of the item
but rather the unsigned-ness, so unsigned would go first in your
reasoning.

(You might, I suppose, argue that it is quite important in the cast
operation to know that you are casting to an integral type rather than
a floating type, and that on that basis that the char should go first.
However, there are no unsigned floating types, so the appearance
of unsigned already tells you that you cannot be working
with a floating type, so using unsigned first already provides
the "This will be an integral type" hint.)

Frederick Gotham · Nov 3, 2006

Walter Roberson:

No, I used (unsigned char) not (char unsigned).

ARE YOU BRAIN DEAD ?

If I misquote you as using "int const" rather than "const int", will you
roar from a mountain top that I got it wrong?

You are being inconsistent in your reasoning for using (char unsigned).
Your stated reasons have to do with your usage of Irish, which
(you have said) puts the most important information first. In this
case, the part that is most important is not the size of the item
but rather the unsigned-ness, so unsigned would go first in your
reasoning.

Have you drilled a hole into my skull and had a look at my brain?

Don't pretend to know how I think.

(You might, I suppose, argue that it is quite important in the cast
operation to know that you are casting to an integral type rather than
a floating type, and that on that basis that the char should go first.
However, there are no unsigned floating types, so the appearance
of unsigned already tells you that you cannot be working
with a floating type, so using unsigned first already provides
the "This will be an integral type" hint.)

How about you spend more time focusing on the functionality of the code
rather than whether the pretty ribbons are green or yellow, and whether
they curl clockwise or anticlockwise.

Walter Roberson · Nov 3, 2006

Frederick Gotham said:
How about you spend more time focusing on the functionality of the code
rather than whether the pretty ribbons are green or yellow, and whether
they curl clockwise or anticlockwise.

I would point out that your offering was functionally equivalent to
mine (the one that used explicit casts in both locations), so -you-
were the one worrying about prettiness, not functionality.

You were commenting on elements of my code that did not affect
the functionality but did affect the readability, so it was completely
fair for me to comment on the elements of your code that did not
affect the functionality but did affect the readability.

Have you drilled a hole into my skull and had a look at my brain?

Do I need to locate and cite your previous articles in which
you explain your choice of syntactical order? You *did* make such
an explanation, and your most recent usage was contrary to that
explanation. You did not apply the reasoning that you had earlier
stated. We must therefore conclude that you apply your
previously-stated reasons inconsistently; or that your previously
stated reasons were not your real reasons; or that your previously
stated reasons were not your -complete- reasons.

Don't pretend to know how I think.

You are correct that I made a misstatement. I should not have
said that,
"You are being inconsistent in your reasoning for using (char unsigned)",
I should have said,
"You are being inconsistent with your stated reasoning for using
(char unsigned)".

This allows for a possibility that I did not allow for earlier,
namely that your actual reasoning might be quite consistent but that
your actual reasoning does not match your statements about your
reasoning.

Frederick Gotham · Nov 3, 2006

Walter Roberson:

I would point out that your offering was functionally equivalent to
mine (the one that used explicit casts in both locations), so -you-
were the one worrying about prettiness, not functionality.

Actually, my intent was to point out a flaw. Let's start off with two
char's:

char a,b;

Let's say we want to add the two of these together, and for the result to
be unsigned. All we need do is:

(unsigned)a + b;

However, what _you_ proposed was:

(char unsigned)a + (char unsigned)b;

Which might be equivalent to:

(int)(char unsigned)a + (int)(char unsigned)b;

, depending on whether "char unsigned" promotes to "int" or "unsigned". On
the majority of implementations, it promotes to "int". On such systems, the
result will therefore be a signed int.
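
To see that the two forms can give different values, here is a small sketch. It uses signed char parameters so the behaviour does not depend on the signedness of plain char, and it assumes an implementation with 8-bit char and int wider than char; both function names are hypothetical.

```c
/* One operand converted to unsigned int: b is promoted to int and then
 * converted to unsigned int, so the addition is done in unsigned int. */
unsigned int add_as_unsigned_int(signed char a, signed char b)
{
    return (unsigned)a + b;
}

/* Both operands converted to unsigned char first, then promoted
 * (to int under the stated assumptions) before the addition. */
int add_as_unsigned_char(signed char a, signed char b)
{
    return (unsigned char)a + (unsigned char)b;
}
```

With a = -1 and b = 1, the first form wraps around to 0 while the second yields 256 under those assumptions, so the choice of cast affects the value as well as the type of the result.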

You were commenting on elements of my code that did not affect
the functionality but did affect the readability, so it was completely
fair for me to comment on the elements of your code that did not
affect the functionality but did affect the readability.

I pointed out the flaw. At times though, I also point out redundancies. If
I see:

double a;
long b,c;

a = (double)b / (double)c;

, then I'd point out that only one cast is required:

a = (double)b/c;

However I tend not to comment on things like:

int const Vs const int
i++ Vs ++i

Do I need to locate and cite your previous articles in which
you explain your choice of syntactical order?

You suggested that my word order would change because of the context.

You *did* make such an explanation, and your most recent usage was
contrary to that explanation.

_You_ think so, because of the context. Perhaps my reasoning doesn't go so
far as to take the context into account, but rather picks one syntax that
should be used throughout. Who knows?! I stopped thinking about it a long
time ago and I just go with the flow now.

You did not apply the reasoning that you had earlier
stated. We must therefore conclude that you apply your
previously-stated reasons inconsistently; or that your previously
stated reasons were not your real reasons; or that your previously
stated reasons were not your -complete- reasons.

Or you could conclude that you do not understand my thinking, or that my
thinking takes into account the probability that Alaska will suffer
flash-floods on account of Global Warming.

This allows for a possibility that I did not allow for earlier,
namely that your actual reasoning might be quite consistent but that
your actual reasoning does not match your statements about your
reasoning.

I am done explaining why I like red ribbons that turn clockwise on my
bicycle. Please see past the ribbons and look at the actual bicycle, as
I've had my fill of explaining my preference.

Richard Heathfield · Nov 3, 2006

Frederick Gotham said:

If I see:

double a;
long b,c;

a = (double)b / (double)c;

, then I'd point out that only one cast is required:

a = (double)b/c;

...and then I'd point out that *no* cast is required:

a = b;
a /= c;
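
Both variants can be put side by side to confirm they agree. This is only a sketch; the function names divide_with_cast and divide_without_cast are hypothetical wrappers around the statements quoted above.

```c
/* Division with one explicit cast: c is converted to double by the
 * usual arithmetic conversions. */
double divide_with_cast(long b, long c)
{
    return (double)b / c;
}

/* Same result with no cast: the initialisation converts b to double,
 * and the compound assignment then divides in double. */
double divide_without_cast(long b, long c)
{
    double a = b;
    a /= c;
    return a;
}
```

For b = 7 and c = 2, both functions return 3.5, whereas the plain long division 7L / 2L would truncate to 3.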

Frederick Gotham · Nov 3, 2006

Richard Heathfield:

...and then I'd point out that *no* cast is required:

a = b;
a /= c;

In performing an assignment, you give the idea that you need to store a
value. For instance, consider:

a = (Type)b+c;

in place of:

a = b;
a += c;

The latter version may result in less efficient code than the former
version, because when a compiler sees an assignment statement, its first
thought will be "hmm, I have to store a value".

The former version explicitly demonstrates that both the value of b and c
can be discarded, leaving the door wide open for the compiler to do
whatever it likes (e.g. make use of CPU registers).

Of course, I'm sure you can find an optimiser which will make the same
machine code for both of them.

Both of our methods work. Perhaps you prefer _your_ method. Perhaps _I_
prefer _my_ method. Let's not argue over whether pretty green anticlockwise
ribbons are better than pretty red clockwise ribbons.

Richard Heathfield · Nov 3, 2006

Frederick Gotham said:

Both of our methods work. Perhaps you prefer _your_ method. Perhaps _I_
prefer _my_ method. Let's not argue over whether pretty green
anticlockwise ribbons are better than pretty red clockwise ribbons.

This isn't a matter of preference, but of fact. You claimed that one cast is
*required*. I merely demonstrated that your claim is false.

Ian Collins · Nov 4, 2006

Frederick said:
Richard Heathfield:

In performing an assignment, you give the idea that you need to store a
value. For instance, consider:

a = (Type)b+c;

in place of:

a = b;
a += c;

The latter version may result in less efficient code than the former
version, because when a compiler sees an assignment statement, its first
thought will be "hmm, I have to store a value".

Or it (probably) won't. It could even impede the optimiser, making the
code less efficient. Don't get so hung up on speculative
micro-optimisations; let the compiler do its job.

Old Wolf · Nov 5, 2006

Frederick said:
Have you drilled a hole into my skull and had a look at my brain?
Don't pretend to know how I think.

Do you know something about neuroscience that the rest
of us don't ?

Frederick Gotham · Nov 5, 2006

Old Wolf:

Do you know something about neuroscience that the rest
of us don't ?

No, but Wikipedia is your friend:

http://en.wikipedia.org/wiki/Neuroscience
