Hex to int

rajash

Thanks to everyone for interesting discussions. To make the group
happy, my next listing has int main and headers declared!!

Here is my solution to Exercise 2.3.

// write htoi, hex to int

#include<stdio.h>

int main(int c, char **v)
{
    int x=-1, htoi();
    if(c>1)
        x=htoi(v[1]);
    if(x>=0)
        printf("%d\n", x);
    else
        printf("Unspecified error.\n");
}

int htoi(char *s)
{
    char *t=s;
    int x=0, y=1;
    if(*s=='0' && (s[1]=='x' || s[1]=='X'))
        t+=2;
    s+=strlen(s);
    while(t<=--s) {
        if('0' <= *s && *s <= '9')
            x+=y*(*s - '0');
        else if('a' <= *s && *s <= 'f')
            x+=y*(*s - 'a' + 10);
        else if('A' <= *s && *s <= 'F')
            x+=y*(10 + *s - 'A');
        else
            return -1; /* invalid input! */
        y<<=4;
    }
    return x;
}
 
Flash Gordon

Thanks to everyone for interesting discussions. To make the group
happy, my next listing has int main and headers declared!!

If your only reason for doing it is to keep the group happy then you
don't understand the issues and should reread what people have told you.
Either that or choose something other than programming.

Also you have *not* included all the required headers.
Here is my solution to Exercise 2.3.

// write htoi, hex to int

#include<stdio.h>

The space shortage ended some years back, so it will be easier to read
if you use some spaces:

#include <stdio.h>
int main(int c, char **v)

Conventionally the parameters are called argc and argv. Although other
names are legal it makes it easier for everyone if you use the
conventional names.
{
int x=-1, htoi();

Very bad style on two counts.
1) You have not used the prototype style of declaration so the compiler
cannot check the number and type of parameters passed to htoi.
2) It is generally considered better to not declare the functions
locally but globally, since they are visible globally. So it would be
better to put "int htoi(char *s);" at file scope or define htoi prior to
main.
if(c>1)
x=htoi(v[1]);
if(x>=0)
printf("%d\n", x);
else
printf("Unspecified error.\n");

How about returning the value from main that the return type promises
will be returned?
}

int htoi(char *s)
{
char *t=s;
int x=0, y=1;

Why not try using more than one character for variable names? It makes
it *far* easier to read if you use meaningful names.
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);

You need <string.h> for strlen. Without it your program invokes undefined
behaviour and could fail on some implementations, for example if size_t
(the return type of strlen) is 64 bits but int is only 32 bits.
while(t<=--s) {
if('0' <= *s && *s <= '9')
x+=y*(*s - '0');
else if('a' <= *s && *s <= 'f')
x+=y*(*s - 'a' + 10);

The letters are not guaranteed to be consecutive, and on some systems
they are not.
else if('A' <= *s && *s <= 'F')
x+=y*(10 + *s - 'A');
else
return -1; /* invalid input! */
y<<=4;

None of the above handles overflow properly. Overflow of signed types
invokes undefined behaviour and might not (on some implementations will
not) do what you expect.
 
Keith Thompson

Thanks to everyone for interesting discussions. To make the group
happy, my next listing has int main and headers declared!!

Here is my solution to Exercise 2.3.

// write htoi, hex to int

#include<stdio.h>

int main(int c, char **v)

The traditional names for the parameters to main() are argc and argv.
Using different names is legal, but a very bad idea; it just makes
your code more difficult to read.
{
int x=-1, htoi();

Putting a function declaration inside a function definition is legal,
but not a good idea. A function declaration with empty parentheses
says that the function takes a fixed but unspecified number and type(s)
of arguments.

Either declare htoi() with a full prototype at file scope (outside main()):

int htoi(char *s);

or move the full definition of htoi() above the definition of main(),
so you don't need a separate declaration.
if(c>1)
x=htoi(v[1]);
if(x>=0)
printf("%d\n", x);
else
printf("Unspecified error.\n");

Error messages are normally printed to stderr, not stdout.

You could detect the specific error of running the program with no
arguments. Consider something like this:

if (argc <= 1) {
fprintf(stderr, "Usage: %s arg\n", argv[0]);
exit(EXIT_FAILURE);
}
}

int htoi(char *s)

Consider declaring this static, since it's not used from any other
translation units.
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {
if('0' <= *s && *s <= '9')

You can use the isdigit() function for this.
x+=y*(*s - '0');
else if('a' <= *s && *s <= 'f')
x+=y*(*s - 'a' + 10);
else if('A' <= *s && *s <= 'F')
x+=y*(10 + *s - 'A');

You're assuming that the numeric codes for the letters 'a'..'f' and
'A'..'F' are contiguous. This is not guaranteed, and there are
character sets where the alphabet is not contiguous (though in the one
example of this that I know of, 'a'..'f' and 'A'..'F' happen to be
contiguous). Consider using the isxdigit() function.
else
return -1; /* invalid input! */
y<<=4;
}
return x;
}

These are mostly superficial points. I'd take a closer look at the
algorithm, but your use of meaningless single-character identifiers
makes your code difficult to read.
 
Ben Bacarisse

Here is my solution to Exercise 2.3.
<snip>

I'll comment only on the thing not yet pointed out...
int htoi(char *s)
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {

This loop condition is likely to invoke undefined behaviour. Can you
see why?
 
Ben Bacarisse

Here is my solution to Exercise 2.3.

<snip>

I'll comment only on the thing not yet pointed out...
int htoi(char *s)
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {

This loop condition is likely to invoke undefined behaviour. Can you
see why?

If strlen(s) equals 0.

No. It can invoke UB when strlen(s) != 0 and it may not invoke UB in
some cases when strlen(s) == 0. Anyway, the question was partly
rhetorical. I think people learn better when they think, so I rather
hoped that my question would remain unanswered (at least for some while).
 
RoS

On Mon, 03 Dec 2007 13:11:05 +0000, Ben Bacarisse wrote:
rajash@thisisnotmyrealemail said:
Here is my solution to Exercise 2.3.
<snip>

I'll comment only on the thing not yet pointed out...
int htoi(char *s)
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {

This loop condition is likely to invoke undefined behaviour. Can you
see why?

is it because if s points to "" => strlen(s)==0; (s+=0)==s and --s
points to memory not allowed to change (or read)?
 
Ben Bacarisse

RoS said:
On Mon, 03 Dec 2007 13:11:05 +0000, Ben Bacarisse wrote:
rajash@thisisnotmyrealemail said:
Here is my solution to Exercise 2.3.
<snip>

I'll comment only on the thing not yet pointed out...
int htoi(char *s)
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {

This loop condition is likely to invoke undefined behaviour. Can you
see why?

is it because if s points to "" => strlen(s)==0; (s+=0)==s and --s
points to memory not allowed to change (or read)?

s pointing to "" and/or strlen(s) being zero has nothing to do with
it. Just consider a simple call like htoi("20") and work out what has
to happen for the while loop to stop.

Whenever I see a while loop, I negate the test in my head and ask "is
this condition safe?". When there is code after the loop you can
check that that code makes sense in an environment where the loop
condition is false (not significant in this case).

It gets a little messy when there are assignment operators (and
equivalents) in the test but it is still worth doing. In languages
like C, you have to scan the body for breaks, gotos and returns but
that is simple enough.
 
rajash

Ben said:
Here is my solution to Exercise 2.3.
<snip>

I'll comment only on the thing not yet pointed out...
int htoi(char *s)
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {

This loop condition is likely to invoke undefined behaviour. Can you
see why?

No.
 
James Kuyper

Ben said:
Here is my solution to Exercise 2.3.
<snip>

I'll comment only on the thing not yet pointed out...
int htoi(char *s)
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {
This loop condition is likely to invoke undefined behaviour. Can you
see why?

No.

For a string not prefixed by "0x" or "0X", what happens to s if t==s at
the time the loop condition is evaluated?
 
Ben Bacarisse

James Kuyper said:
Ben said:
(e-mail address removed) writes:

Here is my solution to Exercise 2.3.
<snip>

I'll comment only on the thing not yet pointed out...

int htoi(char *s)
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {
This loop condition is likely to invoke undefined behaviour. Can you
see why?

No.

For a string not prefixed by "0x" or "0X", what happens to s if t==s
at the time the loop condition is evaluated?

Thanks. I was going to get back to this, it just took me a bit of
time...

If the OP needs any more hints: UB occurs whenever the function is
called with a pointer to the start of an object which does not begin
"0x" or "0X". htoi("-32" + 1) and htoi("0x32") are OK, but htoi("32")
is not.
 
Lew Pitcher

James Kuyper said:
Ben Bacarisse wrote:
(e-mail address removed) writes:
Here is my solution to Exercise 2.3.
<snip>
I'll comment only on the thing not yet pointed out...
int htoi(char *s)
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {
This loop condition is likely to invoke undefined behaviour. Can you
see why?
No.
For a string not prefixed by "0x" or "0X", what happens to s if t==s
at the time the loop condition is evaluated?

Thanks. I was going to get back to this, it just took me a bit of
time...

If the OP needs any more hints: UB occurs whenever the function is
called with a pointer to the start of an object which does not begin
"0x" or "0X". htoi("-32" + 1) and htoi("0x32") are OK, but htoi("32")
is not.

Ben, not meaning to jiggle your elbow, but...

How is
htoi("-32" + 1)
different from
htoi("32");
?

Just trying to understand your point here.
 
Ben Bacarisse

Lew Pitcher said:
James Kuyper said:
(e-mail address removed) wrote:
Ben Bacarisse wrote:
(e-mail address removed) writes:
Here is my solution to Exercise 2.3.
<snip>
I'll comment only on the thing not yet pointed out...
int htoi(char *s)
{
char *t=s;
int x=0, y=1;
if(*s=='0' && (s[1]=='x' || s[1]=='X'))
t+=2;
s+=strlen(s);
while(t<=--s) {
This loop condition is likely to invoke undefined behaviour. Can you
see why?

For a string not prefixed by "0x" or "0X", what happens to s if t==s
at the time the loop condition is evaluated?

Thanks. I was going to get back to this, it just took me a bit of
time...

If the OP needs any more hints: UB occurs whenever the function is
called with a pointer to the start of an object which does not begin
"0x" or "0X". htoi("-32" + 1) and htoi("0x32") are OK, but htoi("32")
is not.

Ben, not meaning to jiggle your elbow, but...

How is
htoi("-32" + 1)
different from
htoi("32");
?

Just trying to understand your point here.

I was not trying to be cute and obscure. Here's my version of the
problem. I hope I have not made a mistake with all this!

In the call htoi("-32" + 1) s points one byte into an object so the
terminating condition of the loop 't <= --s' is met when s points at
the '-'. If the string has no "0x" prefix, the terminating condition
is met when s has been decremented to make a pointer that points
before the start of the object. This, I think, is not allowed.

--s is defined to mean the same as s -= 1 and s -= 1 is defined to
mean the same as s = s - 1 (but the lvalue s is not evaluated twice).
Thus section 6.5.6 (which describes + and -) governs what pointer
arithmetic you can do with --. Paragraph 8 says that the result is
well defined only when the resulting pointer points into, or just
past, the object. You don't have to de-reference the pointer to get
UB -- the arithmetic alone is enough (although * applied to the "just
past the end" pointer is also undefined).
 
jameskuyper

Lew said:
Ben, not meaning to jiggle your elbow, but...

How is
htoi("-32" + 1)
different from
htoi("32");

Trace through the original code; the '-' is recognized as an invalid
character, resulting in an early exit from the loop just one step
before it would otherwise have had undefined behavior.
 
Lew Pitcher

I was not trying to be cute and obscure. Here's my version of the
problem. I hope I have not made a mistake with all this!

In the call htoi("-32" + 1) s points one byte into an object so the
terminating condition of the loop 't <= --s' is met when s points at
the '-'. If the string has no "0x" prefix, the terminating condition
is met when s has been decremented to make a pointer that points
before the start of the object. This, I think, is not allowed.

Thanks, Ben. I knew I was missing something.
 
Ben Bacarisse

Trace through the original code; the '-' is recognized as an invalid
character, resulting in an early exit from the loop just one step
before it would otherwise have had undefined behavior.

Not quite. The function copies 's' to 't' and then loops 'while (t <=
--s)' so the inner test for invalid characters never sees the '-' in a
call like htoi("-32" + 1). The +1 simply avoids the UB by passing a
pointer that is not the start of an object. htoi("-32") would avoid
the UB for exactly the reasons you give.

The big point here is that running pointer loops backwards is tricky
and needs extra care because there is no special right to "go off the
end" (even by one) in that direction.
 
rajash

James said:
For a string not prefixed by "0x" or "0X", what happens to s if t==s at
the time the loop condition is evaluated?

Then --s will evaluate to the address one before the pointer
originally passed to the function. This would only cause a problem if
the original pointer was to address 0, but in that case it would be a
NULL pointer, which isn't allowed (the argument given to htoi must be
a STRING).
 
Flash Gordon

Then --s will evaluate to the address one before the pointer
originally passed to the function. This would only cause a problem if
the original pointer was to address 0,

Incorrect. Calculating a pointer to before the object you started in (as
in this case) ALWAYS invokes undefined behaviour. On a lot of
implementations (probably most) it does what the programmer would
expect, but bounds checking implementations that detect this and abort
the program are not only legal but useful.
but in that case it would be a
NULL pointer,

Not necessarily true. The null pointer might not be all bits 0 (i.e. an
address of 0) and a pointer with all bits 0 could well be a valid
pointer (i.e. not a null pointer). There are even sensible reasons why
one might do this.

A 0 in the source code used in a pointer context is another matter, and
the implementation has to do whatever magic is required to make it work,
even if that means "int *p = 0;" does not set p to all bits 0.
which isn't allowed (the argument given to htoi must be
a STRING).

Actually, it must be a pointer to the first character of a string, but
you were almost correct on this bit ;-)
 
Keith Thompson

Then --s will evaluate to the address one before the pointer
originally passed to the function. This would only cause a problem if
the original pointer was to address 0, but in that case it would be a
NULL pointer, which isn't allowed (the argument given to htoi must be
a STRING).

No, if the passed pointer points to the beginning of an object (as
opposed to the "-32"+1 example elsethread), then attempting to point
before the beginning of the object invokes undefined behavior. For
example, given:

char *s = "hello";

just evaluating (s-1), even without attempting to dereference it,
invokes UB.

Furthermore, address 0 is not necessarily the same as a null pointer.
(The FAQ, www.c-faq.com, has an entire section on null pointers.)

It may be that some implementations will happen to behave as you say,
but that's not guaranteed by the standard.
 
rajash

Keith said:
No, if the passed pointer points to the beginning of an object (as
opposed to the "-32"+1 example elsethread), then attempting to point
before the beginning of the object invokes undefined behavior. For
example, given:

char *s = "hello";

just evaluating (s-1), even without attempting to dereference it,
invokes UB.

There are two ways to think about pointers. It's true that (s-1) isn't
a "valid" pointer to char, because we don't (necessarily) own the
memory before s.

However, if you think about a pointer as just an address in memory,
then s-1 makes sense - it's just the location in memory before s.
Furthermore, address 0 is not necessarily the same as a null pointer.
(The FAQ, www.c-faq.com, has an entire section on null pointers.)

It may be that some implementations will happen to behave as you say,
but that's not guaranteed by the standard.

I think you are mistaken - I've definitely read that char *p=NULL; and
char *p=0; are completely equivalent.
 
