x86 binary runs; x86_64 binary throws segfault

D

Don

Hello, folks. I have two boxes, both running Debian GNU/Linux 5,
kernel 2.6.26, and gcc version 4.3.2. One box is an AMD64 (obviously,
the x86_64 architecture), and one is an Intel i686.

I have a very simple program designed to a base string, which is a
numeral in base 12, supplied at the command line or from standard
input, into decimal, then send it back to the calling function as a
double. Very simple, under 200 lines.

Before converting the string to a decimal double, I test whether the
string is written in exponential notation, and if so, send it to a
function to convert it out of exponential notation. This consists of
juggling the integral marker (hard-coded as a semicolon, at this
point) back or forward in the string depending on the value of the
exponent, cutting off the exponent in the process. It was a hairier
problem than I expected, and it's not pretty, but it works *almost*
every time. I know that the problem is in this function because no
errors occur when the string is *not* in exponential notation, and
this function is therefore never called.

It works properly on all tested strings sent from the command line.

When I supply the strings from standard input, however, some strange
things happen. They only happen on the second string sent through the
function. The errors do *not* occur if I send a string that isn't in
exponential notation, which makes me think the problem is in the
exponential-handling function.

If the first two strings sent through stdin are "X44;4" (decimal
1492.3repeating) and "X;444e2" (the same number), the first is
converted properly and the second is converted to 17908.0000, which
about the proper number multiplied by twelve (not quite, but very
close). All subsequent passes work fine, and the same input in any
place but the second time works fine. "X44;4" does *not* fail when
passed through second.

If the first two strings sent through stdin are "X;3" (10.25 decimal)
and "0;x3e1" (the same number), the first is converted properly and
the second throws a segfault. Entering "0;x3e1" in any other place
(as in, not second from standard input) works perfectly fine,
producing "10.25" as output predictably. "X;3" does *not* fail when
passed through second.

I stared at this for some time, then in desperation sshed over to my
i686 box, recompiled the same code, and tried it there. Both of these
situations worked properly, without any issues.

I'm compiling with the following command line:

gcc -lm -o dec dec.c

I've googled the issue; in some places I'm told to try compiling with
the -arch x86_64 flag, and in other places I'm told that gcc should
automatically compile for x86_64 on an x86_64 system. I can't try it
for eight or nine more hours (I'm at work), so I thought I'd ask here
and see if perhaps the code itself is bad.

Here's the function at issue: it takes a pointer to the string to be
converted, plus an integer representing the point of the string where
the "e" is, computed at call time (I loop through the string looking
for an "e" or "E" to determine whether it needs to be called at all;
since I already had the loop index set for that when I called the
function, I thought it made sense to send it to the function rather
than let the function do the same loop again itself). Error checking
for the input has already occurred at this point, but it's proper
input that's causing this problem anyway. It calls "doztodec(char
*s)" to convert the part of the string that represents the exponent to
decimal; this function works fine on all input, and typically returns
a double. I'm putting it in a char because the exponent has to be
below 127 (my precision doesn't get anywhere like that high here), but
I understood that a cast would be done automatically in this case.
Should I make it explicit? I've sprinkled a lot of extra comments for
this posting to make it clearer what's going on.
*******
int expkill (char *s, int expspot)
{
int i,j;
char zensuf[3]; /* the base-12 exponent suffix */
char exp; /* the value of the suffix */
int semi; /* where semicolon is in original string */

for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
*(zensuf+j) = *(s+i); /* put the exponent in zensuf */
exp = doztodec(zensuf); /* convert the exponent to decimal char */
for (semi=0; *(s+semi) != ';' && *(s+semi) != '\0'; ++semi); /* set
semi = semicolon's location */
*(s+expspot) = '\0'; /* cut off the "e" and the exponent */
if (semi >= strlen(s))
return 0; /* if the semicolon is already at the end of the string,
end processing */
if (exp > 0) { /* if the semicolon needs to be moved to the right */
for (i=semi; *(s+i+1) != '\0'; ++i)
*(s+i) = *(s+i+1); /* move everything over to get rid of original
semicolon */
*(s+i) = '\0'; /* move string terminator to new end of string */
for (i=strlen(s); i > semi+exp; --i)
*(s+i) = *(s+i-1); /* move the rest of string over to make room for
new semi */
*(s+semi+exp) = ';'; /* insert the new semicolon */
} else if (exp < 0) { /* if the semicolon needs to be moved to the
left */
exp = -exp, ++exp;
for (i=strlen(s)+exp; i >= 0; --i)
*(s+i) = *(s+i-exp); /* move string to right to make room for new
digits */
for (i=0; i < exp; ++i)
*(s+i) = '0'; /* pad left side of string with zeroes */
for (semi=0; *(s+semi) != ';'; ++semi); /* find location of
semicolon */
for (i=semi; *(s+i) != '\0'; ++i)
*(s+i) = *(s+i+1); /* move over to eliminate original semicolon */
*(s+i) = '\0'; /* insert new end of string */
*(s+1) = ';'; /* put in new semicolon */
}
if (*(s+strlen(s)-1) == ';')
*(s+strlen(s)-1) = '\0'; /* if there's no fractional part now, don't
have semicolon */
return 0;
}
*******

The contents of the file that produces erroneous input is:
X44;4
X;444e2

The contents of the file that throws a segfault is:
X;3
0;X3e1

Thanks to anyone who can help me figure this out.
 
E

Eric Sosman

Hello, folks.[...]

Could you post a short, complete program that demonstrates
the problem? Based on your report that "it works" with strings
entered on the command line but fails when input is read from a
stream, I have a sneaking suspicion that the problem is in the
way the strings are read. (If you're using fgets(), did you
remember that the '\n' at the end of the line is still there?)

Also, a little more information on your duodecimal notation
would be welcome. It seems that 'X' or 'x' stands for the
duodecimal digit ten, and I'm going to guess that 'Y'/'y' mean
eleven -- but that doesn't seem to be spelled out anywhere.
 
D

Don

Hello, folks.[...]

     Could you post a short, complete program that demonstrates
the problem?

Hmm. Not for several more hours, at least; I don't have a compiler
here at work, so I couldn't ensure that what I posted would work.
Plus, to get it really working I'd probably have to include all the
code.

I'm not using fgets(), though; I'm using a home-rolled getline(),
because I'm working through K&R and it's comfortably familiar to me.

Yes, 'X' or 'x' would work for ten; also 'T', 't', 'A', or 'a'. I
tend to use 'X' myself. Eleven is 'E', 'B', or 'b' (not 'e', as that
would conflict with the exponents). I didn't think it would be
relevant since whenever this expkill() function is not called,
everything works great; and indeed, whenever expkill() *is* called, it
all works great except for the second item read in.

I'll go ahead and humiliate myself and post the entire code. I
haven't worked in error checking yet, but since it's good input that's
breaking it I don't suppose that matters at this point.

*******
int main(int argc, char *argv[])
{
int c;
char doznum[MAXLINE];

while (*(++argv) != NULL && *argv[0] == '-') {
while (c = *++argv[0])
switch (c) {
default:
fprintf(stderr,"dec: illegal option \"%c\"\n",c);
break;
}
}
if (*(++argv) == NULL) {
printf("%f\n",doztodec(*(argv-1)));
return 0;
}
while (getline(doznum,MAXLINE) != EOF) {
printf("%f\n",doztodec(doznum));
}
return 0;
}

/* convert exponential notation to normal b/f processing */
int expkill (char *s, int expspot)
{
int i,j;
char zensuf[3];
char exp;
int semi; /* where semicolon is */

for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
*(zensuf+j) = *(s+i);
exp = doztodec(zensuf);
for (semi=0; *(s+semi) != ';' && *(s+semi) != '\0'; ++semi);
*(s+expspot) = '\0';
if (semi >= strlen(s))
return 0;
if (exp > 0) {
for (i=semi; *(s+i+1) != '\0'; ++i)
*(s+i) = *(s+i+1);
*(s+i) = '\0';
for (i=strlen(s); i > semi+exp; --i)
*(s+i) = *(s+i-1);
*(s+semi+exp) = ';';
} else if (exp < 0) {
exp = -exp, ++exp;
for (i=strlen(s)+exp; i >= 0; --i)
*(s+i) = *(s+i-exp);
for (i=0; i < exp; ++i)
*(s+i) = '0';
for (semi=0; *(s+semi) != ';'; ++semi);
for (i=semi; *(s+i) != '\0'; ++i)
*(s+i) = *(s+i+1);
*(s+i) = '\0';
*(s+1) = ';';
}
if (*(s+strlen(s)-1) == ';')
*(s+strlen(s)-1) = '\0';
return 0;
}

double doztodec(char *s)
{
char frac[MAXDIGS];
int i, j;
int endpoint;
double decnum = 0.0;

for (i=0; *(s+i) != '\0'; ++i)
if (*(s+i) == 'e' || *(s+i) == 'E')
expkill(s,i);
for (i=0; *(s+i) != ';'; ++i);
endpoint = i;
if (endpoint < (strlen(s) - 1)) {
for (i=++i,j=0; *(s+i) != '\0'; ++i,++j)
*(frac+j) = *(s+i);
*(s+endpoint) = *(frac+j) = '\0';
decnum += fracdec(frac);
}
decnum += wholedec(s);
return decnum;
}

double wholedec(char *s)
{
int addnum = 0; /* added to decnum each loop */
int i;
int multiple = 1; /* provides multiple for loop */
double decnum = 0.0; /* final result to return */
char sign = 0;

reverse(s);
if (*(s+strlen(s)-1) == '-') {
sign = 1;
*(s+strlen(s)-1) = '\0';
}
for (i=0; *(s+i) != '\0'; ++i) {
addnum = decmify(*(s+i));
addnum *= multiple;
decnum += addnum;
multiple *= 12;
}
return (sign == 0) ? decnum : decnum * -1;
}

double fracdec(char *s)
{
int i;
double divisor = 1.0;
double fracval = 0.0;

for (i=0; *(s+i) != '\0'; ++i)
divisor *= 12.0;
fracval = wholedec(s) / divisor;
return fracval;
}

char decmify(char c)
{
switch (c) {
case '0': return 0; break;
case '1': return 1; break;
case '2': return 2; break;
case '3': return 3; break;
case '4': return 4; break;
case '5': return 5; break;
case '6': return 6; break;
case '7': return 7; break;
case '8': return 8; break;
case '9': return 9; break;
case 'X': case 'x': case 'T': case 't': case 'A': case
'a':
return 10; break;
case 'E': case 'B': case 'b':
return 11; break;
default: return 0; break;
}
}

void reverse(char *s)
{
int i, j;
char tmp;

for (i=0, j=strlen(s)-1; i<j; ++i, --j) {
tmp = *(s+i);
*(s+i) = *(s+j);
*(s+j) = tmp;
}
}

int getline(char *s, int lim)
{
int c, i;

for (i=0; (c=getchar())!=EOF && c!='\n' && c!='\t' && c!=' '; ++i) {
if (i < lim-1)
*(s+i) = c;
else if (i == lim)
*(s+i) = '\0';
}
if (c == EOF)
return EOF;
*(s+i) = '\0';
return i;
}
*******
Input file that returns wrong output:
*******
X44;4
X;444e2
*******
Input file that throws a segfault:
*******
X;3
0;X3e1
*******
These errors only show up on my AMD64 box, not on my x86 box. The
same numbers that throw errors when on the second line of the input
file do not when on any other line.

Thanks again.
 
D

Don

Naturally, forgot to post all the pre-main stuff:
*******
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<math.h>

#define MAXDIGS 30
#define MAXLINE 30

int getline(char *s, int lim);
void reverse(char *s);
char decmify(char c);
double wholedec(char *s);
double doztodec(char *s);
double fracdec(char *s);
int expkill (char *s, int expspot);
*******
That, plus the rest of it. Thanks again.
 
J

James Dow Allen

I'll go ahead and humiliate myself and post the entire code.

The foolishest question is the unasked one.
Nevertheless I'd suggest, as a later exercise,
trying to simplify this code.
/* convert exponential notation to normal b/f processing */
int expkill (char *s, int expspot)
{
        int i,j;
        char zensuf[3];
        char exp;
        int semi; /* where semicolon is */

        for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
                *(zensuf+j) = *(s+i);
        exp = doztodec(zensuf);

I'd not be surprised if your error is right here! The for loop
does NOT copy the '\0' character, but doztodec() depends on
it. Since uninitialized bytes in zensuf[] will vary based
on "irrelevant" details, this would tend to explain
your symptoms.

By the way, when in doubt, use asserts or printf's to check
what's going on. For example a
fprintf(stderr, "Expkill %d %d\n", strlen(s), expspot);
at the beginning of expkill() would error-check its args
(though not the actual problem here).
        [several more lines snipped]

I didn't study the whole code. I just assumed that, were I
to find the error before patience ran outit would
be at beginning of examined code. :)

James Dow Allen
 
B

Ben Bacarisse

Don said:
Hello, folks.[...]

     Could you post a short, complete program that demonstrates
the problem?
I'll go ahead and humiliate myself and post the entire code. I
haven't worked in error checking yet, but since it's good input that's
breaking it I don't suppose that matters at this point.

*******
int main(int argc, char *argv[])
{
int c;
char doznum[MAXLINE];

while (*(++argv) != NULL && *argv[0] == '-') {
while (c = *++argv[0])
switch (c) {
default:
fprintf(stderr,"dec: illegal option \"%c\"\n",c);
break;
}
}
if (*(++argv) == NULL) {
printf("%f\n",doztodec(*(argv-1)));
return 0;
}
while (getline(doznum,MAXLINE) != EOF) {
printf("%f\n",doztodec(doznum));
}
return 0;
}

/* convert exponential notation to normal b/f processing */
int expkill (char *s, int expspot)
{
int i,j;
char zensuf[3];
char exp;
int semi; /* where semicolon is */

for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
*(zensuf+j) = *(s+i);

The string in zensuf is not null terminated. You stop when "see" the
null but you never put it into the array.
exp = doztodec(zensuf);

This call expects a proper string -- one the is null terminated.
for (semi=0; *(s+semi) != ';' && *(s+semi) != '\0'; ++semi);
*(s+expspot) = '\0';
if (semi >= strlen(s))
return 0;
if (exp > 0) {
for (i=semi; *(s+i+1) != '\0'; ++i)
*(s+i) = *(s+i+1);
*(s+i) = '\0';
for (i=strlen(s); i > semi+exp; --i)
*(s+i) = *(s+i-1);
*(s+semi+exp) = ';';
} else if (exp < 0) {
exp = -exp, ++exp;
for (i=strlen(s)+exp; i >= 0; --i)
*(s+i) = *(s+i-exp);
for (i=0; i < exp; ++i)
*(s+i) = '0';
for (semi=0; *(s+semi) != ';'; ++semi);
for (i=semi; *(s+i) != '\0'; ++i)
*(s+i) = *(s+i+1);
*(s+i) = '\0';
*(s+1) = ';';
}
if (*(s+strlen(s)-1) == ';')
*(s+strlen(s)-1) = '\0';
return 0;
}

double doztodec(char *s)
{
char frac[MAXDIGS];
int i, j;
int endpoint;
double decnum = 0.0;

for (i=0; *(s+i) != '\0'; ++i)

See? There many be no null.
if (*(s+i) == 'e' || *(s+i) == 'E')
expkill(s,i);
for (i=0; *(s+i) != ';'; ++i);
endpoint = i;
if (endpoint < (strlen(s) - 1)) {
for (i=++i,j=0; *(s+i) != '\0'; ++i,++j)
*(frac+j) = *(s+i);
*(s+endpoint) = *(frac+j) = '\0';
decnum += fracdec(frac);
}
decnum += wholedec(s);
return decnum;
}

This is not magic or quick eyes. I compiled with -g, ran it under
valgrind and followed the logic to see what was wrong. i don't know
how I managed without valgrind! There may be other errors, but deal
with them one at a time.

BTW, a quick tip: Your code will loads easier to read is every *(x+e)
was replaced by x[e]. Most people expect to see pointers indexed with
[] and it is much easier to read.
 
D

Don

The foolishest question is the unasked one.
Nevertheless I'd suggest, as a later exercise,
trying to simplify this code.

Yes! I'm actually fairly happy with most of it, other than the expkill
() mess. But that is definitely beyond ugly. There must be a better
way to do that.
I'd not be surprised if your error is right here!  The for loop
does NOT copy the '\0' character, but doztodec() depends on
it.  Since uninitialized bytes in zensuf[] will vary based
on "irrelevant" details, this would tend to explain
your symptoms.

Hmm. Yes, I'll fix that; funny I missed it, when generally I end up
putting '\0's where they already are just to be sure. But would that
really explain why it fails consistently on the second call, and only
the second call, on x86_64 and not on i686?

@Ben Bacarisse:

I'd never heard of valgrind before today; I'll be sure to look at it.
I just recently discovered Beej's Quick Guide to GDB, which looks like
it'll help me a lot, too.
BTW, a quick tip: Your code will loads easier to read is every *(x+e)
was replaced by x[e]. Most people expect to see pointers indexed with
[] and it is much easier to read.

Hmm. When I went through the pointers chapter in K&R, it advised me
to go through my old programs with the array indexing and change them
to pointers, as it would be faster that way, though harder to read "at
least for the uninitiated."

Because you are clearly initiated, I question this now. Is array
indexing more normal? That would be annoying for me, since I've
finally gotten into the habit of using the pointer notation, but I'd
rather the code be maintainable.
 
M

Mark

Ben Bacarisse said:
This is not magic or quick eyes. I compiled with -g, ran it under
valgrind and followed the logic to see what was wrong. i don't know
how I managed without valgrind! There may be other errors, but deal
with them one at a time.

BTW, a quick tip: Your code will loads easier to read is every *(x+e)
was replaced by x[e]. Most people expect to see pointers indexed with
[] and it is much easier to read.

To add to Ben's good advice, and despite the fact you may get away
with it, this isn't a good idea:
^^^^^
 
F

Fred

I'll go ahead and humiliate myself and post the entire code.

The foolishest question is the unasked one.
Nevertheless I'd suggest, as a later exercise,
trying to simplify this code.
/* convert exponential notation to normal b/f processing */
int expkill (char *s, int expspot)
{
        int i,j;
        char zensuf[3];

Note the size for zensuf


and what happens above if j > 2 ?
 
L

lawrence.jones

Don said:
Hmm. When I went through the pointers chapter in K&R, it advised me
to go through my old programs with the array indexing and change them
to pointers, as it would be faster that way, though harder to read "at
least for the uninitiated."

Because you are clearly initiated, I question this now. Is array
indexing more normal? That would be annoying for me, since I've
finally gotten into the habit of using the pointer notation, but I'd
rather the code be maintainable.

You misunderstood K&R's advice. What they were suggesting is that it's
usually faster to walk a pointer through an array rather than using
subscripts. For example, instead of:

for (i = 0; i < n; i++)
a += 10;

do:

for (p = a; p < a + n; p++)
*p += 10;

The notation is irrelevant -- *(a + i) is exactly equivalent to a,
but a is usually much easier to read.
 
D

Don

Mark said:
To add to Ben's good advice, and despite the fact you may get away
with it, this isn't a good idea:


                      ^^^^^

Indeed; it's shockingly bad, bad enough that even I can see it's bad.
It does work, of course, but much better would be i=i+1. Which is
what I'm changing it to.
 
D

Don

   Note the size for zensuf


 and what happens above if j > 2 ?

Nothing; it can't be. I'm using doubles here, and I can't get
precision with more than two digits worth of decimal places. That's
why I'm storing it in a char, too; it'll never get anywhere near 127.
As I mentioned, I haven't put in error checks yet, but I haven't been
testing with that yet, either.

Is it even possible to get more precision than that with C?
 
D

Don

The string in zensuf is not null terminated.  You stop when "see" the
null but you never put it into the array.


This call expects a proper string -- one the is null terminated.

So all I need to do to fix this bizarre problem about the second call,
and only the second call, failing when every other call, even with the
same input, succeeds is null terminate the string?
 
J

James Dow Allen

        char zensuf[3];
 and what happens above if j > 2 ?

But j isn't > 2 :) :)

I do agree with you though, Fred, and if feeling
verbose would have offered:
Is your memory so tight that a red-zoned zensuf[5] would
be intolerable wastage?

But then I'd be afraid of followups like
Why not just zensuf[100] and get it over with?
(Hopefully no pedant would invoke malloc() for this trivia!)

Meanwhile, this would have given Eric Sosman the
chance to mention Pascal's wager! :)

James
 
D

Don

  Is your memory so tight that a red-zoned zensuf[5] would
  be intolerable wastage?

But then I'd be afraid of followups like>>  Why not just  zensuf[100] and get it over with?

That's exactly what I'm afraid of. I'm a C dilettante, really, just
getting my feet wet here, and I want to be as strict with myself as
possible. That way I won't go reserving 100 bytes for a string that
can't possibly be more than three (since more than three would produce
a meaningless result in the end anyway).

Frankly, I'm failing at that, too. I use "int" as a return value for
all my functions, even when I don't need a return value at all, and I
rarely *need* an int; a char would do just fine. But when I get the
right results from good input, I'll start working on what to do with
bad input; I just don't want to spend a lot of time error-checking if
I'm going to have to rewrite all the algorithms and then change the
error code anyway.
 
E

Eric Sosman

I'll go ahead and humiliate myself and post the entire code. I
haven't worked in error checking yet, but since it's good input that's
breaking it I don't suppose that matters at this point.

Oh, what a tangled web we weave
When first we practise to de-C-eive!

Don't feel bad; everyone's a beginner at some point. I'm
not going to try to debug the whole thing for you, but I'll
point out a few errors, awkwardnesses, and opportunities for
simplification (which usually means improvement). My overall
impression is that you're working much too hard ...
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<math.h>

#define MAXDIGS 30
#define MAXLINE 30

int getline(char *s, int lim);
void reverse(char *s);
char decmify(char c);
double wholedec(char *s);
double doztodec(char *s);
double fracdec(char *s);
int expkill (char *s, int expspot);
>
int main(int argc, char *argv[])
{
int c;
char doznum[MAXLINE];

while (*(++argv) != NULL&& *argv[0] == '-') {
while (c = *++argv[0])
switch (c) {
default:
fprintf(stderr,"dec: illegal option \"%c\"\n",c);
break;
}
}

Not a terribly helpful start, is it? "Find the command-line
arguments beginning with '-' and ignore them, except to issue
complaints about any other characters they contain ..."

Anyhow: At the end of this loop, argv points to a pointer
to the first non-'-' argument, or to a NULL pointer if there
are no such arguments.
if (*(++argv) == NULL) {

... and you immediately move onward, not having bothered
to test whether an "onward" even exists! If someone runs the
program with no non-'-' command-line arguments, this test takes
you to Never-Never Land because you're trying to inspect a pointer
that isn't in the argv array, has no defined value, and may
not even exist!
printf("%f\n",doztodec(*(argv-1)));

Here and below, it might be a good idea to print the string
*and* the converted value, as a sanity check. (Since the string
gets destroyed in the process of conversion, print it before
calling doztodec() rather than after.)
return 0;
}

Okay (assuming one or more non-'-' arguments), but passing
strange. If there's exactly one argument, you convert it to a
number, print it, and exit. If there are two or more, you
ignore them all and proceed. Weird.
while (getline(doznum,MAXLINE) != EOF) {
printf("%f\n",doztodec(doznum));
}
return 0;
}

[In the original, expkill() appeared here. I've moved it
further down because I think the narrative flow works better
that way.]
double doztodec(char *s)
{
char frac[MAXDIGS];
int i, j;
int endpoint;
double decnum = 0.0;

for (i=0; *(s+i) != '\0'; ++i)

Throughout the program, readability would be improved if
you treated pointers like arrays and used array-index notation.
For example, `s' is easier to read than `*(s+i)', and means
exactly the same thing.
if (*(s+i) == 'e' || *(s+i) == 'E')
expkill(s,i);
for (i=0; *(s+i) != ';'; ++i);

If you ever get an input string that has no ';', this
is a Never-Never Land loop ... You should also acquaint
yourself with the strchr() function.
endpoint = i;
if (endpoint< (strlen(s) - 1)) {

A simpler test for "Is there anything after the semicolon?"
is to ask whether the next character is the end-of-string '\0'
or is something else: `if (s[endpoint+1] != '\0')'.
for (i=++i,j=0; *(s+i) != '\0'; ++i,++j)
*(frac+j) = *(s+i);

`strcpy(frac, s+endpoint)' is a lot simpler. It may or may
not do the same thing as your loop, because what you've written
has no defined behavior in the C language. (Hint: Look closely
at the first expression in the `for' statement.)
*(s+endpoint) = *(frac+j) = '\0';
decnum += fracdec(frac);

On the other hand, there doesn't seem to be a need to copy
the fraction digits at all. Instead, you could convert them
straight from the string: `fracdec(s+endpoint+1)' and forget
about frac[] altogether.
}
decnum += wholedec(s);
return decnum;
}

/* convert exponential notation to normal b/f processing */
int expkill (char *s, int expspot)
{
int i,j;
char zensuf[3];
char exp;
int semi; /* where semicolon is */

for (i = expspot+1,j=0; *(s+i) != '\0'; ++i,++j)
*(zensuf+j) = *(s+i);

You'd better hope there aren't more than three characters
after the 'e' ...
exp = doztodec(zensuf);

Actually, it's worse: doztodec() expects a '\0'-terminated
string, and you've provided no '\0' to terminate it. Who knows
how far doztodec() might wander through uncharted memory before
it finds a '\0' byte, or what kind of result it might return from
whatever garbage it stumbles across first?

But, as with frac[] above, there doesn't seem to be any need
to copy the exponent. Use doztodec(s+expspot+1), and convert it
right where it lies.
for (semi=0; *(s+semi) != ';'&& *(s+semi) != '\0'; ++semi);

Again, the strchr() function is your friend. Or wants to be
your friend, at any rate.

By the way, sticking the semicolon at the end of the same line
as its for is a good way to confuse people. A hasty reader may
well think that the next line is the for's body, when in fact the
for's body is that inconspicuous no-space between the ) and the ;.
Rather than make the empty body unnoticeable, highlight it with

for (...; ...; ...)
;

or even with

for (...; ...; ...)
continue;
*(s+expspot) = '\0';
if (semi>= strlen(s))
return 0;
if (exp> 0) {
for (i=semi; *(s+i+1) != '\0'; ++i)
*(s+i) = *(s+i+1);
*(s+i) = '\0';

The memmove() function is your friend. You should also
meet and greet the memcpy() function, even though you should not
use the latter in this instance.
for (i=strlen(s); i> semi+exp; --i)
*(s+i) = *(s+i-1);
*(s+semi+exp) = ';';

It seems what you're trying to do here is slide the semicolon
rightward to adjust for the exponent; is that it? Okay, fine:
You'll change "12;345e1=2" to 1234;5" and all's well. But what
will you do with "1;e4"? Slide four places right and you'll
slide right off the edge of the world ...
} else if (exp< 0) {
exp = -exp, ++exp;

"Simplify, simplify!" Try `exp = 1 - exp'.
for (i=strlen(s)+exp; i>= 0; --i)
*(s+i) = *(s+i-exp);

Again, memmove(). And again, what if there are too few
digits to accommodate the slide? "12;345e-1" gives "1;2345",
but what happens to "12;345e-6"?
for (i=0; i< exp; ++i)
*(s+i) = '0';
for (semi=0; *(s+semi) != ';'; ++semi);
for (i=semi; *(s+i) != '\0'; ++i)
*(s+i) = *(s+i+1);
*(s+i) = '\0';
*(s+1) = ';';
}
if (*(s+strlen(s)-1) == ';')
*(s+strlen(s)-1) = '\0';
return 0;

As an alternative (and far simpler) approach, consider
a different way of combining the base number and the exponent.
Rather than doing all this string-bashing to slide the ';'
back and forth (and worrying about whether there's enough
room), stop and think: the exponent part multiplies the base
part by some power of twelve. Read up on the pow() function,
and see if anything suggests itself.

Caveat: Because floating-point arithmetic usually involves
rounding errors, the results you get from the string-bashing and
arithmetical versions are likely to be slightly different. For
example, 1;2e1 means 12; means exactly fourteen, but if you
instead compute 1;2 ~= 1.1666666667 and then multiply by 12 you
will probably get something just a little different. This may
or may not be a problem for your application.
}

double wholedec(char *s)
{
int addnum = 0; /* added to decnum each loop */
int i;
int multiple = 1; /* provides multiple for loop */
double decnum = 0.0; /* final result to return */
char sign = 0;

reverse(s);
if (*(s+strlen(s)-1) == '-') {
sign = 1;
*(s+strlen(s)-1) = '\0';
}
for (i=0; *(s+i) != '\0'; ++i) {
addnum = decmify(*(s+i));
addnum *= multiple;
decnum += addnum;
multiple *= 12;
}

This is much more complicated than it needs to be. You
could do without reverse() altogether by starting at the end
of the string and working backward instead of reversing the
string and then marching forward. Even easier, you could
work from left to right with (pseudocode)

decnum = 0
for each digit d
decnum = decnum * 12 + decmify(d)
return (sign == 0) ? decnum : decnum * -1;
}

double fracdec(char *s)
{
int i;
double divisor = 1.0;
double fracval = 0.0;

Pointless initialization: The initial value will never be
used, so you might as well initialize to `-2.71828'. Or to
nothing. As it is, specifying an initial value fools the reader
into thinking that the value is important -- when in fact it is
completely UNimportant.
for (i=0; *(s+i) != '\0'; ++i)
divisor *= 12.0;
fracval = wholedec(s) / divisor;
return fracval;

As it turns out, not only is the initial value of fracval
unimportant, but the variable itself is just about useless.
You could write `return wholedec(s) / divisor;' and get rid
of fracval altogether. This will not make your code any more
or less efficient, since a decent compiler will do the equivalent
of discarding fracval anyhow. But by removing irrelevant material
(I recently saw Kevin Kline as Cyrano de Bergerac, and although
the production played more for laughs than for romance the romance
was by no means wanting -- but I digress), you make it easier for
the reader (Jose Ferrer was good, too) to follow what's afoot.

Source code is written for two audiences: Compilers and
programmers. The latter are the more important; write with
their needs foremost in your mind.
}

char decmify(char c)
{
switch (c) {
case '0': return 0; break;
case '1': return 1; break;
case '2': return 2; break;
case '3': return 3; break;
case '4': return 4; break;
case '5': return 5; break;
case '6': return 6; break;
case '7': return 7; break;
case '8': return 8; break;
case '9': return 9; break;

You might simplify things a little by taking advantage of
the fact that the codes for '0' through '9' are consecutive
and ascending. Hence, if c is one of those digits, c-'0' is
its numerical value.
case 'X': case 'x': case 'T': case 't': case 'A': case
'a':
return 10; break;
case 'E': case 'B': case 'b':
return 11; break;

Strange choice of digits, but that's entirely up to you.
Given 'X' and 'x' as ten, I'd also have expected 'Y' and 'y'
as eleven -- but no. Also, it appears the 'E' case (and what
about 'e', by the way?) can appear only in the exponent, for
an input like "12;345eE".
default: return 0; break;

You might want to consider returning an "impossible" value
here, something like -1, perhaps. (On the other hand, the rest
of the program doesn't seem to pay any attention to possible
anomalies detected here, but just plows blindly ahead with
whatever value comes back.)
}
}

void reverse(char *s)
{
int i, j;
char tmp;

for (i=0, j=strlen(s)-1; i<j; ++i, --j) {
tmp = *(s+i);
*(s+i) = *(s+j);
*(s+j) = tmp;
}
}

int getline(char *s, int lim)
{
int c, i;

for (i=0; (c=getchar())!=EOF&& c!='\n'&& c!='\t'&& c!=' '; ++i) {

Treating tabs and spaces as "line ends" seems odd. You might
want to rename the function to getword() or some such, since "line"
almost always means '\n'-terminated to a C programmer.
if (i< lim-1)
*(s+i) = c;
else if (i == lim)
*(s+i) = '\0';

Note that this *always* stores beyond the end of the array.
You should either change the call to pass MAXLINE-1 or (better,
IMHO) change the way you handle lim in this function.
}
if (c == EOF)
return EOF;
*(s+i) = '\0';

Oh, darn! If i reached lim, the loop above was careful to
stop storing characters (except for the off-by-one error). But
after the loop is finished, you store to s anyhow. When
i==lim+12345, it'll all end in tears ...
return i;
}

As I said at the start, I haven't tried to find every single
bug -- I haven't even tried to find the exact cause(s) of the
bugs you reported. I've spotted a few that may (or may not) be
to blame, and I've pointed out some things you could simplify.
I think that if you undertake some of the simplifications, the
resulting code may be easier to debug than the original. Good luck!
 
S

Seebs

That's exactly what I'm afraid of. I'm a C dilettante, really, just
getting my feet wet here, and I want to be as strict with myself as
possible. That way I won't go reserving 100 bytes for a string that
can't possibly be more than three (since more than three would produce
a meaningless result in the end anyway).

That's not a useful kind of strictness, IMHO.

Basically, if you want to be strict, be strict about being cautious and
ensuring that you have buffers against problems. It is usually unwise
to assume that you know what input might yield meaningful results after
future changes...
Frankly, I'm failing at that, too. I use "int" as a return value for
all my functions, even when I don't need a return value at all, and I
rarely *need* an int; a char would do just fine.

But "int" is almost always better, because it's the native thing -- on
many targets, using char would hurt performance significantly.

-s
 
M

Mark

Don said:
Indeed; it's shockingly bad, bad enough that even I can see it's bad.
It does work, of course, but much better would be i=i+1. Which is
what I'm changing it to.

Unless you are specifically working on your use of arrays and pointers,
you might want to simplify the way you approach this. If you use some
of the standard functions, it becomes much more readable.

This is a quick and dirty example (for concept rather than style). It
takes a copy of the string (let's say "X;444e2") and:

- Takes a copy so it's non-destructive;
- Breaks off the exponent and keeps track of it's value which
it obtains by calling itself* leaving the copy as "X;444";
- Finds where the ';' is, remembers where it is and removes it
to leave the copy as "X444";
- Calculates the value of the number;
- Corrects for the ';' and exponent.

It's much shorter and simpler.

No doubt it will now be picked apart for some horrible error or its
style. ;-)

double doz2dec(char *s)
{
char *copy;
char *ptr;
int exp=0;
int offset=0;
double value;

/* Use a copy to preserve original */
copy = malloc(strlen(s)+1);
if (!copy) {
perror("Couldn't find enough memory");
exit(EXIT_FAILURE);
}
memcpy(copy, s, strlen(s)+1);

/* Find the exponent and remove
* from the copy to leave the
* number */
if ((ptr = strchr(copy, 'e')) || (ptr = strchr(copy, 'E'))) {
*ptr = '\0';
exp = doz2dec(ptr+1);
}

/* Remove ';' whilst remembering
offset */
if ((ptr = strchr(copy, ';'))) {
offset = strlen(copy) - (ptr-copy) - 1;
memmove(ptr, ptr+1, offset+1);
}

/* Calculate value */
value = 0.0L;
for (ptr=copy; *ptr; ptr++) {
value = (value * 12) + decmify(*ptr);
}

/* Adjust for exponent and
offset */
if (offset-exp) {
value /= pow(12, offset-exp);
}

free(copy); /* Don't need the copy any more */

return value;
}
 
R

Richard Tobin

It does work, of course

There's no of course about it. An implementation might work out
what the incremented value will be, assign it to i, and then do the
increment.

-- Richard
 
D

Don

Eric, your critique was very eye-opening. Thank you very sincerely.
I've taken a few minutes and implemented many of your changes; the
functions in the standard libraries you've suggested will have to wait
until later.

     Not a terribly helpful start, is it?  "Find the command-line
arguments beginning with '-' and ignore them, except to issue
complaints about any other characters they contain ..."

Placeholders, for when I do include options. I have a few I plan to
work in. I've got a converter that goes in the other direction with
arguments concerning output format and desired precision; I'll
implement the same options here, once I get the core conversion
working.
     ... and you immediately move onward, not having bothered
to test whether an "onward" even exists!  If someone runs the
program with no non-'-' command-line arguments, this test takes
you to Never-Never Land because you're trying to inspect a pointer
that isn't in the argv array, has no defined value, and may
not even exist!

Is it possible to run a program with no non-'-' arguments? The
program name will always be there, won't it? Of course, I see that I
skip over that anyway with the initial ++argv in the while loop.

I suppose I should just test it "if (*(argv) == NULL)" then?
     If you ever get an input string that has no ';', this
is a Never-Never Land loop ...  You should also acquaint
yourself with the strchr() function.

Which is weird, because it works anyway. But clearly I've just been
lucky; I've changed it to:

for (i=0; *(s+i) != ';' && *(s+i) != '\0'; ++i);

(I haven't gotten to changing all the pointer notation yet, so I wrote
it that way to be consistent. I'll get to it.)
     A simpler test for "Is there anything after the semicolon?"
is to ask whether the next character is the end-of-string '\0'
or is something else: `if (s[endpoint+1] != '\0')'.

Much simpler, no function call overhead. Fixed.
           for (i=++i,j=0; *(s+i) != '\0'; ++i,++j)
                   *(frac+j) = *(s+i);

     `strcpy(frac, s+endpoint)' is a lot simpler.  It may or may
not do the same thing as your loop, because what you've written
has no defined behavior in the C language.  (Hint: Look closely
at the first expression in the `for' statement.)
           *(s+endpoint) = *(frac+j) = '\0';
           decnum += fracdec(frac);

     On the other hand, there doesn't seem to be a need to copy
the fraction digits at all.  Instead, you could convert them
straight from the string: `fracdec(s+endpoint+1)' and forget
about frac[] altogether.

Yes, and one less string I'll have to test for overflow. frac[] is
eliminated; fixed.
     You'd better hope there aren't more than three characters
after the 'e' ...

There can't be, as I understand it, because that would mean much more
precision than the data type can handle. I'll write in error checking
on it later.
     Actually, it's worse: doztodec() expects a '\0'-terminated
string, and you've provided no '\0' to terminate it.  Who knows
how far doztodec() might wander through uncharted memory before
it finds a '\0' byte, or what kind of result it might return from
whatever garbage it stumbles across first?

Yes; fixed that by giving zensuf a '\0'.
     But, as with frac[] above, there doesn't seem to be any need
to copy the exponent.  Use doztodec(s+expspot+1), and convert it
right where it lies.

Which also saves yet another loop. Thanks; fixed.
     It seems what you're trying to do here is slide the semicolon
rightward to adjust for the exponent; is that it?  Okay, fine:
You'll change "12;345e1=2" to 1234;5" and all's well.  But what
will you do with "1;e4"?  Slide four places right and you'll
slide right off the edge of the world ...

Yes; I should pad it with zeroes, and will do so.
     Again, memmove().  And again, what if there are too few
digits to accommodate the slide?  "12;345e-1" gives "1;2345",
but what happens to "12;345e-6"?

Ha; I actually thought of that one. :) They're padded on the left
with zeroes, and then semicolon inserted later. That's what

is for. I'm not sure if this actually works (in fact, looking at it I
think it doesn't), but that's what I was going for. I'll test it out
tonight.
will probably get something just a little different.  This may
or may not be a problem for your application.

It is; I'm looking for the maximum possible accuracy. Eliminating all
those small inaccuracies is something I'm very interested in.
        decnum = 0
        for each digit d
            decnum = decnum * 12 + decmify(d)

Implemented this with the following:
*******
while ((c = s[i++]) != '\0') {
decnum = decnum * 12 + decmify(c);
*******
My, that looks a lot better. Thanks; that's quite clever. (I haven't
tested it yet, but I'm pretty sure it matches your pseudocode.) And
yes, I remembered to declare char c above.

     Pointless initialization: The initial value will never be
used, so you might as well initialize to `-2.71828'.  Or to
nothing.  As it is, specifying an initial value fools the reader
into thinking that the value is important -- when in fact it is
completely UNimportant.

Well, divisor *has* to start at 1.0, doesn't it? Otherwise, when I
say divisor *= 12.0; I'm going to get zero. Aren't number variables
intialized to zero unless otherwise stated? I don't have the standard
with me here, but that's what I was going for there. fracval, though,
I follow you; though it doesn't matter, because I got rid of fracval
entirely, as you suggested.
     Source code is written for two audiences: Compilers and
programmers.  The latter are the more important; write with
their needs foremost in your mind.

That's as much as I could work through now; I'll get the rest later.
Thanks again; it's been very eye-opening.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,175
Latest member
Vinay Kumar_ Nevatia
Top