endian problem, please help


kolmogolov

hi,

it's not really an endian problem. I think I must
be missing something else ...

The problem can be reduced to the following two code
segments, which give different results
(cut and pasted verbatim):

1.

*width = (unsigned char) fgetc(fp) +
256 * (unsigned char) fgetc(fp) +
65536 * (unsigned char) fgetc(fp) +
16777216L * (unsigned char) fgetc(fp);

yields *width == 131072, which should have been 512, since
fp points to the byte sequence "00 02 00 00", while

2.
*width =(unsigned char) fgetc(fp);
*width += 256 * (unsigned char) fgetc(fp);
*width += 65536 * (unsigned char) fgetc(fp);
*width += 16777216L * (unsigned char) fgetc(fp);

results in the expected value 512.

What am I missing? It's driving me ....... :(

Thanks for any hint!
 

matevzb

I'm not 100% sure, but it could be the order of evaluation (the first
result is 2*65536). From the C89 draft:
"Except as indicated by the syntax{27} or otherwise specified later
(for the function-call operator () , && , || , ?: , and comma
operators), the order of evaluation of subexpressions and the order in
which side effects take place are both unspecified."
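To make the quoted rule concrete: one way to force the intended order is to give each fgetc() call its own statement, as version 2 does, and only combine the bytes afterwards. This is a minimal sketch, assuming unsigned long as the result type; the function name read_le32_ordered is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read a 4-byte little-endian value.
   Each fgetc() call is a separate statement, so the calls are
   guaranteed to happen in file order.
   Returns 0 on success, -1 on EOF or read error. */
int read_le32_ordered(FILE *fp, unsigned long *out)
{
    int b0, b1, b2, b3;

    b0 = fgetc(fp);   /* first byte in the file: least significant */
    b1 = fgetc(fp);
    b2 = fgetc(fp);
    b3 = fgetc(fp);   /* last byte: most significant */

    if (b0 == EOF || b1 == EOF || b2 == EOF || b3 == EOF)
        return -1;

    /* No side effects here, so this expression's own evaluation
       order does not matter. UL constants keep the math unsigned. */
    *out = (unsigned long)b0
         + 256UL * (unsigned long)b1
         + 65536UL * (unsigned long)b2
         + 16777216UL * (unsigned long)b3;
    return 0;
}
```

Because the final expression has no side effects, its unspecified evaluation order can no longer change the result.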
 

Guest

There's no guarantee that the fgetc() call on the first line gets
called first. It might on some systems, but the calls are allowed to
occur in any order, and on your system, it so happens that that order
is not what you want. Your version with four separate statements does
not have this problem since statements are not allowed to be reordered
(except when the compiler knows it doesn't matter for the result).
 

Kenny McCormack

Why not use fread()?
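A minimal sketch of the fread() route, assuming a 4-byte little-endian field and unsigned long as the result type (read_le32_fread is a hypothetical name). The four bytes are read in one call and then combined with shifts, which gives the same result whatever the host's own byte order is.

```c
#include <stdio.h>

/* Hypothetical helper: read 4 bytes with one fread() call, then
   decode them as little-endian. Returns 0 on success, -1 on a
   short read (EOF or error). */
int read_le32_fread(FILE *fp, unsigned long *out)
{
    unsigned char buf[4];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;

    /* buf[0] came first in the file and is the least significant byte */
    *out = (unsigned long)buf[0]
         | ((unsigned long)buf[1] << 8)
         | ((unsigned long)buf[2] << 16)
         | ((unsigned long)buf[3] << 24);
    return 0;
}
```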
 

kolmogolov

Kenny said:
Why not use fread()?

Thanks for all prompt answers!

So it WAS an incorrect assumption about
the evaluation order, right?

Yes, I sometimes do use fread() followed by an endian
conversion if necessary.

Since the object I'm reading is always written in one fixed
byte order (either big or little endian), I thought I could save
the endian conversion code this way.
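That assumption holds: combining bytes arithmetically, as in the two code segments above, needs no separate conversion step, because it never looks at how the host stores integers in memory. Only the byte positions differ between a little-endian and a big-endian file. A sketch working on a 4-byte buffer; decode_le32 and decode_be32 are hypothetical names.

```c
/* Hypothetical helpers: decode 4 bytes taken from a file.
   Neither depends on the host's byte order. */
unsigned long decode_le32(const unsigned char b[4])
{
    /* little-endian file: first byte is least significant */
    return (unsigned long)b[0]
         | ((unsigned long)b[1] << 8)
         | ((unsigned long)b[2] << 16)
         | ((unsigned long)b[3] << 24);
}

unsigned long decode_be32(const unsigned char b[4])
{
    /* big-endian file: first byte is most significant */
    return ((unsigned long)b[0] << 24)
         | ((unsigned long)b[1] << 16)
         | ((unsigned long)b[2] << 8)
         | (unsigned long)b[3];
}
```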

So, do you think my corrected (debugged) version
is OK? I mean, I'd like to know how real experts would do
this.
 

matevzb

On Dec 10, 5:07 pm, (e-mail address removed) wrote:
So, do you think my corrected (debugged) version
is OK? I mean, I'd like to know how real experts would do
this.
I don't know how real experts would do it, but I'd check for errors
returned by fgetc() as well. If it fails, it will return EOF, so check
for that, but _before_ converting to unsigned char. From the IRIX 5.3
man page:
WARNING
If the integer value returned by getc, getchar, or fgetc is stored
into a character variable and then compared against the integer
constant EOF, the comparison may never succeed, because
sign-extension of a character on widening to integer is
machine-dependent.
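A small sketch of that advice: keep fgetc()'s result in an int, compare it with EOF, and only then narrow it to unsigned char. The order matters because converting EOF to unsigned char yields an ordinary byte value (255 on the usual hosts with 8-bit char and EOF == -1), so a check made after the cast can never see EOF. next_byte is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: fetch one byte, reporting EOF/error
   separately from the byte value. Returns 0 on success, -1 on
   EOF or read error. */
int next_byte(FILE *fp, unsigned char *out)
{
    int c = fgetc(fp);       /* keep the full int result */

    if (c == EOF)            /* test while c is still an int */
        return -1;
    *out = (unsigned char)c; /* safe now: c is 0..UCHAR_MAX */
    return 0;
}
```

With this shape a genuine 0xFF byte in the file and end-of-file stay distinguishable.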
 

kolmogolov


I envy the manpages you have on IRIX. Wonderful advice. Thanks.

I don't know whether I'll be blamed for including non-standard things
(or should I rather use long instead of int32_t for maximum
portability?), but this is what I'm calling now:

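#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int32_t fget_int32_le(FILE *fp)
{
    uint32_t x, weight;  /* unsigned: the sums below cannot overflow */
    int c, i;

    for (x=0, weight=1, i=0; i<4; i++)
    {
        c = fgetc(fp);        /* outside assert(): NDEBUG would remove it */
        assert ( EOF != c );  /* TODO: to be handled */
        x += weight * (unsigned char) c;
        weight *= 256;        /* wraps to 0 after the last byte; harmless */
    }
    /*
    x =(unsigned char) fgetc(fp);
    x += 256 * (unsigned char) fgetc(fp);
    x += 65536 * (unsigned char) fgetc(fp);
    x += 16777216L * (unsigned char) fgetc(fp);
    */
    return (int32_t) x;  /* implementation-defined if x > INT32_MAX */
}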
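One possible way to resolve the TODO, sketched under the assumption that the caller prefers an error code to an abort: return -1 on EOF or read error, pass the value back through a pointer, and accumulate in uint32_t (an optional C99 type, though present on essentially all mainstream hosts). fget_uint32_le is a hypothetical name, not part of any library.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read a 4-byte little-endian unsigned value.
   Returns 0 on success; returns -1 on EOF or read error and leaves
   *out untouched. */
int fget_uint32_le(FILE *fp, uint32_t *out)
{
    uint32_t x = 0;
    int c, i;

    for (i = 0; i < 4; i++) {
        if ((c = fgetc(fp)) == EOF)
            return -1;
        x |= (uint32_t)c << (8 * i);  /* byte i fills bits 8i..8i+7 */
    }
    *out = x;
    return 0;
}
```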
 

CBFalconer


You don't show a complete program, so it's all pure guesswork.
First, you don't need the casts. fgetc returns an int that already
holds the input byte as an unsigned char value (or EOF). So get rid
of them. Casts are usually wrong anyhow.

Second, you don't show the type of width. I suspect you have
undefined behaviour due to overflows. Where int is 16 bits,
"256 * fgetc(fp)" can exceed INT_MAX, and where long is 32 bits,
"16777216L * fgetc(fp)" exceeds LONG_MAX for any byte of 128 or
more. Width should be an unsigned long, and the constants should
be unsigned too: 256UL, 65536UL, 16777216UL.
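The arithmetic behind the overflow concern, as a sketch: with all four bytes equal to 255 the little-endian sum is 4294967295 (2^32 - 1). The standard only guarantees LONG_MAX >= 2147483647 but ULONG_MAX >= 4294967295, so the sum always fits in unsigned long but not necessarily in long. combine_le32 is a hypothetical name.

```c
/* Hypothetical helper: combine four byte values (0..255) as
   little-endian. The UL constants make every product and sum
   unsigned long, so no signed overflow can occur. */
unsigned long combine_le32(unsigned b0, unsigned b1,
                           unsigned b2, unsigned b3)
{
    return b0 + 256UL * b1 + 65536UL * b2 + 16777216UL * b3;
}
```

For example, combine_le32(0, 2, 0, 0) is 512, and combine_le32(0, 0, 2, 0) is the 131072 from the original post.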

You also have the problem of unspecified order of fgetc calls.
Putting it together, you have, with width an unsigned long:

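int ch, n;
unsigned long width, weight;

width = 0; weight = 1;
for (n = 0; n < 4 && EOF != (ch = fgetc(fp)); n++) {
    width += weight * ch;   /* ch is 0..UCHAR_MAX here, never EOF */
    weight *= 256;
}
/* n < 4 afterwards means fgetc returned EOF early */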

which won't blow up if fgetc ever returns EOF.
 

matevzb

On Dec 10, 6:05 pm, (e-mail address removed) wrote:
I envy the manpages you have on IRIX. Wonderful advices. Thanks.
<OT>I usually check different systems, but at home I have an IRIX at
hand, so that's the first one I check. The same man pages are
available at
http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?coll=0530&db=man&pth=ALL.
You can also google for "man getchar"; it usually yields good results
(for different systems).</OT>
I don't know if I'll be blamed for including non-standard things
(or should I better use long instead of int32_t for maximum
portability) but I'm calling

#include <stdint.h>
<stdint.h> and <inttypes.h> are C99 headers, so you can use them (I'm
not sure how portably, though). int32_t, however, is a POSIX extension
(http://www.opengroup.org/onlinepubs/000095399/basedefs/stdint.h.html)
 

matevzb

in C99. (Apparently it's required in POSIX.)
Oops, my bad. I incorrectly assumed that since POSIX requires it and
specifies it as an extension to ISO, it wasn't specified in the
Standard. Must check the Standard more often...
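On the long versus int32_t question: one portable option is to assemble the value as unsigned long (always at least 32 bits) and map 2^31 and above to negative values arithmetically, which avoids the implementation-defined result of casting an out-of-range value to a signed type. A sketch, assuming the file stores two's-complement values and that long can represent -2147483648 (true on two's-complement hosts); to_signed32 is a hypothetical name.

```c
/* Hypothetical helper: interpret the low 32 bits of u as a
   two's-complement signed value, using only in-range arithmetic. */
long to_signed32(unsigned long u)
{
    u &= 0xFFFFFFFFUL;              /* keep the low 32 bits only */
    if (u <= 0x7FFFFFFFUL)
        return (long)u;             /* fits in long unchanged */
    /* u >= 2^31: the result is u - 2^32, computed without overflow */
    return -(long)(0xFFFFFFFFUL - u) - 1;
}
```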
 

Kenneth Brody


It's called (or at least, related to) "sequence points".

There is no guarantee that the four fgetc's in (1) will be
called in the order you want. In fact, it appears that your
compiler is using the exact opposite order.
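Checking the numbers: write the file's bytes as b_0 = 0x00, b_1 = 0x02, b_2 = 0x00, b_3 = 0x00. Assuming the calls ran strictly right to left (one order consistent with the observed 131072; any order that hands the second byte to the 65536 term gives the same value), the 16777216 term receives b_0 and the leftmost term receives b_3:

```latex
\begin{aligned}
\text{intended:}\quad & b_0 + 256\,b_1 + 65536\,b_2 + 16777216\,b_3 = 0 + 256\cdot 2 + 0 + 0 = 512\\
\text{right to left:}\quad & b_3 + 256\,b_2 + 65536\,b_1 + 16777216\,b_0 = 0 + 0 + 65536\cdot 2 + 0 = 131072
\end{aligned}
```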

In (2), you have placed a sequence point between each of the
fgetc calls, and therefore guarantee the order in which they
get called.

This is really no different than:

printf("%d %d %d %d\n",fgetc(fp),fgetc(fp),fgetc(fp),fgetc(fp));
or
int i=0;
printf("%d %d %d\n",i++,i++,i++);

(Though I'm not sure whether your code invokes UB, as my second
example does, or is simply "unspecified".)
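The sequence-point explanation also allows a single-expression version: && evaluates its left operand first, with a sequence point after it (it is one of the exceptions in the C89 text quoted earlier in the thread), so the fgetc() calls below run left to right and evaluation stops at the first EOF. A sketch with a hypothetical name, read_le32_and:

```c
#include <stdio.h>

/* Hypothetical helper: the && chain guarantees left-to-right calls.
   Returns 0 on success, -1 if fewer than four bytes were read. */
int read_le32_and(FILE *fp, unsigned long *out)
{
    int b0, b1, b2, b3;

    if ((b0 = fgetc(fp)) != EOF
        && (b1 = fgetc(fp)) != EOF
        && (b2 = fgetc(fp)) != EOF
        && (b3 = fgetc(fp)) != EOF) {
        /* all four bytes are valid 0..UCHAR_MAX values here */
        *out = (unsigned long)b0
             | ((unsigned long)b1 << 8)
             | ((unsigned long)b2 << 16)
             | ((unsigned long)b3 << 24);
        return 0;
    }
    return -1;
}
```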

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
