Bug with compiler or am I just doing something illegal

R

raphfrk

This program should copy one file onto the other. It works if I
compile it with gcc to a cygwin program. However, if I compile it
with the -mno-cygwin option, it doesn't work (this targets native
windows).

Anyway, I just want to check that the program is valid before I see if
I can find a way around a compiler bug.

It might be something simple that I am doing wrong.

-------------------------------------------------------------
#include <stdio.h>

main()
{
FILE *fp;

fp = fopen( "scan0001b.bmp" , "r" );
if( fp == NULL )
{
printf("File open failed for read\n");
exit(0);
}

FILE *fpo;

fpo = fopen( "scanout.bmp" , "w");

if( fpo == NULL )
{
printf("File open failed for write\n");
exit(0);
}


int c=1;

while(!feof(fp))
{
c = fgetc(fp);
if( c>=0 )
fputc( c , fpo );
}

fclose(fp);
fclose(fpo);

}
 
W

Willem

raphfrk wrote:
) This program should copy one file onto the other. It works if I
) compile it with gcc to a cygwin program. However, if I compile it
) with the -mno-cygwin option, it doesn't work (this targets native
) windows).
)
) Anyway, I just want to check that the program is valid before I see if
) I can find a way around a compiler bug.
)
) It might be something simple that I am doing wrong.

It might be.
There are certainly several simple things wrong with the code.

) FILE *fp;
)
) fp = fopen( "scan0001b.bmp" , "r" );

Not binary mode "rb" ? It is a binary file, right ?
Perhaps if you read in text mode, certain characters
can flag end of file on windows. Ctrl-Z perhaps.

) int c=1;

Why initialize it ? And why to 1 ??

) while(!feof(fp))

This mistake is so common that it has its own FAQ entry.
You see, the eof flag is set at the moment a read operation is done
when the file is at EOF. So after the last character is read, feof(fp)
will *not* return true. The next read will return an error code, and
*only then* will feof(fp) be true.

But theoretically, as is, it should work, because of the extra if (c>=0).

) {
) c = fgetc(fp);
) if( c>=0 )
) fputc( c , fpo );
) }

The correct idiom is this:

while ((c = fgetc(fp)) != EOF) {
fputc(c, fpo); /* Should check for errors here also, I think */
}
/* And here an if (ferror(fp)) would be nice */

) fclose(fp);
) fclose(fpo);
)
) }
)
) ---------------------------------------------
)
) When I run it, it stops before it has read the entire file (it only
) reads around 10%).

Offhand, I guess there is a ^Z character at 10% in the input file.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
L

Lew Pitcher

This program should copy one file onto the other. It works if I
compile it with gcc to a cygwin program. However, if I compile it
with the -mno-cygwin option, it doesn't work (this targets native
windows).

Anyway, I just want to check that the program is valid before I see if
I can find a way around a compiler bug.

It might be something simple that I am doing wrong.

-------------------------------------------------------------
#include <stdio.h>

main()
{
FILE *fp;

fp = fopen( "scan0001b.bmp" , "r" );

You've opened this file with the "read text" option. Presuming that the
filename represents a file in the Microsoft bitmap graphics format
(".BMP"), then this is the wrong mode to open the file in. You probably
want
fp = fopen( "scan0001b.bmp" , "rb" );
here.
if( fp == NULL )
{
printf("File open failed for read\n");
exit(0);
}

FILE *fpo;

fpo = fopen( "scanout.bmp" , "w");

Similarly, this is the wrong mode to open a (presumably binary) output file.
if( fpo == NULL )
{
printf("File open failed for write\n");
exit(0);
}


int c=1;

while(!feof(fp))

Remember, feof() does not read the file, and returns true /after/ the true
read (in your case, fgetc()) returns an end-of-file condition.
{
c = fgetc(fp);

In Microsoft Windows, text files are permitted to contain a binary octet
marker (0x1a or ^Z), which will indicate a logical end-of-file prior to the
physical end of the file. While this is a leftover from the MSDOS 1 days,
it still is enforced and acted apon by the underlying Windows I/O model.

If you /did/ mean your input file to contain pure binary data (rather than
the text that you indicate by the file mode string), then there is a very
good chance that at least one character (octet) of this pure binary data
has the value of 0x1a. This would cause your fgetc() on the file to
prematurely return EOF, and subsequently cause feof() to return true. This
in turn causes you to abort the copy process prior to the actual physical
end-of-file of the (presumably binary) input file.
if( c>=0 )
fputc( c , fpo );
}

fclose(fp);
fclose(fpo);

}

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
B

badc0de4

raphfrk said:
This program should copy one file onto the other. It works if I
compile it with gcc to a cygwin program. However, if I compile it
with the -mno-cygwin option, it doesn't work (this targets native
windows).

Anyway, I just want to check that the program is valid before I see if
I can find a way around a compiler bug.

It might be something simple that I am doing wrong.

main() return an int. Say so when you write the function:
int main(void)
{
FILE *fp;

fp = fopen( "scan0001b.bmp" , "r" );

You're not opening the file in binary mode.
On some systems, that will cause some characters in the file to be
interpreted by the C library (or the Operating System??) and those
characters will not reach your program exactly as they are on disk. It
may even happen that some character tells the C library (or the
Operating System) to stop reading the file right there, even though
the number of characters read is only a small percentage of the file
length reported by the system through other means.
if( fp == NULL )
{
printf("File open failed for read\n");
exit(0);
}

FILE *fpo;

fpo = fopen( "scanout.bmp" , "w");

You're not opening the file in binary mode.
Translation of characters can occur in a similar way to what happens
when you read a file in not binary mode.
if( fpo == NULL )
{
printf("File open failed for write\n");
exit(0);
}


int c=1;

Why 1? What does it mean?
I see you're using C99 (you're mixing declarations and code). There's
nothing wrong with that as long as you understand that C99 compilers
aren't as readily available as C89 compilers, and you don't mind lose
some portability.
while(!feof(fp))
{
c = fgetc(fp);
if( c>=0 )
fputc( c , fpo );
}
This loop is wrong.
feof() doesn't do what you think it does.
Read the answer to question 12.2 on the c-faq ( http://c-faq.com/ ),
and while you're there, bookmark the site and return there every now
and again.
fclose(fp);
Failed to test if the fclose() call succeded.
 
R

raphfrk

You've opened this file with the "read text" option. Presuming that the
filename represents a file in the Microsoft bitmap graphics format
(".BMP"), then this is the wrong mode to open the file in. You probably
want
fp = fopen( "scan0001b.bmp" , "rb" );
here.

Ahh, I didn't realise that there was a rb option.

Does it work exactly the same other than stopping at logic end of
file?

Can I still use things like fscanf( fp , "%c" , &variable )
Remember, feof() does not read the file, and returns true /after/ the true
read (in your case, fgetc()) returns an end-of-file condition.

Ahh, another piece of useful info.

Thanks to all that replied. This group is great for spotting non-
obvious coding errors.

A previous tip for ensuring malloc used the right size has really
reduced bugs when I use dynamic allocation.
 
B

Ben Bacarisse

Robbie Hatley said:
while(!feof(fp))
{
c = fgetc(fp);
if( c>=0 )
fputc( c , fpo );
}

In addition to the excellent advice others here gave you,
there's one *HUGE* error here which everyone seems to have
missed somehow: the 0 byte, 0x00, is a perfectly
valid byte both for text files (ASCII, iso-8859-1, etc)
and binary (non-text) files. Many of the bytes in a
bmp file will be 0. If you omit those and close up the
gaps, you will severely corrupt the copy of the original
file.

I presume you missed the = part of the >=?
It will NOT render correctly in a graphics viewer
(Paintshop Pro, Internet Explorer, or whatever). It
probably can't even be opened, because the headers will
be screwed up.

So your while loop is wrong from several standpoints.

Actually no (unless I've missed some subtlety). It is non-idiomatic
but looks to be entirely correct to me.
Try:

// Loop while file pointer is valid, until break:
while (fp)
{
c = fgetc(fp); // ATTEMPT TO READ A BYTE.
if (feof(fp)) break; // BREAK IF READ ATTEMPT FAILED.
else fputc(c, fpo); // COPY EVEN THE "0" BYTES.
}

The canonical version would be:

while ((c = fgetc(fp)) != EOF)
fputc(c, fpo);
 
K

Keith Thompson

Robbie Hatley said:
while(!feof(fp))
{
c = fgetc(fp);
if( c>=0 )
fputc( c , fpo );
}

In addition to the excellent advice others here gave you,
there's one *HUGE* error here which everyone seems to have
missed somehow: the 0 byte, 0x00, is a perfectly
valid byte both for text files (ASCII, iso-8859-1, etc)
and binary (non-text) files. Many of the bytes in a
bmp file will be 0. If you omit those and close up the
gaps, you will severely corrupt the copy of the original
file. It will NOT render correctly in a graphics viewer
(Paintshop Pro, Internet Explorer, or whatever). It
probably can't even be opened, because the headers will
be screwed up.
[...]

Look again. The test is "c>=0", not "c>0". 0 bytes are treated the
same as any other valid bytes.

And I'd dispute that '\0' is a valid byte in a text file, at least for
most text formats. A text file *can* have '\0' characters, but
they'll cause problems for programs that use fgets() because they'll
be treated as string terminators.
 
R

rahul

This program should copy one file onto the other. It works if I
compile it with gcc to a cygwin program. However, if I compile it
with the -mno-cygwin option, it doesn't work (this targets native
windows).
Just curious, do you mean that this code worked correctly in the
former case? Since the errors seems to be in your code, I don't
understand how it worked well as a cygwin program. One possible
explanation is that as cygwin emulates unix environment, it does not
distinguish between byte and text streams.
 
R

raphfrk

Just curious, do you mean that this code worked correctly in the
former case? Since the errors seems to be in your code, I don't
understand how it worked well as a cygwin program. One possible
explanation is that as cygwin emulates unix environment, it does not
distinguish between byte and text streams.

Yeah, I think that is probably it. My memory is that fprintf as
native cygwin will output unix type files.

In fact, it can cause issues when compiling source files. It executes
a dos2unix command before it passes them to the compiler. This can
mean that the line number in the compiler doesn't match the DOS line
number as sometimes notepad adds hidden characters that are
interpreted as newlines. (I think created by pressing shift-delete or
some other wierd combination).

I end up running dos2unix and unix2dos on the files every so often.
This ensures the DOS version and the unix version are equivalent.
 
K

Keith Thompson

rahul said:
Just curious, do you mean that this code worked correctly in the
former case? Since the errors seems to be in your code, I don't
understand how it worked well as a cygwin program. One possible
explanation is that as cygwin emulates unix environment, it does not
distinguish between byte and text streams.

Yes, Cygwin, in its default configuration, uses the Unix format for
text files.
 
K

Keith Thompson

Robbie Hatley said:
The c being an int (not char or unsigned char) is important,
because fgetc returns an int containing the unsigned value
(0 to 255) of the character, or EOF if character couldn't
be read. (I'm assuming EOF would always be defined out of
the 0-255 range so it couldn't be confused with a character
value.)

Right.

EOF is of type int, and has a negative value, which must therefore be
outside the range of unsigned char. But note that the range of
unsigned char can be greater than 0..255 on systems where CHAR_BIT>8.
(Most modern systems other than DSPs have CHAR_BIT==8, but the
standard merely requires CHAR_BIT>=8.)

You can have problems if an input character, when converted from
unsigned char to int, yields a negative value that might happen to
match the value of EOF. This can happen only if sizeof(int)==1, which
can only happen if CHAR_BIT>=16 (since int must be at least 16 bits
wide). In practice, this isn't an issue; I know of no current hosted
implementation with CHAR_BIT!=8, and freestanding implementations
aren't required to provide <stdio.h>. But in theory, the canonical
I/O loop:

int c;
...
while ((c = fgetc(fin)) != EOF) {
/* ... */
}

isn't absolutely 100% portable.
 
R

raphfrk

You can have problems if an input character, when converted from
unsigned char to int, yields a negative value that might happen to
match the value of EOF. This can happen only if sizeof(int)==1, which
can only happen if CHAR_BIT>=16 (since int must be at least 16 bits
wide). In practice, this isn't an issue; I know of no current hosted
implementation with CHAR_BIT!=8, and freestanding implementations
aren't required to provide <stdio.h>. But in theory, the canonical
I/O loop:

int c;
...
while ((c = fgetc(fin)) != EOF) {
/* ... */
}

isn't absolutely 100% portable.

What would EOF be equal to in that case and what would be the correct
code?

In fact, it doesn't look possible, it is not just a portability issue.
 
V

vippstar

What would EOF be equal to in that case and what would be the correct
code?
The correct code would use feof() and ferror() and ignore what fgetc()
returns.
Example:

while((c = fgetc(fin), !feof(fin)) && !ferror(fin)) { /* ... */ }

However I also do not know of such implementation. I'm sure future
implementors will not let this happend, it will break a fair amount of
C code.
 
R

raphfrk

However I also do not know of such implementation. I'm sure future
implementors will not let this happend, it will break a fair amount of
C code.

Ahh, so the EOF return is in effect useless if you want 100% portable
code.
 
K

Keith Thompson

raphfrk said:
What would EOF be equal to in that case and what would be the correct
code?

In fact, it doesn't look possible, it is not just a portability issue.

EOF is required to be a negative value of type int (in practice, it's
generally -1). That wouldn't change. It just would happen to match a
valid character value (represented as unsigned char and converted to
int).

In fact the conversion from unsigned char to signed int would yield an
implementation-defined result (or, in theory, raise an
implementation-defined signal).
 
K

Keith Thompson

The correct code would use feof() and ferror() and ignore what fgetc()
returns.
Example:

while((c = fgetc(fin), !feof(fin)) && !ferror(fin)) { /* ... */ }

You can still do the comparison to EOF; you just have to realize that
it doesn't necessarily mean that you've reached end-of-file.

If c is equal to EOF *and* either feof(fin) or ferror(fin) returns
true, then you've reached the end of your input (or encountered an
error). Checking for EOF first can avoid the (minor) cost of the
function calls in some cases.

I might write it something like this:

#define END_OF_FILE(c, f) ((c)==EOF && (feof(f) || ferror(f)))

while (c = fgetc(fin), ! END_OF_FILE(c, fin)) {
/* ... */
}
However I also do not know of such implementation. I'm sure future
implementors will not let this happend, it will break a fair amount of
C code.

Agreed. In practice, it's just not worth worrying about.
 
C

CBFalconer

raphfrk said:
.... snip ...

What would EOF be equal to in that case and what would be the
correct code?

EOF will be whatever it is defined as in stdio.h. You have no
problem generating correct code, as long as you include the header.
 
H

Harald van Dijk

You can have problems if an input character, when converted from
unsigned char to int, yields a negative value that might happen to
match the value of EOF. [...]

What would EOF be equal to in that case and what would be the correct
code?
The correct code would use feof() and ferror() and ignore what fgetc()
returns.
Example:

while((c = fgetc(fin), !feof(fin)) && !ferror(fin)) { /* ... */ }

You can still do the comparison to EOF; you just have to realize that it
doesn't necessarily mean that you've reached end-of-file.

If c is equal to EOF *and* either feof(fin) or ferror(fin) returns true,
then you've reached the end of your input (or encountered an error).
Checking for EOF first can avoid the (minor) cost of the function calls
in some cases.

Here's how I would do and have done it:

for (;;) {
c = getc(in);
if (c == EOF) {
if (feof(in)) {
/* cleanup omitted */
return;
}
if (ferror(in)) {
do_error("read error");
return;
}
}

/* process read character c, which may happen to compare equal to EOF */
}

Here, I don't want to treat read errors the same way as the end of the
file. If there is a read error, I want to print an error message, set a
flag to return EXIT_FAILURE at the end of the program, but return from the
read function and see if the input that has been read so far is useful. If
the end of the file is reached, I don't want to print anything, I don't
want to set any flag, and I want to return from the read function. This
already requires me to check feof or ferror, so handling a character that
happens to compare equal to EOF has practically no extra cost.
 
K

Keith Thompson

Harald van Dijk said:
You can have problems if an input character, when converted from
unsigned char to int, yields a negative value that might happen to
match the value of EOF. [...]

What would EOF be equal to in that case and what would be the correct
code?
The correct code would use feof() and ferror() and ignore what fgetc()
returns.
Example:

while((c = fgetc(fin), !feof(fin)) && !ferror(fin)) { /* ... */ }

You can still do the comparison to EOF; you just have to realize that it
doesn't necessarily mean that you've reached end-of-file.

If c is equal to EOF *and* either feof(fin) or ferror(fin) returns true,
then you've reached the end of your input (or encountered an error).
Checking for EOF first can avoid the (minor) cost of the function calls
in some cases.

Here's how I would do and have done it:

for (;;) {
c = getc(in);
if (c == EOF) {
if (feof(in)) {
/* cleanup omitted */
return;
}
if (ferror(in)) {
do_error("read error");
return;
}
}

/* process read character c, which may happen to compare equal to EOF */
}

Here, I don't want to treat read errors the same way as the end of the
file. If there is a read error, I want to print an error message, set a
flag to return EXIT_FAILURE at the end of the program, but return from the
read function and see if the input that has been read so far is useful. If
the end of the file is reached, I don't want to print anything, I don't
want to set any flag, and I want to return from the read function. This
already requires me to check feof or ferror, so handling a character that
happens to compare equal to EOF has practically no extra cost.

Have you actually used a system where EOF may match a valid character?
 
H

Harald van Dijk

Have you actually used a system where EOF may match a valid character?

No, I haven't. I originally wrote the code the way I did without paying
attention to that, and ended up with

for (;;) {
c = getc(in);
if (c == EOF) {
if (ferror(in)) {
do_error("read error");
}
/* cleanup omitted */
return;
}

/* process read character c */
}

After realising the problem, while it's only a problem in theory for me, I
noticed the cost of avoiding the assumption was so low and that it didn't
clutter the code, that I really had no excuse not to just fix it anyway.

I already try to avoid using the result of an assignment, because I think
it makes the code harder to read. I would never write

while ((c = getc(in)) != EOF)

and don't care much for

while (c = getc(in), c != EOF)

either. I understand they're valid, and I don't consider them bad style,
but it's not my preferred way to write. Similarly, I didn't break out of
the loop, but placed the cleanup near the EOF check, because that made
more sense, made the code easier to follow for me.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,731
Messages
2,569,432
Members
44,832
Latest member
GlennSmall

Latest Threads

Top