Byte-26 Terror

Eik

Hey,

I've got a serious problem reading byte 26 (0x1A, octal 032, binary
00011010: SUB [Substitute]) from a file.

Take this example:

---BEGIN PROGRAM---

#include <stdio.h>

main () {
    int i;
    unsigned char c;
    FILE *fp;

    fp = fopen ("chars.txt", "r");
    for (i = 0; i < 256; i++) {
        c = getc (fp);
        printf ("%3d|", c);
    }
}

---END PROGRAM---

The file chars.txt contains 256 bytes -> byte values: 0, 1, 2, ...,
253, 254, 255

When I run the program I get this:

0| 1| 2| ... 24| 25|255|255|255| ... etc., until 'i' reaches 256 and
the program stops printing '255' (EOFs) on my screen.

What is wrong with the 26-byte?!?
When I remove the 26-byte from chars.txt, I get this:

0| 1| 2| ... 24| 25| 27| 28| ... 253|254|255|255|

Why is the 26-byte ruining my program? How should I change my program?

THX

Ps. I'm using Windows XP, and the C-compiler Miracle C

Ohhh... and does anybody know how you can get the size of a file
without checking where the EOF byte is (because in binary files
there are more EOFs)?
 
Eric Sosman

Eik said:
Hey,

I got a serious problem with reading a 26-byte (0x1A, 032, b00011010:
SUB [Substitute]) from a file.
[...]

This is Question 12.38 in the comp.lang.c Frequently
Asked Questions (FAQ) list:

http://www.eskimo.com/~scs/C-faq/top.html

Ohhh... and does anybody know how you can get the size of a file
without checking where the EOF byte is (because in binary files
there are more EOFs)

... and this is Question 19.12. Two for two: Does that
suggest anything about the utility of reading the FAQ
before posting, hmmm?
 
Ben Pfaff

I got a serious problem with reading a 26-byte (0x1A, 032, b00011010:
SUB [Substitute]) from a file.

Open the file in binary mode: "rb" instead of "r" on fopen().
 
Jens.Toerring

Eik said:
I got a serious problem with reading a 26-byte (0x1A, 032, b00011010:
SUB [Substitute]) from a file.
Take this example:
---BEGIN PROGRAM---
#include <stdio.h>
main () {

Better make that

int main()

to be ready for C99-compliant compilers, which no longer accept an
implicit int return type.
int i;
unsigned char c;
FILE *fp;

fp = fopen ("chars.txt", "r");

Open the file in binary mode, i.e. use "rb" instead of just "r"; this
should cure the 0x1A problem. As far as I know, in text mode Windows
interprets 0x1A (aka ^Z) as an end-of-file marker and then seems
to have problems when you want to read more from the file ;-)
for (i = 0; i < 256; i++) {
c = getc (fp);

getc() returns an int, so you had better get used to storing its return
value in one; more on why below.
printf ("%3d|", c);
}

You're missing the return value of the program - even if you
don't specify the return type of main() it's still supposed
to return an int. It's usually a good idea to return EXIT_SUCCESS
on success (but if you do that, don't forget to include <stdlib.h>).

Eik said:
Ohhh... and does anybody know how you can get the size of a file
without checking where the EOF byte is (because in binary files
there are more EOFs)

In C there doesn't exist an EOF byte. EOF is a value that won't
fit into a char (and that's why you always should store the return
value of functions like getc() in an int, not a char, otherwise
you're unable to detect the end of the file).

But there are some OSes that take your much beloved character with
the value 0x1A (^Z) as an end-of-file marker when the file gets
read in text mode. But "end-of-file marker" is something different
from EOF. The end-of-file marker is a completely legal character
that can happen to be in every file, it just gets assigned a
special meaning when a file gets read in text mode on some systems.
In contrast, EOF is not a character at all but just a possible
return value of some functions (like getc()), indicating that the
end of the file has been reached. Often EOF is -1, and that explains
why you always got 255 after the ^Z: When you write -1 in binary
(on your system) all the bits of that number are set, and when you
now just take the lowest 8 and interpret them as an unsigned char
you end up with, voilà, 255.

To see how to figure out the length of a file see the FAQ, section
19.12, but you will probably end up using some system specific
function if you really need that piece of information.

Regards, Jens
 
Kenneth Brody

Eik wrote:
[...]
#include <stdio.h>

main () {
int i;
unsigned char c;
FILE *fp;

fp = fopen ("chars.txt", "r");
for (i = 0; i < 256; i++) {
c = getc (fp);
printf ("%3d|", c);
}
}
[...]

Open the file in binary mode. If you are on a platform that treats
Ctrl-Z in a text file as EOF, then getc() will return EOF.

(Also, note that getc returns "int" not "unsigned char".)

Try the following (still using text mode) program:

=============

#include <stdio.h>

main () {
    int i;
    int c; /* <------------- change */
    FILE *fp;

    fp = fopen ("chars.txt", "r");
    for (i = 0; i < 256; i++) {
        c = getc (fp);
        if ( c == EOF ) /* <-------------- insert */
        {
            printf("\nEnd of file reached.\n");
            break;
        }
        printf ("%3d|", c);
    }
}

=============

Then change the fopen to use "rb" instead of "r".
 
Kelsey Bjarnason

[snips]

fp = fopen ("chars.txt", "r");
Why is the 26-byte ruining my program? How should I change my program?

Simple: you're using a DOS-based system (DOS or Windows), in which
character value 26, in a text file, means EOF. You're opening the file in
text mode, so when it hits that character, you're at the end of the file.

Either lose the character 26 - the EOF - or open the file in binary mode,
which doesn't care what the values are.

On a related note, using a char type for the character read back is a bad
idea; if there _is_ an EOF returned - and you need to differentiate it
from real data - you can't. Those functions return an int for a reason.
 
Eik

Kenneth Brody said:
[...]

Then change the fopen to use "rb" instead of "r".

Thanx man
 
William L. Bahn

When you open a file in text mode, you are telling the compiler that the
expectation is that the contents of that file are one-byte ASCII codes and
that you want it to be aware of that and act accordingly. In addition to
reading them out of the file, your compiler will insert the code to do
certain things when certain values are read (which depend on your OS and
your compiler) because that is what that ASCII code tells it to do. 0x1A is
'Z' or 'z' with bits 5 and 6 cleared, which makes it Ctl-Z. In DOS text
files, that is used as the end-of-file marker. So your code is only doing
exactly what you told it to do.

If you are storing binary data in a file - read it as a binary file.
 
