looking at binary files with C

Joe Pfeiffer

Uno said:
There were a lot of problems with this source that needed ironing
out, and I needed to start somewhere. When I've got a program
that compiles, I like to have the magic numbers #defined. I haven't
really done much C on a 64-bit processor, so my ballparking seems to
be wanting.

Note that if you had done this earlier in the process, your mystery
segfault wouldn't have happened, saving you and everybody else some
time.
 
Uno

And do what with it?

Pass it to a common C extension.
$ indent -i3 hist7.c [...]
$ cat hist7.c
#include<stdio.h>
#include<limits.h>
#include<stdlib.h>
#define SIZE 20000000
#define SIZE2 (UCHAR_MAX+1)

int
main (void)
{
   int c;
   long counter = 0;
   long counter2 = 0;
   long counter3 = 0;
   long a[SIZE2], b[SIZE2];
   long i, j, end;
   FILE *fp;
   char filename[] = "shoulder.wmv";
   unsigned char *p;

   fp = fopen (filename, "rb+");
   fseek (fp, 0L, SEEK_END);
   end = ftell (fp);
   fseek (fp, 0L, SEEK_SET);
   printf ("file_length is %ld\n", end);
   p = malloc (end);

You don't check whether malloc() succeeded. (Don't bother telling
us how much memory you have; just check whether p==NULL.)

Note that the fseek() method isn't guaranteed to tell you the size
of the file. It probably does so on your system, but the standard
doesn't guarantee it.

(And as long as your code is non-portable, you probably might as well
use some non-portable method like fstat().)

If it works on your system, and if you're sufficiently sure that
the file's size won't change while the program is running, that's
probably ok.
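
A minimal sketch of the fstat() route together with the malloc() check, assuming a POSIX system. The helper names file_size_posix and alloc_for_file are made up for illustration and appear nowhere in the thread; fileno() and fstat() are POSIX, not ISO C.

```c
#define _POSIX_C_SOURCE 200809L  /* fileno() and fstat() are POSIX, not ISO C */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: store the size of the open file in *size_out.
   Returns 0 on success, -1 if fstat() fails or reports a negative size. */
static int file_size_posix(FILE *fp, off_t *size_out)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    if (st.st_size < 0)
        return -1;
    *size_out = st.st_size;
    return 0;
}

/* Hypothetical helper: allocate a buffer big enough for the whole file.
   Returns NULL if the size is unknown, too large, or malloc() fails. */
static unsigned char *alloc_for_file(FILE *fp, size_t *len_out)
{
    off_t size;
    unsigned char *p;

    if (file_size_posix(fp, &size) != 0)
        return NULL;
    if ((uintmax_t)size > SIZE_MAX)
        return NULL;
    /* malloc(0) may legitimately return NULL, so ask for at least 1 byte. */
    p = malloc(size > 0 ? (size_t)size : 1);
    if (p == NULL)
        return NULL;
    *len_out = (size_t)size;
    return p;
}
```

The caller still has to treat the reported size as a snapshot: if the file grows or shrinks while the program runs, the read loop should stop at whichever comes first, the buffer length or end-of-file.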

This is my latest version that uses non-portable aspects of my platform:

[sorry, could only paste as quotation]
$ cc -Wall -Wextra hist9.c -o hist
hist9.c: In function ‘main’:
hist9.c:17: warning: unused variable ‘a’
hist9.c:15: warning: unused variable ‘counter2’
hist9.c:14: warning: unused variable ‘counter’
$ ./hist
file_length is 19573712
Main control
b[245] is 50064
b[246] is 38224
b[247] is 58741
b[248] is 92306
b[249] is 58773
b[250] is 65617
b[251] is 41401
b[252] is 97099
b[253] is 75547
b[254] is 113450
b[255] is 128194
counter3 reached 819416
file_length is 19573712
$ cat hist9.c
#include <stdio.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define SIZE2 (UCHAR_MAX+1)

int
main (void)
{
   int c;
   long counter = 0;
   long counter2 = 0;
   long counter3 = 0;
   long a[SIZE2], b[SIZE2];
   long i, j, end, size;
   FILE *fp;
   char filename[] = "shoulder.wmv";
   unsigned char *p;
   struct stat st;

   fp = fopen (filename, "rb+");
   fseek (fp, 0L, SEEK_END);
   end = ftell (fp);
   fseek (fp, 0L, SEEK_SET);
   printf ("file_length is %ld\n", end);
   p = malloc (end);
   if (p == NULL)
   {
      printf ("malloc failed.\n");
      exit (EXIT_FAILURE);
   }
   for (i = 0; i < SIZE2; ++i)
   {
      b[i] = 0;
   }
   printf ("Main control\n");
   for (j = 0; j < end; ++j)
   {
      c = fgetc (fp);
      if (c != EOF)
      {
         p[j] = c;
         ++b[c];
      }
      else
         break;
   }
   // a little output
   for (i = 245; i < SIZE2; ++i)
   {
      printf ("b[%ld] is %ld\n", i, b[i]);
      counter3 = counter3 + b[i];
   }
   printf ("counter3 reached %ld\n", counter3);

   // stat

   stat (filename, &st);
   size = st.st_size;
   printf ("file_length is %ld\n", size);

   fclose (fp);
   return 0;
}

// cc -Wall -Wextra hist9.c -o hist
$
[snip]
Does the malloc'ing look good for reading this file? Also, what do
POSIX and C say about the EOF character itself? If my data are typical,
I would say that neither thinks that EOF is part of the file.

There is no "EOF character". fgetc() returns *either* the value
of the character it just read (treated as an unsigned char and
converted to int) *or* the special value EOF if it was unable to
read a character.

Typically UCHAR_MAX==255 (it could be larger) and EOF==-1 (it can
be any negative int value). So fgetc() will return either a value
in the range 0..255, or the value -1.

No, EOF is not part of the file; it's an indication that there's
nothing more in the file, or that there was an error. You can call
feof() and/or ferror() to determine which.
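
A small sketch of that distinction, under the assumption that the goal is still a byte histogram. The function name count_bytes is hypothetical. Note that c is an int, so a data byte of 0xFF (value 255) is never mistaken for EOF.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: count how often each byte value occurs in fp.
   counts must have UCHAR_MAX + 1 elements.
   Returns 0 if reading stopped at end-of-file, -1 if it stopped on an error. */
static int count_bytes(FILE *fp, unsigned long counts[])
{
    int c;  /* int, not char: must hold every unsigned char value plus EOF */

    for (c = 0; c <= UCHAR_MAX; ++c)
        counts[c] = 0;

    while ((c = fgetc(fp)) != EOF)
        ++counts[c];

    /* fgetc() returned EOF; ferror() says whether that was a real error. */
    return ferror(fp) ? -1 : 0;
}
```

After a return value of 0, feof(fp) is true; after -1, the counts cover only the bytes read before the error.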

Do you really need to read the entire file into memory? Perhaps you
do, but it's often better to process the data as you're reading it.
It depends on what your goal is.
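
For a histogram, processing while reading could look like the sketch below, which never holds more than one fixed-size chunk in memory. The name histogram_chunked and the 4096-byte chunk size are arbitrary choices for illustration, not anything from the thread.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: byte-value histogram, read in fixed-size chunks.
   counts must have UCHAR_MAX + 1 elements. Returns the number of bytes read.
   Memory use stays at one 4096-byte buffer whatever the file size. */
static unsigned long histogram_chunked(FILE *fp, unsigned long counts[])
{
    unsigned char buf[4096];
    unsigned long total = 0;
    size_t n, i;

    for (i = 0; i < (size_t)UCHAR_MAX + 1; ++i)
        counts[i] = 0;

    /* fread() returns 0 at end-of-file or on error; check ferror() after. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (i = 0; i < n; ++i)
            ++counts[buf[i]];
        total += n;
    }
    return total;
}
```

No file-size query and no malloc() are needed in this form, so the fseek()/ftell() portability question does not come up.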


Ok, good, that seems to square with my recollection and data. Cheers,
 
James Waldby

#define SIZE 20000000
....
for (j = 0; j < SIZE; ++j) {
   c = fgetc(fp);
   counter++;
   if (c != EOF) {
      a[c] = ++a[c];
   } else
      break;
}
....

That code is quite ugly and limited (value of SIZE is irrelevantly
based on a specific file's size), besides having the undefined-
behavior problem that other posters mentioned, in "a[c] = ++a[c]".
I suggest that you instead use code like the following:

while (1) {
   c = fgetc(fp);
   if (c == EOF) break;
   ++a[c];
   ++counter;
}

Some people prefer to do the read and EOF test in one line:
if ((c = fgetc(fp)) == EOF) break;
 
Uno

That code is quite ugly and limited (value of SIZE is irrelevantly
based on a specific file's size), besides having the undefined-
behavior problem that other posters mentioned, in "a[c] = ++a[c]".
I suggest that you instead use code like the following:

while (1) {
   c = fgetc(fp);
   if (c == EOF) break;
   ++a[c];
   ++counter;
}

Some people prefer to do the read and EOF test in one line:
if ((c = fgetc(fp)) == EOF) break;

Thx, james, I like the main control a lot better now:

$ cc -Wall -Wextra hist10.c -o hist
hist10.c: In function ‘main’:
hist10.c:18: warning: unused variable ‘j’
hist10.c:17: warning: unused variable ‘a’
hist10.c:15: warning: unused variable ‘counter2’
$ cat hist10.c
#include <stdio.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define SIZE2 (UCHAR_MAX+1)

int
main (void)
{
   int c;
   long counter = 0;
   long counter2 = 0;
   long counter3 = 0;
   long a[SIZE2], b[SIZE2];
   long i, j, end, size;
   FILE *fp;
   char filename[] = "shoulder.wmv";
   unsigned char *p;
   struct stat st;

   fp = fopen (filename, "rb+");
   fseek (fp, 0L, SEEK_END);
   end = ftell (fp);
   fseek (fp, 0L, SEEK_SET);
   printf ("file_length is %ld\n", end);
   p = malloc (end);
   if (p == NULL)
   {
      printf ("malloc failed.\n");
      exit (EXIT_FAILURE);
   }
   for (i = 0; i < SIZE2; ++i)
   {
      b[i] = 0;
   }
   printf ("Main control\n");
   while (1){
      c = fgetc (fp);
      if (c == EOF) break;
      p[counter++] = c;
      ++b[c];
   }
   // a little output
   for (i = 245; i < SIZE2; ++i)
   {
      printf ("b[%ld] is %ld\n", i, b[i]);
      counter3 = counter3 + b[i];
   }
   printf ("counter3 reached %ld\n", counter3);

   // stat

   stat (filename, &st);
   size = st.st_size;
   printf ("file_length is %ld\n", size);

   fclose (fp);
   return 0;
}

// cc -Wall -Wextra hist10.c -o hist

$ ./hist
file_length is 19573712
Main control
b[245] is 50064
b[246] is 38224
b[247] is 58741
b[248] is 92306
b[249] is 58773
b[250] is 65617
b[251] is 41401
b[252] is 97099
b[253] is 75547
b[254] is 113450
b[255] is 128194
counter3 reached 819416
file_length is 19573712
$
 
Eric Sosman

It takes me a few posts before I get a syntactical C program to use to
describe what I'm doing. Without referent source, I can't communicate this.

If you cannot say what you want your program to do, neither
you nor anyone else can say whether it fulfills your purposes.
But you did notice that I have
#define SIZE2 (UCHAR_MAX+1)
, and this was the seg fault I couldn't get my head around when I first
posted.

No, it had nothing to do with the matter.
A client paid me
(!)

to put a binary file on the net, and I used perl's
net::ftp without knowing that I had to call the binary method in order
to upload without *nix eating ascii 13. (Thx a lot c.l.p.misc: for
nothing.)

Let me get this straight: You are upset with comp.lang.c for
failing to tell you how to use Perl correctly? Before we were even
aware you were trying to do anything?
The + on all opens because I might want to write to them.

Foolish. On some systems, wasteful.
No check for
failure because the only one I've got in my head is
or die "death $@\n"; Is there a K&R2 reference for how to do it?

After the call and assignment, check whether `fp' is NULL.
If it's not, fopen() succeeded and you're ready to do I/O on
the stream `fp' points to. If it's NULL, fopen() failed and you
should take alternative action. Even if the alternative action
amounts to nothing more than "I give up," it will at least be a
predictable response. Just plowing ahead as if everything were
fine produces unpredictable results: Quite likely a SIGSEGV, but
not necessarily anything that benign.
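
A minimal sketch of that check. The wrapper name open_for_reading is hypothetical; it assumes fopen() sets errno on failure, which POSIX specifies but ISO C does not promise.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path for binary reading.
   On failure, say why on stderr and return NULL so the caller can give up. */
static FILE *open_for_reading(const char *path)
{
    FILE *fp;

    errno = 0;
    fp = fopen(path, "rb");  /* "rb": read-only, binary; no '+' needed */
    if (fp == NULL) {
        /* errno is only meaningful here on systems where fopen() sets it. */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return NULL;
    }
    return fp;
}
```

A caller would typically test the result and return EXIT_FAILURE from main when it is NULL, which is the predictable "I give up" response described above.
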
Right, so I'm looking for your comment on the dynamic allocation which I
posted as a reply to keith.

Since you are collecting payment for the program comp.lang.c
is more or less writing for you, perhaps you should offer the
forum a percentage. Until then, no more free comments.
 
Seebs

You should be able to describe it in English. If you can't, I suggest
that you don't really know what you're doing.

Going by his posting history, I think your evaluation may be
misleading. You are assuming a kind of baseline lucidity such that
"know what you're doing" makes sense, and I just plain don't see
the justification for this assumption.

-s
 
Uno

Going by his posting history, I think your evaluation may be
misleading. You are assuming a kind of baseline lucidity such that
"know what you're doing" makes sense, and I just plain don't see
the justification for this assumption.

-s


I'm looking at binary files with C. Read the fucking subject.

Now that you've come on to my thread, I'm gonna go somewhere where
there's less eurotrash.
 
Uno

If you cannot say what you want your program to do, neither
you nor anyone else can say whether it fulfills your purposes.

The program that I have farthest downthread is pretty close to what I
came in the door for. I thought it counted as a virtue around here not
to bring up the mixed-language programming and how standard C fits into
the whole.
No, it had nothing to do with the matter.


(!)

It is only due to a family emergency that I've been asked to do this. I
simply stipulated to them that if I did computer work for them, they
would have to give me what I would have earned doing my thing otherwise.
They accidentally paid my gas bill instead of where I told them, so I
got 3 years of gas for my efforts. (I'm very proud to have a gas bill
of $30 in february.)
Let me get this straight: You are upset with comp.lang.c for
failing to tell you how to use Perl correctly? Before we were even
aware you were trying to do anything?

No, you read incorrectly. The parenthetical expression was about
c.l.p.misc, which is chock-full of Seebses and CBFalconers who really
aren't there to help or contribute beyond derision and being almost
always wrong.
Foolish. On some systems, wasteful.

Ok, I'll remove it.
After the call and assignment, check whether `fp' is NULL.
If it's not, fopen() succeeded and you're ready to do I/O on
the stream `fp' points to. If it's NULL, fopen() failed and you
should take alternative action. Even if the alternative action
amounts to nothing more than "I give up," it will at least be a
predictable response. Just plowing ahead as if everything were
fine produces unpredictable results: Quite likely a SIGSEGV, but
not necessarily anything that benign.

I think I've done that correctly now. I'm pretty close on most things,
but I need a nudge with others.
Since you are collecting payment for the program comp.lang.c
is more or less writing for you, perhaps you should offer the
forum a percentage. Until then, no more free comments.

You tend to chime in on the beginning of threads, which probably
behooves you. Cheers,
 
Keith Thompson

Uno said:
I'm looking at binary files with C. Read the fucking subject.

Now that you've come on to my thread, I'm gonna go somewhere where
there's less eurotrash.

Thank you, Seebs!
 
Eric Sosman

[...]
If you cannot say what you want your program to do, neither
you nor anyone else can say whether it fulfills your purposes.

The program that I have farthest downthread is pretty close to what I
came in the door for. I thought it counted as a virtue around here not
to bring up the mixed-language programming and how standard C fits into
the whole.

Okay. Your program fulfills every requirement, goal, wish,
hope, and dream you've expressed. The only mystery is: Why are
you asking comp.lang.c to suggest alterations to perfection?
It is only due to a family emergency that I've been asked to do this. I
simply stipulated to them that if I did computer work for them, they
would have to give me what I would have earned doing my thing otherwise.
They accidentally paid my gas bill instead of where I told them, so I
got 3 years of gas for my efforts. (I'm very proud to have a gas bill of
$30 in february.)

Somebody paid a thou for your services? P.T. Barnum's saying
comes instantly to mind.
I think I've done that correctly now. I'm pretty close on most things,
but I need a nudge with others.

A nudge with a clue-by-four, methinks. Uno, listen to me:
You have NO REASON to represent yourself as a competent programmer,
as a person whose programming services are deserving of pay, as a
person who has the slightest command of the tools you use so poorly.
This is not a judgement: We all start from a state of ignorance, and
learning our way out of it takes time and effort. If you expend the
effort and take the time, perhaps you will eventually learn. But at
the moment, you are NOT a C programmer, quite likely not an Anything
programmer, and passing yourself off as one is an act of fraud.

IANAL, and cannot say whether this particular species of fraud
is criminally or even civilly actionable. However, I'll relate a bit
of history: Some years ago I worked for a large company that had hired
a tiny company three or four times over a few years, spending maybe a
half million dollars for the benefit of their expertise. We hired them
again, and they sent us an "expert" who was in fact a learn-by-doing
rank beginner; this was exposed when he spent eleven days getting
nowhere on a problem I (a non-expert) then debugged in less than
twenty minutes.

We didn't take them to court. We just took them off the Approved
Contractors list, and never hired them again. Deprived of their
annual quarter-million, they were out of business within the year.

And my question to you, Uno, is: Do you want to be *that* expert?
 
Hans Vlems

#define SIZE 100000
[...]
long a[SIZE];
[...]
for (j = 0; j < 7000000; ++j) {
[...]
So, why the seg fault?

     Choose the phrase that most accurately completes this statement:
"7000000 is ________ 100000."

     a) less than
     b) greater than
     c) equal to
     d) more equal to
     e) more tweeted than

Quite right, though I doubt the OP understands why you came up with
this little quiz question.
The OP ought to learn the difference between learning C (or any other
programming language)
and programming... Once that is understood he will not only answer
your question correctly but
understand why you put it that way.
Hans
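
The point of the quiz can be sketched as below, on the assumption (the elided fragment does not show it) that the loop used j to index a. COUNT_OF and store_checked are made-up names for illustration; SIZE is the value from the quoted fragment.

```c
#include <stddef.h>

#define SIZE 100000  /* value taken from the quoted fragment */

/* Number of elements in a true array (does not work on a pointer). */
#define COUNT_OF(arr) (sizeof (arr) / sizeof (arr)[0])

/* Hypothetical helper: set a[idx] = value only when idx is inside the array.
   Returns 0 on success, -1 when idx is out of range. */
static int store_checked(long *a, size_t len, size_t idx, long value)
{
    if (idx >= len)
        return -1;  /* writing here would be undefined behavior */
    a[idx] = value;
    return 0;
}
```

Deriving the loop bound from the array itself, as in j < COUNT_OF(a), keeps the two numbers from drifting apart the way 7000000 and 100000 did.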
 
Hans Vlems

Let's just say this code is a prime example of why I took points off on
student programs that had magic numbers (like 7000000 for instance) in
the executable code.

There were a lot of problems with this source that needed ironing out,
and I needed to start somewhere.  When I've got a program that
compiles, I like to have the magic numbers #defined.  I haven't really
done much C on a 64-bit processor, so my ballparking seems to be wanting.

$  cc -Wall -Wextra -g hist3.c -o hist
$ ./hist
UCHAR_MAX + 1 is 256
a[0] is 217337
a[1] is 123676
a[2] is 137894
a[3] is 100155
...
a[252] is 97099
a[253] is 75547
a[254] is 113450
a[255] is 128194
Counter reached 19573713
Counter2 reached 19573712
$ cat hist3.c
#include <stdio.h>
#include <limits.h>
#define SIZE 20000000
#define SIZE2 (UCHAR_MAX+1)

int main(void)
{
     int c;
     long counter = 0;
     long counter2 = 0;
     long a[SIZE2];
     long i, j;
     FILE *fp;
     fp = fopen("shoulder.wmv", "rb+");
     printf("UCHAR_MAX + 1 is %d\n", SIZE2);
     for (i = 0; i < (SIZE2); ++i) {
        a[i] = 0;
     }
     for (j = 0; j < SIZE; ++j) {
        c = fgetc(fp);
        counter++;
        if (c != EOF) {
            a[c] = ++a[c];
        } else
            break;
     }
     for (i = 0; i < (SIZE2); ++i) {
        printf("a[%ld] is %ld\n", i, a[i]);
        counter2 = counter2 + a[i];
     }
     printf("Counter reached %ld\n", counter);
     printf("Counter2 reached %ld\n", counter2);
     fclose(fp);
     return 0;

}

// cc -Wall -Wextra -g hist3.c -o hist
$

I needed two different SIZES, one that would be larger than the byte
count of the file to be read and another that was one greater than
UCHAR_MAX.  Seems to behave.  Thanks all for comments.


Before trying to get a program compiled correctly you might need to
spend some time on design first.
Hans
 
lovecreatesbeauty

[...]
If you cannot say what you want your program to do, neither
you nor anyone else can say whether it fulfills your purposes.
The program that I have farthest downthread is pretty close to what I
came in the door for. I thought it counted as a virtue around here not
to bring up the mixed-language programming and how standard C fits into
the whole.

     Okay.  Your program fulfills every requirement, goal, wish,
hope, and dream you've expressed.  The only mystery is: Why are
you asking comp.lang.c to suggest alterations to perfection?
It is only due to a family emergency that I've been asked to do this. I
simply stipulated to them that if I did computer work for them, they
would have to give me what I would have earned doing my thing otherwise.
They accidentally paid my gas bill instead of where I told them, so I
got 3 years of gas for my efforts. (I'm very proud to have a gas bill of
$30 in february.)

     Somebody paid a thou for your services?  P.T. Barnum's saying
comes instantly to mind.
I think I've done that correctly now. I'm pretty close on most things,
but I need a nudge with others.

     A nudge with a clue-by-four, methinks.  Uno, listen to me:
You have NO REASON to represent yourself as a competent programmer,
as a person whose programming services are deserving of pay, as a
person who has the slightest command of the tools you use so poorly.
This is not a judgement: We all start from a state of ignorance, and
learning our way out of it takes time and effort.  If you expend the
effort and take the time, perhaps you will eventually learn.  But at
the moment, you are NOT a C programmer, quite likely not an Anything
programmer, and passing yourself off as one is an act of fraud.

     IANAL, and cannot say whether this particular species of fraud
is criminally or even civilly actionable.  However, I'll relate a bit
of history: Some years ago I worked for a large company that had hired
a tiny company three or four times over a few years, spending maybe a
half million dollars for the benefit of their expertise.  We hired them
again, and they sent us an "expert" who was in fact a learn-by-doing
rank beginner; this was exposed when he spent eleven days getting
nowhere on a problem I (a non-expert) then debugged in less than
twenty minutes.

     We didn't take them to court.  We just took them off the Approved
Contractors list, and never hired them again.  Deprived of their
annual quarter-million, they were out of business within the year.

     And my question to you, Uno, is: Do you want to be *that* expert?


Hey main, can this version get paid?


$ cat a.c
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

int main(void)
{
   FILE *fp;
   const char *file = "./intro.wmv";
   int errno_copy;
   int ch;
   unsigned long char_set[UCHAR_MAX + 1] = {0};

   errno = 0;
   fp = fopen(file, "r");
   errno_copy = errno;
   if (!fp){
      printf("%s %s\n", strerror(errno_copy), file);
      return EXIT_FAILURE;
   }
   while ((ch = fgetc(fp)) != EOF){
      char_set[ch]++;
   }
   fclose(fp);

   for (ch = 0; ch != sizeof char_set / sizeof *char_set; ch++){
      if (char_set[ch]){
         printf("BYTE:%d CNT:%lu\n", ch, char_set[ch]);
      }
   }

   return EXIT_SUCCESS;
}
$
 
lovecreatesbeauty

       {if(i=='\t')
             P("[\\t, %u]", nc);
        else if(i=='\a')
             P("[\\a, %u]", nc);
        else if(i=='\r')
             P("[\\r, %u]", nc);
        else if(i=='\n')
             P("[\\n, %u]", nc);
        else if(i=='\b')
             P("[\\b, %u]", nc);
        else if(i=='\f')
             P("[\\f, %u]", nc);
         else if(i=='\v')
             P("[\\v, %u]", nc);
        else if(i=='\\')
             P("[\\, %u]", nc);
        else P("[%c, %u]", i, nc);


I think there are some other chars that can't be printed. See the demo of
your code. Note the last "PuTTY" brought by running your code.

$ make && ./a.out ./intro.wmv

[cut]

N. Bytes File ./a.out=665107
$ PuTTY
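
One way to keep raw control bytes away from the terminal is sketched below; the stray "PuTTY" is likely the terminal answering a control byte (such as ENQ, 0x05) that was written to it unescaped. The name format_byte is hypothetical, and the code assumes the default "C" locale for isprint().

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: write a terminal-safe form of byte b (0..UCHAR_MAX)
   into buf. A backslash is doubled, other printable characters are copied
   as-is, and everything else becomes a hex escape such as \x05.
   Returns buf. */
static char *format_byte(int b, char *buf, size_t buflen)
{
    if (b == '\\')
        snprintf(buf, buflen, "\\\\");
    else if (isprint(b))
        snprintf(buf, buflen, "%c", b);
    else
        snprintf(buf, buflen, "\\x%02X", (unsigned)b);
    return buf;
}
```

This replaces the long if/else chain over '\t', '\a', '\r' and friends with a single rule, at the cost of showing hex escapes instead of the mnemonic ones.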
 
Ian Collins

this is my try, i know there will be error etc etc
i'm not perfect...

#include<stdio.h>
#include<stdint.h>
#include<limits.h>

// macro for types
#define u64 uint64_t
#define u32 uint32_t
#define u16 uint16_t
#define u8 uint8_t

// macro for function
#define P printf

Readers can stop here!
 
Joe Pfeiffer

io_x said:
// macro for types
#define u64 uint64_t
#define u32 uint32_t
#define u16 uint16_t
#define u8 uint8_t

// macro for function
#define P printf

// macro for keyWords
#define G goto
#define R return
#define W while
#define F for

Please, no.
 
Michael Press

[...]

Code as disorganized as what you present here will fail
with high probability. You have posted in clc enough to
have heard most of the advice on writing good code, and
heard all the mistakes tyros make, some of which you
made here.

Perhaps at this time you are incapable of writing good
beginner's code even with help. What do you say? Do you
want to tackle this?
 
Oliver Jackson

It takes me a few posts before I get a syntactical C program to use to
describe what I'm doing.  Without referent source, I can't communicate this.





But you did notice that I have
#define SIZE2 (UCHAR_MAX+1)
, and this was the seg fault I couldn't get my head around when I first
posted.

Yes, but that's just because you're a paste eating, curry sniffing,
crap gargling, strychnine snorting, methane huffing, egg suckin',
famcon attending, plushie humping, brassiere wearing, bumper sticker
displaying, nym shifting, gui loving, garbage eating, mouth breathing,
card carrying dumbass.
 
