hexdump.c

jacob navia

Got the idea of adding one more exercise to the tutorial.

The goal is to show a small hexdump utility without any bells and
whistles, and add a bunch of exercises to add those. Here it is.

It uses standard C. Please tell me if there could be any
portability problems.

I do not use putchar but fputc to make it easier to add an output
file later as one more argument.

Please tell me if you see any errors in it. Note that the manifest
constants will be replaced by #defines in the exercises, when they
are asked to increase the number of columns, etc.

------------------------------------------------------cut here
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc,char *argv[])
{
if (argc < 2) {
fprintf(stderr,"Usage: %s <file name>\n",argv[0]);
return EXIT_FAILURE;
}
FILE *file = fopen(argv[1],"rb");
if (file == NULL) {
fprintf(stderr,"Impossible to open %s for reading\n",argv[1]);
return EXIT_FAILURE;
}
int oneChar = fgetc(file);
int column = 0,line = 0;
unsigned char tab[16+1];
char *hex = "0123456789abcdef";
int address = 1;

while (oneChar != EOF) {
if (column == 0) {
memset(tab,'.',16);
fprintf(stdout,"[%d] ",address);
}
if (oneChar >= ' ' && oneChar <= 127) {
tab[column] = oneChar;
}
fputc(hex[(oneChar >> 4)&0xf],stdout);
fputc(hex[oneChar&0xf],stdout);
fputc(' ',stdout);
column++;
if (column == 16) {
fputc(' ',stdout);
tab[column]=0;
fputs(tab,stdout);
column = 0;
}
line++;
if (line == 16) {
fputc('\n',stdout);
line = 0;
}
oneChar = fgetc(file);
address++;
}
fclose(file);
address--;
if (column > 0 ) {
while (column < 16) {
fprintf(stdout," ");
tab[column]=' ';
column++;
}
tab[16]=0;
fprintf(stdout," %s\n[%d]\n",tab,address);
}
else fprintf(stdout,"[%d]\n",address);
return EXIT_SUCCESS;
}

----------------------------------------------------cut here
 
tea strainer

jacob said:
fprintf(stderr,"Usage: %s <file name>\n",argv[0]); return
fails if argv[0] is null
unsigned char tab[16+1];

you should use char to store printable characters
if (oneChar >= ' ' && oneChar <= 127) {
tab[column] = oneChar;
}

could lead to unprintable characters appearing on non-ascii systems
fputc(hex[(oneChar >> 4)&0xf],stdout);

the mask is unnecessary unless you are targeting systems with CHAR_BIT>8
fprintf(stdout," ");

you know there is a function called printf?

looks like your C code for trivial exercises is just as sloppy as the
code in your buggy and overpriced compiler jacob.
 
Ian Collins

Got the idea of adding one more exercise to the tutorial.

The goal is to show a small hexdump utility without any bells and
whistles, and add a bunch of exercises to add those. Here it is.

It uses standard C. Please tell me if there could be any
portability problems.

I do not use putchar but fputc to make it easier to add an output
file later as one more argument.

Please tell me if you see any errors in it. Note that the manifest
constants will be replaced by #defines in the exercises, when they
are asked to increase the number of columns, etc.

------------------------------------------------------cut here
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main(int argc,char *argv[])
{
if (argc< 2) {
fprintf(stderr,"Usage: %s<file name>\n",argv[0]);
return EXIT_FAILURE;
}
FILE *file = fopen(argv[1],"rb");
if (file == NULL) {
fprintf(stderr,"Impossible to open %s for reading\n",argv[1]);
return EXIT_FAILURE;
}
int oneChar = fgetc(file);
int column = 0,line = 0;
unsigned char tab[16+1];

This should be char, not unsigned char.
char *hex = "0123456789abcdef";

This should be const char*.
 
jacob navia

On 09/09/11 23:06, tea strainer wrote:
jacob said:
fprintf(stderr,"Usage: %s<file name>\n",argv[0]); return
fails if argv[0] is null
unsigned char tab[16+1];

you should use char to store printable characters
OK
if (oneChar>= ' '&& oneChar<= 127) {
tab[column] = oneChar;
}

could lead to unprintable characters appearing on non-ascii systems

Yes, I will replace this with isprint()

fputc(hex[(oneChar>> 4)&0xf],stdout);

the mask is unnecessary unless you are targeting systems with CHAR_BIT>8

You are just wrong. If I followed your advice I would risk a segmentation
violation.
you know there is a function called printf?

I said in the introduction to the code that I did not want to use
implicit stdout since the output file will be changed as an exercise.

But yes, I did not know about printf, thanks

looks like your C code for trivial exercises is just as sloppy as the
code in your buggy and overpriced compiler jacob.

You are too much of a coward to insult people openly, you hide behind a pseudonym.

Good bye "tea strainer".
 
Ike Naar

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc,char *argv[])
{
if (argc < 2) {
fprintf(stderr,"Usage: %s <file name>\n",argv[0]);

Undefined behaviour in the (unlikely) case where argc = 0 and argv[0] = NULL.
return EXIT_FAILURE;
}
FILE *file = fopen(argv[1],"rb");
if (file == NULL) {
fprintf(stderr,"Impossible to open %s for reading\n",argv[1]);
return EXIT_FAILURE;
}
int oneChar = fgetc(file);
int column = 0,line = 0;

The variables column and line serve the same purpose.
One of them can go.
unsigned char tab[16+1];

Why unsigned?
char *hex = "0123456789abcdef";
int address = 1;

while (oneChar != EOF) {
if (column == 0) {
memset(tab,'.',16);
fprintf(stdout,"[%d] ",address);
}
if (oneChar >= ' ' && oneChar <= 127) {
tab[column] = oneChar;
}
fputc(hex[(oneChar >> 4)&0xf],stdout);
fputc(hex[oneChar&0xf],stdout);
fputc(' ',stdout);
column++;
if (column == 16) {
fputc(' ',stdout);
tab[column]=0;
fputs(tab,stdout);
column = 0;
}
line++;
if (line == 16) {
fputc('\n',stdout);
line = 0;
}
oneChar = fgetc(file);
address++;
}
fclose(file);
address--;
if (column > 0 ) {
while (column < 16) {
fprintf(stdout," ");
tab[column]=' ';
column++;
}
tab[16]=0;
fprintf(stdout," %s\n[%d]\n",tab,address);
}
else fprintf(stdout,"[%d]\n",address);
return EXIT_SUCCESS;
}

The output would look nicer (better aligned) if the address
were printed using a constant field width. Now it looks like this:

[1] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[17] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[33] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[49] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[65] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[81] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[97] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[113] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 0a xxxxxxxxxxxxxxx.
[128]

Note the misalignment between first/second and seventh/eighth lines.

And personally I find the one-based addresses a bit odd, but
perhaps that's just me.
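
The alignment point can be illustrated with a fixed field width. This is only a sketch: format_address is a hypothetical helper, not part of the posted program, and the width of 6 is an arbitrary assumption. Values with more than 6 digits still print correctly, they just widen the field again.

```c
#include <stdio.h>

/* Hypothetical helper: writes the address as "[     1] " into buf.
   The field width of 6 is an assumption chosen for illustration. */
int format_address(char *buf, size_t size, unsigned long address)
{
    return snprintf(buf, size, "[%6lu] ", address);
}
```

With this, the addresses 1 and 113 occupy the same number of columns, so the hex columns that follow line up.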
 
jacob navia

On 10/09/11 00:29, Ike Naar wrote:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main(int argc,char *argv[])
{
if (argc< 2) {
fprintf(stderr,"Usage: %s<file name>\n",argv[0]);

Undefined behaviour in the (unlikely) case where argc = 0 and argv[0] = NULL.

Yes. But do you know a system where that happens?

Never found one.
return EXIT_FAILURE;
}
FILE *file = fopen(argv[1],"rb");
if (file == NULL) {
fprintf(stderr,"Impossible to open %s for reading\n",argv[1]);
return EXIT_FAILURE;
}
int oneChar = fgetc(file);
int column = 0,line = 0;

The variables column and line serve the same purpose.
One of them can go.

No, lines count the lines and columns the number of chars
read in that line of output.
unsigned char tab[16+1];

Why unsigned?


Bug. Changed that to plain char.
The output would look nicer (better aligned) if the address
were printed using a constant field width. Now it looks like this:

[1] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[17] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[33] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[49] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[65] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[81] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[97] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 xxxxxxxxxxxxxxxx
[113] 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 0a xxxxxxxxxxxxxxx.
[128]

Note the misalignment between first/second and seventh/eighth lines.

Yes, that will be one exercise. But to solve it, you have to know the
size of the file in order to know how many places you need.
And personally I find the one-based addresses a bit odd, but
perhaps that's just me.

Many exercises will be to add options:

1) Option for zero based addresses in hex
2) Option for decimal dump instead of hexadecimal
3) Option for writing the output into a file instead of
stdout.
4) Option to print more/less than 16 characters
5) Option to limit the output to N text positions
6) Option to dump in 16/32 bit format instead of 8.

Thanks for your answer

jacob
 
Ike Naar

On 10/09/11 00:29, Ike Naar wrote:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main(int argc,char *argv[])
{
if (argc< 2) {
fprintf(stderr,"Usage: %s<file name>\n",argv[0]);

Undefined behaviour in the (unlikely) case where argc = 0 and argv[0] = NULL.

Yes. But do you know a system where that happens?
Never found one.

I agree that argc = 0 is unlikely. But the standard allows it.
No, lines count the lines and columns the number of chars
read in that line of output.

Are you really really sure?
I get the impression that they both represent the number of columns,
and that line==column is an invariant of the
``while (oneChar != EOF)'' loop.
 
jacob navia

On 10/09/11 00:53, Ike Naar wrote:
Are you really really sure?
I get the impression that they both represent the number of columns,
and that line==column is an invariant of the
``while (oneChar != EOF)'' loop.

A REAL BUG!

Thanks a lot. Of course, line should be incremented only when
column goes to zero.
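
A minimal sketch of that corrected bookkeeping, assuming a hypothetical struct position and helper advance that do not appear in the posted code. The line counter only changes when the column counter wraps back to zero.

```c
/* Hypothetical bookkeeping for the dump position. */
struct position {
    int column;  /* bytes already printed on the current output line */
    int line;    /* complete output lines in the current block */
};

/* Advance by one byte. Returns 1 when an output line was just
   completed, 2 when that line also completed a block of 16 lines,
   and 0 otherwise. */
int advance(struct position *p)
{
    p->column++;
    if (p->column < 16)
        return 0;
    p->column = 0;
    p->line++;
    if (p->line < 16)
        return 1;
    p->line = 0;
    return 2;
}
```

A caller would print a newline when the result is nonzero and an extra empty line when it is 2.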

Thanks!

jacob
 
Morris Keesan

if (argc < 2) {
fprintf(stderr,"Usage: %s <file name>\n",argv[0]);
return EXIT_FAILURE;
}

This misses the usage-error case where argc > 2.
Others have pointed out the possibility that argv[0] == NULL.
And, personally, as a long-time Unix programmer, I prefer to let any
file-processing program work as a filter, so I would have written
something like

FILE *file;
if (argc < 2) {
file = stdin;
} else if (argc > 2) {
fprintf(stderr,"Usage: %s <file name>\n",argv[0]);
return EXIT_FAILURE;
} else {
file = fopen(argv[1], "rb");
etc.
....

fputc(hex[(oneChar >> 4)&0xf],stdout);
fputc(hex[oneChar&0xf],stdout);
Assumes CHAR_BIT == 8. I've used systems where this is not true, where
this code will not show the entire value of each char.
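
One way to avoid the 8-bit assumption is to derive the number of hex digits from CHAR_BIT. This is a sketch only; HEX_DIGITS and byte_to_hex are hypothetical names, not taken from the posted program.

```c
#include <limits.h>

/* Number of hex digits needed for one byte: 2 when CHAR_BIT is 8,
   3 when it is 9, 4 when it is 16, and so on. */
#define HEX_DIGITS ((CHAR_BIT + 3) / 4)

/* Writes HEX_DIGITS hex digits plus a terminating null into out,
   which must have room for HEX_DIGITS + 1 chars. */
void byte_to_hex(unsigned value, char *out)
{
    static const char hex[] = "0123456789abcdef";
    int i;

    /* Fill from the least significant digit backwards. */
    for (i = HEX_DIGITS - 1; i >= 0; i--) {
        out[i] = hex[value & 0xfu];
        value >>= 4;
    }
    out[HEX_DIGITS] = '\0';
}
```

On a system where CHAR_BIT is 8 this produces the same two digits as the posted code.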
 
Ian Collins

fputc(hex[(oneChar>> 4)&0xf],stdout);
fputc(hex[oneChar&0xf],stdout);
Assumes CHAR_BIT == 8. I've used systems where this is not true, where
this code will not show the entire value of each char.

Oh come on, do you really think anyone following a basic tutorial will
be using such a system?
 
jacob navia

On 10/09/11 02:33, Morris Keesan wrote:
if (argc < 2) {
fprintf(stderr,"Usage: %s <file name>\n",argv[0]);
return EXIT_FAILURE;
}

This misses the usage-error case where argc > 2.

Yes. It should be argc != 2 instead. Never thought of
giving too many arguments to the program, you are right.
fputc(hex[(oneChar >> 4)&0xf],stdout);
fputc(hex[oneChar&0xf],stdout);
Assumes CHAR_BIT == 8. I've used systems where this is not true, where
this code will not show the entire value of each char.

True also. I will mention it in the discussion.
 
Nobody

I do not use putchar but fputc to make it easier to add an output
file later as one more argument.

To make it easier still:

FILE *outfp = stdout;

Then use outfp instead of stdout.
char *hex = "0123456789abcdef";

This should be "const". And there's no need for a pointer. IOW:

const char hex[] = "0123456789abcdef";
fprintf(stdout,"[%d] ",address);

Addresses in decimal?
 
jacob navia

On 10/09/11 11:04, Nobody wrote:
To make it easier still:

FILE *outfp = stdout;

Yes, that can be done.

Then use outfp instead of stdout.
char *hex = "0123456789abcdef";

This should be "const". And there's no need for a pointer. IOW:

const char hex[] = "0123456789abcdef";

Well, a pointer does no harm either. That array wouldn't be
zero terminated, that's why I do not use that construct. True,
in this case it doesn't matter that it isn't a string, but
newcomers could be confused into using that construct in other
situations instead of a pointer to a string, with very bad
consequences...
fprintf(stdout,"[%d] ",address);

Addresses in decimal?

Yes, and even 1 based!

An exercise asks for changing this to hexadecimal and zero
based index.

Thanks for your input
 
jacob navia

On 10/09/11 11:51, pete wrote:
jacob said:
On 10/09/11 11:04, Nobody wrote:
char *hex = "0123456789abcdef";

This should be "const". And there's no need for a pointer. IOW:

const char hex[] = "0123456789abcdef";

Well, a pointer doesn't harm either. That array wouldn't be
zero terminated, that's why I do not use that construct.

The array *would be* zero terminated.

ISO/IEC 9899:1999 (E)
6.7.8 Initialization

14 An array of character type may be initialized
by a character string literal,
optionally enclosed in braces.
Successive characters of the character string literal
(including the terminating null character if there is room
or if the array is of unknown size)
initialize the elements of the array.

Note that it says: "If there is room"... It might not be zero terminated,
and that is how I remembered that construct. A newcomer could
write:

char str[4] = "abcd";

He would receive no warnings, not a hint of what is wrong.
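
The difference between the two forms shows up in sizeof. A small sketch; the names sized_by_literal and exact_fit are hypothetical. The second declaration is valid C but leaves no room for the terminator, and whether a compiler diagnoses it is up to the implementation.

```c
#include <stddef.h>

/* Size taken from the literal: 4 letters plus the terminating null. */
static const char sized_by_literal[] = "abcd";

/* Explicit size with no room left: valid C, but not a string. */
static const char exact_fit[4] = "abcd";

size_t size_of_sized_by_literal(void) { return sizeof sized_by_literal; }
size_t size_of_exact_fit(void)        { return sizeof exact_fit; }
```

Passing exact_fit to a function that expects a string, such as strlen or fputs, would read past the end of the array.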
 
jacob navia

Here is the hexdump with code, commentaries and
exercises. I have incorporated most remarks and
again thanks for your help:

1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <string.h>
4 #include <ctype.h>
5 int main(int argc,char *argv[])
6 {
7 if (argc < 2) {
8 fprintf(stderr,"Usage: %s <file name>\n",argv[0]);
9 return EXIT_FAILURE;
10 }
11 FILE *file = fopen(argv[1],"rb");
12 if (file == NULL) {
13 fprintf(stderr,"Impossible to open %s for reading\n",argv[1]);
14 return EXIT_FAILURE;
15 }
16 int oneChar = fgetc(file);
17 int column = 0,line = 0;
18 char tab[16+1];
19 const char *hex = "0123456789abcdef";
20 int address = 1;
21
22 while (oneChar != EOF) {
23 if (column == 0) {
24 memset(tab,'.',16);
25 fprintf(stdout,"[%d] ",address);
26 }
27 if (isprint(oneChar)) {
28 tab[column] = oneChar;
29 }
30 fputc(hex[(oneChar >> 4)&4],stdout);
31 fputc(hex[oneChar&4],stdout);
32 fputc(' ',stdout);
33 column++;
34 if (column == 16) {
35 fputc(' ',stdout);
36 tab[column]=0;
37 fputs(tab,stdout);
38 column = 0;
39 line++;
40 if (line == 16) {
41 fputc('\n',stdout);
42 line=0;
43 }
44 fputc('\n',stdout);
45 }
46 oneChar = fgetc(file);
47 address++;
48 }
49 fclose(file);
50 address--;
51 if (column > 0 ) {
52 while (column < 16) {
53 fprintf(stdout," ");
54 tab[column]=' ';
55 column++;
56 }
57 tab[16]=0;
58 fprintf(stdout," %s\n[%d]\n",tab,address);
59 }
60 else fprintf(stdout,"[%d]\n",address);
61 return EXIT_SUCCESS;
62 }

Analysis:
--------
The program should be called (in its present form) like this:
hexdump <file name>
i.e. it needs at least one argument: the name of the file to dump.
We test for this in line 7 and if the name is missing we issue a
warning and exit with a failure value. When printing the warning
we use the value stored in argv[0] as the name of the program.
This is generally the case, most systems will store the name of the
program in argv[0]. It could be however, that a malicious user
calls this program constructing its command line for an "execv" call
and leaves argv[0] empty. In that case our program would crash.

Is that possibility a real one? Should we guard against it?
It is highly unlikely that a user that has already enough
access to the machine to write (and compile) programs would
bother to crash our minuscule hexdump utility.. But anyway
the guard would need a tiny change only. Line 8 would need
to be changed to:

8 if (argv[0]) fprintf(stderr,"...message",argv[0]);
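
A slightly fuller variant of that guard could fall back to a fixed name, so that the usage message is still printed. This is a sketch: program_name and the fallback string "hexdump" are assumptions, not part of the posted program.

```c
#include <stddef.h>

/* Returns argv[0] when it is usable, otherwise a fixed fallback. */
const char *program_name(int argc, char *argv[])
{
    if (argc > 0 && argv != NULL && argv[0] != NULL && argv[0][0] != '\0')
        return argv[0];
    return "hexdump";
}
```

Line 8 would then pass program_name(argc, argv) to fprintf instead of argv[0].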

Now we know that we have at least an argument. In line 11 we
try to open the file that we should dump. Note that we use the
binary form of the fopen call "rb" (read binary) to dump
exactly each byte in the file.

If we can't open the file (fopen returns NULL) we print a
warning message into the error output file and return a
failure value.

Now we know we have an open file to dump (line 16) so we
start initializing stuff for the main part of the program.
We read the first character into a variable that will hold
each character in the loop (line 16). We will count columns
and lines, so we initialize the counters to zero (line 17).
We need a table of characters to hold the ASCII equivalences
of each byte (line 18). That table should be a string, so
we dimension it to one character more than the required
length.

We need a table of hexadecimal letters (line 19) that shouldn't
be changed, it is a constant "variable". We tell the compiler
this fact. And then we need to know at what position we are
in the file, so we declare an "address" counter. It is initialized
to one since we have already read one character in line 16.

Now we arrive at our loop. We will read and display characters
until the last character of the file, i.e. until we hit the
end of file condition (line 22).

If we are at the start of a line, i.e. when our "column" variable
is zero, we set the table of ASCII equivalences
to '.' and we put out the position of the file where we are.
We use a one based index for our position so that the first character
is 1. But maybe that is not what the user of an hexdump utility
expects, we can change that to a different address field display,
see the exercises at the end.

We should put the contents of our character into the table if
it is printable. We use the 'isprint' function (line 27) to
determine that, and if true we store the value of our character
into the table.
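
That step can be sketched as a stand-alone function, here with the hypothetical name printable_or_dot. It assumes the value comes from fgetc and is not EOF, so it is in the range of unsigned char as isprint requires. The result depends on the current locale.

```c
#include <ctype.h>

/* Returns the character itself when it is printable, '.' otherwise.
   value must be representable as an unsigned char, as is the case
   for any non-EOF result of fgetc. */
char printable_or_dot(int value)
{
    return isprint(value) ? (char)value : '.';
}
```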

Then, we output the value of our character. We print in hexadecimal
first the higher 4 bits, then the lower 4 bits. Since 4 bit
numbers can only go from zero to 15, we index our "hex" table directly
with the value of those 4 bits (lines 30 and 31).

Note that we mask the bits with the value 0xf, i.e. 15. This means
that we ensure that only the lower 4 bits are used. This is important
to avoid making an index error when we access our "hex" table.
We assume that characters are 8 bits. See exercise 6.
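
The index computation can be checked in isolation. A sketch with the hypothetical helper names high_nibble and low_nibble, assuming 8-bit characters as the text does:

```c
/* Both results are always in the range 0..15, so they are safe
   indexes into a 16-entry table such as "0123456789abcdef". */
unsigned high_nibble(unsigned value) { return (value >> 4) & 0xfu; }
unsigned low_nibble(unsigned value)  { return value & 0xfu; }
```

For the byte 0x0a these give the indexes 0 and 10, i.e. the digits '0' and 'a'.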

We separate each character with a space (line 32), update our column
counter (line 33), and test if we have arrived at the end of our dump
line (line 34).

If that is the case we put an extra blank, finish the table with zero
and print it. We bump our "line" counter, and if we have arrived
at a block of 16 lines, we put an extra empty line (line 41).

We separate lines (line 44) and read the next character (line 46).

When we arrive at the end of file the loop stops, and execution
continues with line 49, where we close the file we have opened.
This is not strictly necessary in this case since when a program
exits all the files it has opened are closed automatically in most
systems, but it is better to do it since if we later want to use
our dump routine as a part of a bigger software package we would
leave a file open.

In line 50 we adjust the address, since we have counted the EOF as a
character.

Now that we are at the end of the file, we output the last line, if any. If
the file size is not a multiple of 16, we have already put some
characters: we complete the last line with blanks. If the file
size is exactly a multiple of 16 we just output our address variable
to indicate to the user the exact size of the file.
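
The padding of a partial last line can be sketched as a function that formats one dump line into a buffer. format_line is a hypothetical helper, not the code above; it assumes 8-bit bytes, 16 bytes per line, and the C locale for isprint.

```c
#include <ctype.h>
#include <stddef.h>

#define BYTES_PER_LINE 16

/* Formats count (1..16) bytes as hex pairs followed by their text
   column. Missing positions are padded with blanks. out must hold
   at least 3*16 + 1 + 16 + 1 = 66 chars. */
void format_line(const unsigned char *bytes, size_t count, char *out)
{
    static const char hex[] = "0123456789abcdef";
    size_t i, pos = 0;

    /* Hex columns: two digits and a separator per position. */
    for (i = 0; i < BYTES_PER_LINE; i++) {
        if (i < count) {
            out[pos++] = hex[(bytes[i] >> 4) & 0xf];
            out[pos++] = hex[bytes[i] & 0xf];
        } else {
            out[pos++] = ' ';
            out[pos++] = ' ';
        }
        out[pos++] = ' ';
    }
    out[pos++] = ' ';  /* extra blank before the text column */

    /* Text column: printable characters, '.' for the rest. */
    for (i = 0; i < BYTES_PER_LINE; i++) {
        if (i >= count)
            out[pos++] = ' ';
        else
            out[pos++] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    }
    out[pos] = '\0';
}
```

Every line produced this way has the same length, whether it holds 16 bytes or fewer.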

Exercises:
---------

1: Add an optional argument so that an output file can be specified.
2: You should have noticed that the output is not aligned whenever
the address gains a digit, for instance between the 1st and the 2nd
line, since 17 has one more character than 1. Fix this. All lines
should be aligned.
3: Add an option (call it -hexaddress) to write file addresses in
hexadecimal instead of decimal as shown.
4: Add another option (call it -column:XXX) to display more or less
text positions in a line. For instance -column:80 would fix the
display to 80 columns. Adjust the number of characters displayed
accordingly. Note that you should not make the number of characters
less than 4 or greater than 512.
5: Add an option to display 32 bits instead of just 8 at a time.
6: What would happen if you are working in a machine where the
characters are 16 bits wide? What needs to be changed in the
above program?
 
Ben Bacarisse

jacob navia said:
Here is the hexdump with code, commentaries and
exercises. I have incorporated most remarks and
again thanks for your help:

I would prefer more horizontal white space. Although this is a matter
of style, I think it has a significant impact on readability. I accept
that you may disagree, but at least you should aim to be consistent in a
tutorial exercise. For example, some assignments have space round the
'=' and some do not. Some operators get spaces round them and others do
not. E.g.:
30 fputc(hex[(oneChar >> 4)&4],stdout);

You've introduced a bug here at some point. Also here:
31 fputc(hex[oneChar&4],stdout);

Two of the three 4s should be 15 (as they used to be).

<snip>
 
jacob navia

On 10/09/11 14:03, Ben Bacarisse wrote:
jacob navia said:
Here is the hexdump with code, commentaries and
exercises. I have incorporated most remarks and
again thanks for your help:

I would prefer more horizontal white space. Although this is a matter
of style, I think it has a significant impact on readability. I accept
that you may disagree, but at least you should aim to be consistent in a
tutorial exercise. For example, some assignments have space round the
'=' and some do not. Some operators get spaces round them and others do
not. E.g.:
30 fputc(hex[(oneChar>> 4)&4],stdout);

You've introduced a bug here at some point. Also here:
31 fputc(hex[oneChar&4],stdout);

Two of the three 4s should be 15 (as they used to be).

<snip>

Yes, and I corrected it in the code and not in the text
as you can see. Anyway the explanation is (I hope) OK
:)

Thanks Ben.
 
Nobody

Note that it says: "If there is room"... It could be not zero terminated

Not as written. To avoid NUL-termination, you would have to write:

const char hex[16] = "0123456789abcdef";

Also, it doesn't matter whether or not it's zero terminated, as you really
are using it as an array of characters, rather than as a string.
 
Morris Keesan

fputc(hex[(oneChar>> 4)&0xf],stdout);
fputc(hex[oneChar&0xf],stdout);
Assumes CHAR_BIT == 8. I've used systems where this is not true, where
this code will not show the entire value of each char.

Oh come on, do you really think anyone following a basic tutorial will
be using such a system?

No, but jacob was asking about possible portability problems. That's one.
 
