malloc + 4??

K

Kevin Torr

http://www.yep-mm.com/res/soCrypt.c

I have 2 malloc's in my program, and when I write the contents of them to
the screen or to a file, there aren addition 4 characters.

As far as I can tell, both the code to register the malloc and to write
information into the malloc is solid. Why then ismy program returning an
additional 4 characters?

register malloc 1:
line 192

register malloc 2:
line 214

write to malloc 1:
line 200 - 205

write to malloc 2:
line 221 - 225

display malloc 2:
line 157

write malloc 2:
line 251

Here's how you execute the program:

socrypt.exe /e :i input.txt :eek: output.txt :A keya.txt :B keyb.txt :k
keyout.txt

**note that the input, keya, and keyb files must exist or the program will
return an error code.

If you write a text string into the input.txt file, it will write the same
string into the output.txt file plus an addition 4 characters.

The 1024 char random 'masterkey' is also written out to the keyout.txt file
with an addition 4 characters.

Why is this happening? I'm totally baffled and have spent days trying to
figure this out.
 
J

Joona I Palaste

Kevin Torr said:
I have 2 malloc's in my program, and when I write the contents of them to
the screen or to a file, there aren addition 4 characters.
As far as I can tell, both the code to register the malloc and to write
information into the malloc is solid. Why then ismy program returning an
additional 4 characters?

The C standard says that malloc is allowed to allocate more memory than
requested.
 
C

Chris Torek


In general, it is better to post the actual problematic code
(preferably after shrinking it down to a "problematic nub", as it
were), but a URL reference can work if the one reading netnews
bothers to follow the link. :)
I have 2 malloc's in my program, and when I write the contents of them to
the screen or to a file, there aren addition 4 characters.
As far as I can tell, both the code to register the malloc and to write
information into the malloc is solid. Why then ismy program returning an
additional 4 characters?

It is not quite as solid as one might hope, although the problem
has nothing to do with malloc() per se. Here are excerpts from the
code (quoted with ">" as usual, although I had to insert the markers
myself):
// soCrypt 1.0

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
// #include <md5.h>

OK so far, although //-comments are specific to C99. You have
included necessary headers, so you will not need to cast malloc()'s
return value.
// Global variables

int statCode = 0; // Status code
int mode; // Mode variable (1 = enc, 2 = dec)
int i; // Looper variable
int inSize = 0; // Input filesize
int intRand; // Random int
char tmp_char; // Temporary char

Many of these should not be file-scope external-linkage ("global")
variables, although this is mostly a style issue (at least in a
program this small).

Note that tmp_char has type "char"; on a typical PowerPC, it would
hold values between 0 and 255 inclusive, because there plain "char"
is unsigned. The variable inSize is a plain (signed) int and has
at least the range [-32767..+32767] (although most systems, today,
have an even wider range, about +/- 2 billion).
char *pMasterKey; // Malloc pointer to the master key
char *pInputData; // Malloc pointer to the input data

Skipping forward, we have:
// Reads the input file

int readFile()
{

rewind(inFile);
i = 0;
tmp_char = 'a';
while(tmp_char != EOF)
{
i++;
tmp_char = getc(inFile);
}
inSize = i-1;

This loop tries to count the size of the file by calling getc()
until getc() returns EOF. The problem is that EOF is some sort of
negative number -- typically -1, but perhaps even -2000 or some
such -- and tmp_char is a plain "char". If tmp_char is unable to
hold the value EOF, which will be true if plain char is unsigned
or if EOF is less than CHAR_MIN (e.g., -2000 vs -128 for instance),
the loop will never terminate.

This is why getc() returns a value of type "int" in the first place,
so that it can return all possible "char"s (having first converted
any negative ones to positive values as if via "unsigned char"),
yet also return the special marker value EOF. If you want to store
both "any valid character" *and* EOF, you need something with a
wider range than "any valid character".

Of course, there is really no need for tmp_char at all, nor for
correcting for the off-by-one error produced by counting inside
the loop *before* getting a character. Just change the loop to,
e.g.:

while (getc(inFile) != EOF)
i++;

Combine this with using local variables, and perhaps a "for"
loop to collect up the initialization, test, and increment, we
might get something like:

int readFile() {
int i;

rewind(inFile);
for (i = 0; getc(inFile) != EOF; i++)
continue;
inSize = i;

(although I would also pass the "FILE *" parameter to readFile,
and probably return the allocated memory rather than an "int"
status code).
if ((pInputData = (char *)malloc(inSize * sizeof(char))) == NULL)
{
statCode = 8;
return(statCode);
}

This is OK in and of itself, but there are two important things to
note. First, the cast is not required. It does no harm, but also
does no help. It is a bit like saying "tmp_char = (char)getc(inFile)",
when tmp_char is already a char. The assignment will do the
conversion for you -- and you do not use a cast below, so why use
one above?

Second, and the actual source of the observed problem later, note
that this allocates just enough space to store all the characters
you intend to read from the file.

Consider the C string "hello world". How many characters are in
it? How many characters does it take to *store* it? Why, after:

char hello[] = "hello world";

is there a difference of 1 between "sizeof hello" and "strlen(hello")?

The answer is: because C strings require a '\0' marker after all
their valid "char"s. The array hello[] has size 12, not size 11,
because it stores the 11 "char"s that make up the two words and
the blank, and then one more to store the '\0' marker.

The variable inSize might (for instance) hold 5 if the file contents
are "word\n" (perhaps followed by an EOF marker, if your system
actually uses such markers in files), but if you want to use the
sequence {'w', 'o', 'r', 'd', '\n'} as a C string, you need *six*
bytes: {'w', 'o', 'r', 'd', '\n', '\0'}.

Of course, there is no requirement that you treat the file as
a C string -- that part is up to you. In any case:
rewind(inFile);
i = 0;
tmp_char = 'a';
while (i < inSize)
{
tmp_char = getc(inFile);
*(pInputData + i) = tmp_char;
i++;
}
return 0;
}

There is nothing *wrong* here, but the code can be simplified
enormously. First, tmp_char is never inspected without first
calling getc(), so there is no need to "prime the pump" -- the
loop tests "i < inSize". Second, there is no need for tmp_char
at all; you can just assign the value from getc() directly into
pInputData. Third, you can write pInputData that way,
rather than using the equivalent unary-"*" sequence, and again
perhaps a "for" loop might express the whole sequence better:

rewind(inFile);
for (i = 0; i < inSize; i++)
pInputData = getc(inFile);
return 0;
}

Skipping forward to the source of the observed problem:
// Writes the output file

int writeFile()
{
fputs(pInputData, outFile);
fputs(pMasterKey, keyOut);
return 0;
}

The fputs() function demands a string -- a sequence of "char"s
ending with a '\0' termination marker. pInputData points to the
first of a sequence of "char"s, but not one that has the "stop here
at the \0" mark in it.

Without making any other changes, you can either allocate one extra
byte and put in the '\0', or you can change the method you use to
write the final output. The two simply have to agree as to whether
pInputData (and pMasterKey -- but I did not even look at that code)
is a counted string (length inSize, for pInputData) or a C-style
'\0'-terminated string.

Note that a '\0'-terminated string cannot *contain* a '\0', so if
you want (for whatever reason) to allow embedded '\0' bytes, you
will have to choose the counted-string method. You could write
out a counted string with a loop:

int writeFile() {
int i;

for (i = 0; i < inSize; i++)
putc(pInputData, outFile);
/* optional:
if (fflush(outFile) || ferror(outFile))
... handle output failure ... */
/* and repeat for pMasterKey */
}

or you can use fwrite(), which essentially does the loop for you.
(Note that fwrite() works just fine with ordinary text, and in
fact, fputs() can be implemented internally as:

int fputs(char *s, FILE *stream) {
size_t len = strlen(s);

return fwrite(s, 1, len, stream) == len ? 0 : EOF;
}

Here strlen() looks for, but does not count, the terminating '\0',
then fwrite() loops over all the valid bytes, putc()ing each one
to the stream. The only remaining problem is that the return value
from fwrite() does not match that from fputs(), so it has to be
converted.)
 
K

Kevin Torr

Chris Torek said:

In general, it is better to post the actual problematic code
(preferably after shrinking it down to a "problematic nub", as it
were), but a URL reference can work if the one reading netnews
bothers to follow the link. :)
I have 2 malloc's in my program, and when I write the contents of them to
the screen or to a file, there aren addition 4 characters.
As far as I can tell, both the code to register the malloc and to write
information into the malloc is solid. Why then ismy program returning an
additional 4 characters?

It is not quite as solid as one might hope, although the problem
has nothing to do with malloc() per se. Here are excerpts from the
code (quoted with ">" as usual, although I had to insert the markers
myself):
// soCrypt 1.0

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
// #include <md5.h>

OK so far, although //-comments are specific to C99. You have
included necessary headers, so you will not need to cast malloc()'s
return value.
// Global variables

int statCode = 0; // Status code
int mode; // Mode variable (1 = enc, 2 = dec)
int i; // Looper variable
int inSize = 0; // Input filesize
int intRand; // Random int
char tmp_char; // Temporary char

Many of these should not be file-scope external-linkage ("global")
variables, although this is mostly a style issue (at least in a
program this small).

Note that tmp_char has type "char"; on a typical PowerPC, it would
hold values between 0 and 255 inclusive, because there plain "char"
is unsigned. The variable inSize is a plain (signed) int and has
at least the range [-32767..+32767] (although most systems, today,
have an even wider range, about +/- 2 billion).
char *pMasterKey; // Malloc pointer to the master key
char *pInputData; // Malloc pointer to the input data

Skipping forward, we have:
// Reads the input file

int readFile()
{

rewind(inFile);
i = 0;
tmp_char = 'a';
while(tmp_char != EOF)
{
i++;
tmp_char = getc(inFile);
}
inSize = i-1;

This loop tries to count the size of the file by calling getc()
until getc() returns EOF. The problem is that EOF is some sort of
negative number -- typically -1, but perhaps even -2000 or some
such -- and tmp_char is a plain "char". If tmp_char is unable to
hold the value EOF, which will be true if plain char is unsigned
or if EOF is less than CHAR_MIN (e.g., -2000 vs -128 for instance),
the loop will never terminate.

This is why getc() returns a value of type "int" in the first place,
so that it can return all possible "char"s (having first converted
any negative ones to positive values as if via "unsigned char"),
yet also return the special marker value EOF. If you want to store
both "any valid character" *and* EOF, you need something with a
wider range than "any valid character".

Of course, there is really no need for tmp_char at all, nor for
correcting for the off-by-one error produced by counting inside
the loop *before* getting a character. Just change the loop to,
e.g.:

while (getc(inFile) != EOF)
i++;

Combine this with using local variables, and perhaps a "for"
loop to collect up the initialization, test, and increment, we
might get something like:

int readFile() {
int i;

rewind(inFile);
for (i = 0; getc(inFile) != EOF; i++)
continue;
inSize = i;

(although I would also pass the "FILE *" parameter to readFile,
and probably return the allocated memory rather than an "int"
status code).
if ((pInputData = (char *)malloc(inSize * sizeof(char))) == NULL)
{
statCode = 8;
return(statCode);
}

This is OK in and of itself, but there are two important things to
note. First, the cast is not required. It does no harm, but also
does no help. It is a bit like saying "tmp_char = (char)getc(inFile)",
when tmp_char is already a char. The assignment will do the
conversion for you -- and you do not use a cast below, so why use
one above?

Second, and the actual source of the observed problem later, note
that this allocates just enough space to store all the characters
you intend to read from the file.

Consider the C string "hello world". How many characters are in
it? How many characters does it take to *store* it? Why, after:

char hello[] = "hello world";

is there a difference of 1 between "sizeof hello" and "strlen(hello")?

The answer is: because C strings require a '\0' marker after all
their valid "char"s. The array hello[] has size 12, not size 11,
because it stores the 11 "char"s that make up the two words and
the blank, and then one more to store the '\0' marker.

The variable inSize might (for instance) hold 5 if the file contents
are "word\n" (perhaps followed by an EOF marker, if your system
actually uses such markers in files), but if you want to use the
sequence {'w', 'o', 'r', 'd', '\n'} as a C string, you need *six*
bytes: {'w', 'o', 'r', 'd', '\n', '\0'}.

Of course, there is no requirement that you treat the file as
a C string -- that part is up to you. In any case:
rewind(inFile);
i = 0;
tmp_char = 'a';
while (i < inSize)
{
tmp_char = getc(inFile);
*(pInputData + i) = tmp_char;
i++;
}
return 0;
}

There is nothing *wrong* here, but the code can be simplified
enormously. First, tmp_char is never inspected without first
calling getc(), so there is no need to "prime the pump" -- the
loop tests "i < inSize". Second, there is no need for tmp_char
at all; you can just assign the value from getc() directly into
pInputData. Third, you can write pInputData that way,
rather than using the equivalent unary-"*" sequence, and again
perhaps a "for" loop might express the whole sequence better:

rewind(inFile);
for (i = 0; i < inSize; i++)
pInputData = getc(inFile);
return 0;
}

Skipping forward to the source of the observed problem:
// Writes the output file

int writeFile()
{
fputs(pInputData, outFile);
fputs(pMasterKey, keyOut);
return 0;
}

The fputs() function demands a string -- a sequence of "char"s
ending with a '\0' termination marker. pInputData points to the
first of a sequence of "char"s, but not one that has the "stop here
at the \0" mark in it.

Without making any other changes, you can either allocate one extra
byte and put in the '\0', or you can change the method you use to
write the final output. The two simply have to agree as to whether
pInputData (and pMasterKey -- but I did not even look at that code)
is a counted string (length inSize, for pInputData) or a C-style
'\0'-terminated string.

Note that a '\0'-terminated string cannot *contain* a '\0', so if
you want (for whatever reason) to allow embedded '\0' bytes, you
will have to choose the counted-string method. You could write
out a counted string with a loop:

int writeFile() {
int i;

for (i = 0; i < inSize; i++)
putc(pInputData, outFile);
/* optional:
if (fflush(outFile) || ferror(outFile))
... handle output failure ... */
/* and repeat for pMasterKey */
}

or you can use fwrite(), which essentially does the loop for you.
(Note that fwrite() works just fine with ordinary text, and in
fact, fputs() can be implemented internally as:

int fputs(char *s, FILE *stream) {
size_t len = strlen(s);

return fwrite(s, 1, len, stream) == len ? 0 : EOF;
}

Here strlen() looks for, but does not count, the terminating '\0',
then fwrite() loops over all the valid bytes, putc()ing each one
to the stream. The only remaining problem is that the return value
from fwrite() does not match that from fputs(), so it has to be
converted.)


Wow, thanks for all that. I will have to get my head around it all.

So when do I need not cast mallocs? when I include a header? which header?
Or are you saying that I don't need to cast a malloc if I've already defined
the pointer as a data type?
What would be a possible bad thing if I did cast a malloc when I didn't need
to? Is there something that could go wrong or is it just redundant?
 
J

John Tsiombikas (Nuclear / the Lab)

Kevin said:
Wow, thanks for all that. I will have to get my head around it all.

So when do I need not cast mallocs?
Never.

when I include a header? which header?

You have to include stdlib.h when you use malloc (or provide the
prototype of malloc() yourself, but i can't imagine why you would prefer
that)
 
B

Ben Pfaff

John Tsiombikas (Nuclear / the Lab) said:

Too many negatives for me. Simply stated, the return value of
malloc() rarely needs to be cast.
You have to include stdlib.h when you use malloc (or provide the
prototype of malloc() yourself, but i can't imagine why you would
prefer that)

Providing a prototype of malloc() yourself is arguably not valid
practice based on this sentence from the standard, section 7.1.4:

2 Provided that a library function can be declared without
reference to any type defined in a header, it is also
permissible to declare the function and use it without
including its associated header.

You can certainly declare malloc() without a type defined in a
header, but giving a prototype requires using size_t. It's
better just to use the header.
 
J

John Tsiombikas (Nuclear / the Lab)

Martin said:
Actually he *always* need not cast mallocs.
What he need never do is cast mallocs.

Hm, it seems I missed the *not* in there, for some obscure reason I
thought that he asked "So when do I need to cast mallocs?". :)))
 
J

John Tsiombikas (Nuclear / the Lab)

Ben said:
Providing a prototype of malloc() yourself is arguably not valid
practice based on this sentence from the standard, section 7.1.4:

2 Provided that a library function can be declared without
reference to any type defined in a header, it is also
permissible to declare the function and use it without
including its associated header.

You can certainly declare malloc() without a type defined in a
header, but giving a prototype requires using size_t. It's
better just to use the header.

So are you saying that you can't do the following?

#include <stddef.h> /* for size_t */
void *malloc(size_t);

this is providing a prototype for malloc, without including stdlib.h and
yes ofcourse it's better to just use the header, I just added that
comment for completeness, if I may quote myself :)
"or provide the prototype of malloc() yourself, but i can't imagine why
you would prefer to do that"
 
B

Ben Pfaff

John Tsiombikas (Nuclear / the Lab) said:
So are you saying that you can't do the following?

#include <stddef.h> /* for size_t */
void *malloc(size_t);

this is providing a prototype for malloc, without including stdlib.h

You got size_t from a header, which seems to fall afoul of the
spirit of the provision above. Whether it is actually undefined
behavior would be better judged in comp.std.c. But it is better
in any case to simply include <stdlib.h>.
 
K

Keith Thompson

Kevin Torr said:
news:[email protected]... [...]
Wow, thanks for all that. I will have to get my head around it all.

So when do I need not cast mallocs? when I include a header? which header?
Or are you saying that I don't need to cast a malloc if I've already defined
the pointer as a data type?
What would be a possible bad thing if I did cast a malloc when I didn't need
to? Is there something that could go wrong or is it just redundant?

Speaking of redundant, you just re-posted over 230 lines of Chris
Torek's text. Please limit quoted material to what's relevant to your
reply.

To answer your question, if you're going to use malloc() you should
always include <stdlib.h>, and you should never cast the result.

Casting the result masks possible errors. It can eliminate the
warning message you'll get if you've forgotten to include <stdlib.h>,
but only because you're lying to the compiler. It's like eliminating
a warning light on your dashboard by unscrewing the light bulb rather
than by fixing the problem it's trying to tell you about.
 
D

Dan Pop

In said:
http://www.yep-mm.com/res/soCrypt.c

I have 2 malloc's in my program, and when I write the contents of them to
the screen or to a file, there aren addition 4 characters.

As far as I can tell, both the code to register the malloc and to write
information into the malloc is solid. Why then ismy program returning an
additional 4 characters?

Write a *minimal* program reproducing your problem, post it here and
you may get some sensible answers.

Dan
 
D

Dan Pop

In said:
You got size_t from a header, which seems to fall afoul of the
spirit of the provision above. Whether it is actually undefined
behavior would be better judged in comp.std.c.

Why is undefined behaviour an option here?

<stddef.h> is the right header to include when you need a definition of
size_t without much additional baggage and his declaration of malloc is
identical to the one provided by the standard itself.

Dan
 
C

CBFalconer

Mark said:
Only, however, if you live in the US. (Or France).

Works everywhere, on machines with an 11 bit byte and
sizeof(size_t) == 4. :) That even allows for 2 padding bits.
 
R

Richard Bos

CBFalconer said:
Works everywhere, on machines with an 11 bit byte and
sizeof(size_t) == 4. :) That even allows for 2 padding bits.

But in civilised countries, which are not actively trying to make their
national deficit look larger, that won't fit two billion.

Richard
 
D

Dan Pop

In said:
Only, however, if you live in the US. (Or France).

In French, this usage of billion is considered an archaism. The current
meaning is 1e12, like in the rest of Europe.

Dan
 
J

Joona I Palaste

In French, this usage of billion is considered an archaism. The current
meaning is 1e12, like in the rest of Europe.

Unfortunately, in international discussion, the trend seems to be favour
American forms if at least one American is present. A safe rule of thumb
is "except for political issues, international means American".
 
C

CBFalconer

Richard said:
But in civilised countries, which are not actively trying to make
their national deficit look larger, that won't fit two billion.

Last time I counted that high 42 bits could describe +- 2e12,
which seems capable of describing most varieties of billion known
to me.
 
J

Joe Wright

Joona said:
Unfortunately, in international discussion, the trend seems to be favour
American forms if at least one American is present. A safe rule of thumb
is "except for political issues, international means American".
It's just that milliard doesn't seem to roll off the tongue that well. I
don't believe I've heard it in conversation.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,598
Members
45,158
Latest member
Vinay_Kumar Nevatia
Top