Can I Trust Pointer Arithmetic In Re-Allocated Memory?

Bill Reid

Bear with me, as I am not a "professional" programmer, but I was
working on part of a program that reads parts of four text files into
a buffer which I re-allocate the size as I read each file. I read some
of the items from the bottom up of the buffer, and some from the
top down, moving the bottom items back to the new re-allocated
bottom on every file read.

Then when I've read all four files, I sort the top and bottom items
separately using qsort(), which takes a pointer to a list of items, and
write the two sorted lists to two new files.

Problem is, I worry that if I just supply a pointer to the first item
in the bottom list to qsort(), it might point out to bozo-land during
the sort because I thought that dynamically re-allocated memory
is not necessarily contiguous. So I've done a little two step where
I write the bottom list to another buffer to do the sorting and writing,
and everything works great, but I'm wondering if I'm wasting time
and worrying about nothing...after all, if I can't trust a pointer to an
arbitrary point in the list, how can I trust a pointer to the start of
the list?

Any light you can shed on how pointers are handled in dynamically
allocated memory would be interesting and helpful...thanks.
 
Barry Schwarz

Bear with me, as I am not a "professional" programmer, but I was
working on part of a program that reads parts of four text files into
a buffer which I re-allocate the size as I read each file. I read some
of the items from the bottom up of the buffer, and some from the
top down, moving the bottom items back to the new re-allocated
bottom on every file read.

I don't quite follow this description.
Then when I've read all four files, I sort the top and bottom items
separately using qsort(), which takes a pointer to a list of items, and
write the two sorted lists to two new files.

Problem is, I worry that if I just supply a pointer to the first item
in the bottom list to qsort(), it might point out to bozo-land during
the sort because I thought that dynamically re-allocated memory
is not necessarily contiguous. So I've done a little two step where

The block of memory whose non-NULL address is returned from
malloc/realloc/calloc is guaranteed to be contiguous. Your memory is
allocated from address to address+size-1. Furthermore, calculating
the value address+size is always allowed but you may not dereference
this address.
I write the bottom list to another buffer to do the sorting and writing,
and everything works great, but I'm wondering if I'm wasting time
and worrying about nothing...after all, if I can't trust a pointer to an
arbitrary point in the list, how can I trust a pointer to the start of
the list?

Any light you can shed on how pointers are handled in dynamically
allocated memory would be interesting and helpful...thanks.

A pointer value between the limits mentioned above is within range of
the allocated memory. You have to ensure alignment, but if the pointer
has the correct type the compiler will do this for you.
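A minimal sketch of what Barry describes, assuming a plain int buffer (the function name grow_and_sum and the sizes are illustrative, not from Bill's program). The block returned by realloc() may have moved, but it is still one contiguous run of memory, so a pointer can walk it from the start up to the one-past-the-end address, which may be computed and compared but never dereferenced.

```c
#include <stdlib.h>

/* Illustrative helper: allocate old_count ints, grow the block to
   new_count ints with realloc(), fill it with 0..new_count-1, and sum
   it by walking a pointer.  Returns -1 if an allocation fails. */
long grow_and_sum(size_t old_count, size_t new_count)
{
    int *buf = malloc(old_count * sizeof *buf);
    if (buf == NULL)
        return -1;

    /* realloc() may move the block; a non-NULL result is again one
       contiguous block.  Keep the old pointer until success is known. */
    int *tmp = realloc(buf, new_count * sizeof *buf);
    if (tmp == NULL) {
        free(buf);
        return -1;
    }
    buf = tmp;

    for (size_t i = 0; i < new_count; i++)
        buf[i] = (int)i;

    /* buf + new_count is the one-past-the-end address: valid to compute
       and compare against, never to dereference. */
    long sum = 0;
    for (const int *p = buf; p != buf + new_count; p++)
        sum += *p;

    free(buf);
    return sum;
}
```

For example, grow_and_sum(4, 10) returns 45, the sum of 0 through 9.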


Bill Reid

Barry Schwarz said:
I don't quite follow this description.
Yeah, it's a little confusing, and not that relevant to what I'm
asking...the bottom line is I want to separately sort two parts of a list...
The block of memory whose non-NULL address is returned from
malloc/realloc/calloc is guaranteed to be contiguous.

OK, that's the answer, I was just plain wrong that the memory
might not be contiguous...I've probably only read that guarantee
about 100000000000 times but just forgot it.

I think I got that confused with the idea that the re-allocated
block may have a different location than the original malloc, which
would mean...
Your memory is
allocated from address to address+size-1. Furthermore, calculating
the value address+size is always allowed but you may not dereference
this address.
....you wouldn't want to dereference an address, right.
A pointer value between the limits mentioned above is within range of
the allocated memory. You have to ensure alignment, but if the pointer
has the correct type the compiler will do this for you.
OK, so this should be completely legal and flawless:

/* sort the symbol list alphabetically */
qsort((void *)curr_instrs,num_symbols,128,sort_alpha_list);

then...

/* sort the no-symbol list alphabetically */
qsort((void *)curr_instrs+num_symbols,num_no_symbols,128,sort_alpha_list);

First qsort() sorts down to the end of the symbols part of the list,
the second sorts down from the start of the no-symbols part of the
list to the end of the list. I guess it was the (void *) cast that scared
me...thanks.
 
Keith Thompson

Bill Reid said:
Yeah, it's a little confusing, and not that relevant to what I'm
asking...the bottom line is I want to separately sort two parts of a list...

OK, that's the answer, I was just plain wrong that the memory
might not be contiguous...I've probably only read that guarantee
about 100000000000 times but just forgot it.

I think I got that confused with the idea that the re-allocated
block may have a different location than the original malloc, which
would mean...

One thing that I found a little confusing in your original message is
that you talked about "re-allocated" memory, but you didn't mention
the "realloc" function. The more specific your description, the more
likely it is that we can help.

[...]
OK, so this should be completely legal and flawless:

/* sort the symbol list alphabetically */
qsort((void *)curr_instrs,num_symbols,128,sort_alpha_list);

then...

/* sort the no-symbol list alphabetically */
qsort((void *)curr_instrs+num_symbols,num_no_symbols,128,sort_alpha_list);

Um, no.

Don't be afraid of whitespace. I put blanks around most operator
symbols, and after every comma. If I have to split something across
lines, that's ok. So I'd write your qsort call as:

qsort((void *)curr_instrs + num_symbols,
      num_no_symbols,
      128,
      sort_alpha_list);

The third argument, 128, is a "magic number". It's very difficult to
tell what it means or whether it's even correct. Define a constant:
#define WHATEVER 128
so you only need to change it in one place (but pick a better name, of
course).
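One way to apply this, assuming the 128 is the length of each fixed-size string (INSTR_LEN is a hypothetical name standing in for WHATEVER): name the length once, build the string type from it, and let sizeof supply the element size that qsort() needs instead of a bare 128.

```c
#include <stddef.h>

#define INSTR_LEN 128   /* hypothetical name; the one place to change it */

typedef char instr_strs[INSTR_LEN];

/* Element size for qsort(), taken from the data rather than typed in:
   sizeof *list is INSTR_LEN bytes.  sizeof does not evaluate its
   operand, so list is never dereferenced here. */
size_t instr_elem_size(instr_strs *list)
{
    return sizeof *list;
}
```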

The first argument to qsort is:

(void *)curr_instrs + num_symbols

You can't do pointer arithmetic on a void* value. (Some compilers may
allow it; if you're using gcc, try "-ansi -pedantic -Wall -W", or
replace "-ansi" with "-std=c99").

If you're trying to get the address pointed to by curr_instrs plus an
offset of num_symbols bytes, you'll need to do the arithmetic using
char*:

qsort((char*)curr_instrs + num_symbols,
      /* other args */);

assuming that curr_instrs isn't already a char*. Note that I didn't
cast the expression to void*; any pointer-to-object type can be
converted to void*, or vice versa.
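The two possible readings of "curr_instrs + num_symbols" can be compared directly. This sketch borrows the instr_strs typedef that Bill posts later in the thread; the function names are hypothetical. Each function reports, in bytes, how far its form of the expression lands from the start of the array.

```c
#include <stddef.h>

typedef char instr_strs[128];

/* n *bytes* past base: the arithmetic is done on char*, which is also
   what gcc's void* extension does.  Assumes base points into an array
   at least n bytes long. */
ptrdiff_t byte_offset(instr_strs *base, size_t n)
{
    char *p = (char *)base + n;
    return p - (char *)base;
}

/* n *elements* past base: arithmetic on the typed pointer is scaled by
   sizeof *base (128 bytes here).  The result converts to void* without
   a cast, as it would when passed to qsort().  Assumes base points into
   an array of at least n elements. */
ptrdiff_t element_offset(instr_strs *base, size_t n)
{
    void *p = base + n;
    return (char *)p - (char *)base;
}
```

With an array of at least four elements, byte_offset(a, 3) is 3, while element_offset(a, 3) is 3 * 128 = 384.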
 
Bill Reid

Keith Thompson said:
One thing that I found a little confusing in your original message is
that you talked about "re-allocated" memory, but you didn't mention
the "realloc" function. The more specific your description, the more
likely it is that we can help.
Well, OK, maybe, here's canonical specificity:

/* now re-allocate memory for the instrument strings */
if((curr_instrs=(instr_strs *)
        realloc(curr_instrs,num_instrs*sizeof(instr_strs)))==NULL) {
    printf("Not enough memory for instruments buffer\n");
    goto CloseFiles;
}

Does that help you help me?
[...]
OK, so this should be completely legal and flawless:

/* sort the symbol list alphabetically */
qsort((void *)curr_instrs,num_symbols,128,sort_alpha_list);

then...

/* sort the no-symbol list alphabetically */
qsort((void *)curr_instrs+num_symbols,num_no_symbols,128,sort_alpha_list);

Um, no.
By "legal and flawless" I DID mean "100% guaranteed functional",
not "pleasing to thine eyes"...
Don't be afraid of whitespace. I put blanks around most operator
symbols, and after every comma. If I have to split something across
lines, that's ok. So I'd write your qsort call as:

qsort((void *)curr_instrs + num_symbols,
      num_no_symbols,
      128,
      sort_alpha_list);
That's the way YOU'D do it, I do it differently, and since I'm the only
one reading it (except in this one rare instance, or occasionally I'll post
some code somewhere on the net), I can read it just fine, and of
course it compiles all the same...
The third argument, 128, is a "magic number". It's very difficult to
tell what it means or whether it's even correct. Define a constant:
#define WHATEVER 128

In qsort(), it's basically 128 (character) bytes.

I've actually got "128" defined globally (and I do mean globally, for
several hundred thousand lines of code) for the purposes of reading
and writing strings of certain lengths. And those damned defines
have managed to screw me up royally several times, including a
really irritating "intermittent" problem I had when I first wrote this
particular section of code. So lately I've been using them less
and less...
so you only need to change it in one place (but pick a better name, of
course).
Even at file scope right now I'm more comfortable with the way it
is...
The first argument to qsort is:

(void *)curr_instrs + num_symbols

You can't do pointer arithmetic on a void* value. (Some compilers may
allow it; if you're using gcc, try "-ansi -pedantic -Wall -W", or
replace "-ansi" with "-std=c99").
Then how does qsort() do it? I'm assuming now that it must just
use pointer arithmetic internally, because it doesn't seem to want or
recognize my typedef of a 128-character string:

typedef char instr_strs[128];
instr_strs *curr_instrs;
If you're trying to get the address pointed to by curr_instrs plus an
offset of num_symbols bytes, you'll need to do the arithmetic using
char*:

qsort((char*)curr_instrs + num_symbols,
      /* other args */);

assuming that curr_instrs isn't already a char*.

Nope, a pointer to the first of many 128-character strings, as above, so
are you saying the pointer cast should be (instr_strs *)? I have no problem
with that, as long as it works, and I must stress again at this point that
the current code:

/* sort the symbol list alphabetically */
qsort((void *)curr_instrs,num_symbols,128,sort_alpha_list);

Has worked flawlessly for months now; it's part of a particular section
of code that downloads about 3/4 meg of raw data from the net every
day at a specific time, parses out about 100,000 data items, and writes
them to a custom database in a matter of seconds.

The only reason I asked the original question was because I went
back and reviewed the code and wondered if I could shave a few
more milliseconds off the execution time...

Note that I didn't
cast the expression to void*; any pointer-to-object type can be
converted to void*, or vice versa.
Yeah, I noticed that, I just use (void *) because that's what
I thought qsort() wanted, and it definitely WORKS that way
(I've used qsort() dozens of times EXACTLY that way without
problems).

Now to get back to this:
If you're trying to get the address pointed to by curr_instrs plus an
offset of num_symbols bytes, you'll need to do the arithmetic using
char*:

qsort((char*)curr_instrs + num_symbols,
      /* other args */);

I think I see what you're saying, maybe...and maybe not...

If curr_instrs is pointer to a 128-character string type, wouldn't
curr_instrs+num_symbols then point to a location offset from
curr_instrs by (num_symbols*128 bytes)? And if so, what's
the point of cast (char *) if qsort() already works by sorting
some specified number of sequences of some specified
number of character bytes?

I thought I had the answer to my original question, and then it
slipped away from me...
 
Keith Thompson

Bill Reid said:
Well, OK, maybe, here's canonical specificity:

/* now re-allocate memory for the instrument strings */
if((curr_instrs=(instr_strs *)
        realloc(curr_instrs,num_instrs*sizeof(instr_strs)))==NULL) {
    printf("Not enough memory for instruments buffer\n");
    goto CloseFiles;
}

Does that help you help me?

A little, but there are still a bunch of identifiers whose
declarations I haven't seen.

I will make one comment: Don't cast the result of malloc() or
realloc(). See section 7 of the comp.lang.c FAQ,
<http://www.c-faq.com/>, particularly questions 7.7b.
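A sketch of Bill's realloc() call written without the cast, assuming the same instr_strs typedef (grow_instrs is a hypothetical name). It also stores the result in a temporary pointer, an extra precaution not raised in the thread: if realloc() fails it returns NULL but leaves the old block allocated, so assigning straight back to curr_instrs would lose the only pointer to it.

```c
#include <stdlib.h>

typedef char instr_strs[128];

/* Resize *list to hold count strings (hypothetical helper).  No cast on
   realloc(): in C its void* result converts implicitly.  The result goes
   into tmp first, because on failure realloc() returns NULL and leaves
   the old block allocated and unchanged.  Returns 0 on success, -1 on
   failure (in which case *list is still valid and must still be freed). */
int grow_instrs(instr_strs **list, size_t count)
{
    instr_strs *tmp = realloc(*list, count * sizeof **list);
    if (tmp == NULL)
        return -1;
    *list = tmp;
    return 0;
}
```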
[...]
OK, so this should be completely legal and flawless:

/* sort the symbol list alphabetically */
qsort((void *)curr_instrs,num_symbols,128,sort_alpha_list);

then...

/* sort the no-symbol list alphabetically */
qsort((void *)curr_instrs+num_symbols,num_no_symbols,128,sort_alpha_list);

Um, no.
By "legal and flawless" I DID mean "100% guaranteed functional",
not "pleasing to thine eyes"...

The code isn't "100% guaranteed functional". You're performing
arithmetic on a void*. That's not allowed in standard C.
That's the way YOU'D do it, I do it differently, and since I'm the only
one reading it (except in this one rare instance, or occasionally I'll post
some code somewhere on the net), I can read it just fine, and of
course it compiles all the same...

Ok, but I find it more difficult to read without the whitespace.
Whenever you post code here, you can expect comments on its style.
You're under no obligation to pay attention.
In qsort(), it's basically 128 (character) bytes.

Ok, but why 128 rather than 127, or 100, or 256? That's a rhetorical
question; you don't need to answer it, but ideally your code should.
(And yes, it's a style issue.)
I've actually got "128" defined globally (and I do mean globally, for
several hundred thousand lines of code) for the purposes of reading
and writing strings of certain lengths. And those damned defines
have managed to screw me up royally several times, including a
really irritating "intermittent" problem I had when I first wrote this
particular section of code. So lately I've been using them less
and less...

Even at file scope right now I'm more comfortable with the way it
is...

Ok, it's your code, but I'm quite surprised that defining symbolic
constants would cause more problems than it would solve.

If someone else needs to maintain your code (and "someone else" could
be you a year from now), it's not going to be obvious that the 128 in
this function corresponds to the 128 (or 127) in another function, but
the 128 in that function over there is just coincidental. There's a
good discussion at <http://c-faq.com/~scs/cclass/notes/sx9b.html>.
The first argument to qsort is:

(void *)curr_instrs + num_symbols

You can't do pointer arithmetic on a void* value. (Some compilers may
allow it; if you're using gcc, try "-ansi -pedantic -Wall -W", or
replace "-ansi" with "-std=c99").
Then how does qsort() do it? I'm assuming now that it must just
use pointer arithmetic internally, because it doesn't seem to want or
recognize my typedef of a 128-character string:

typedef char instr_strs[128];
instr_strs *curr_instrs;

qsort behaves in a manner consistent with its specification. That's
all you really need to know. It needn't even be implemented in C, and
if it is, it's free to use compiler-specific extensions.

But if it's implemented in standard C (which is entirely possible), it
presumably would convert the void* arguments to char* before
performing arithmetic on them. (Since char* and void* have the same
representation, the conversion doesn't cost anything at run time.)
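A sketch of the conversion Keith describes, using a hypothetical helper element_at() of the kind a standard-C qsort() might use internally (real library implementations vary). The void* base becomes a char*, and element i sits i * size bytes past it.

```c
#include <stddef.h>

/* Hypothetical helper: address of element i in an array whose elements
   are each size bytes long.  The void* is converted to char* (no
   run-time cost, since the two share a representation) and the byte
   offset i * size is added; arithmetic on the void* itself would not
   be standard C. */
void *element_at(void *base, size_t i, size_t size)
{
    return (char *)base + i * size;
}
```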
Nope, a pointer to the first of many 128-character strings, as above, so
are you saying the pointer cast should be (instr_strs *)? I have no problem
with that, as long as it works, and I must stress again at this point that
the current code:

/* sort the symbol list alphabetically */
qsort((void *)curr_instrs,num_symbols,128,sort_alpha_list);

Has worked flawlessly for months now; it's part of a particular section
of code that downloads about 3/4 meg of raw data from the net every
day at a specific time, parses out about 100,000 data items, and writes
them to a custom database in a matter of seconds.

That qsort() call isn't the one with the problem.

Incidentally, a piece of code is either correct or not. The number of
times it "works" really doesn't prove anything. If your compiler
accepts some non-portable code, it's probably going to keep working
the same way indefinitely -- but it will fail the first time you
compile it with a different compiler, or with the same compiler and
different options. Correctness is not statistical.

One pitfall of C is that there are a lot of errors that your compiler
isn't required to tell you about. Many things invoke undefined
behavior; they may appear to work, but the language doesn't guarantee
anything. Other things may be compiler-specific extensions. The
language requires a conforming implementation to issue a diagnostic
message for many of these -- but many compilers (including gcc) are
not conforming in their default mode. Typically you can use
command-line options to enable a conforming mode and provide
additional warnings.
The only reason I asked the original question was because I went
back and reviewed the code and wondered if I could shave a few
more milliseconds off the execution time...

Note that I didn't
Yeah, I noticed that, I just use (void *) because that's what
I thought qsort() wanted, and it definitely WORKS that way
(I've used qsort() dozens of times EXACTLY that way without
problems).

Yes, it works, but it's not necessary. As a general rule, casts
should be avoided unless they're actually required. A cast is, among
other things, a promise to the compiler that you know what you're
doing, and will often inhibit warnings and error messages. In this
case, the argument will be implicitly converted to void* without the
cast (assuming you have a visible prototype for qsort() -- i.e., you
haven't forgotten the "#include <stdlib.h>".) The code is perfectly
correct either way, but the form with the cast is more "brittle". If
the cast had specified the wrong type, for example, the compiler
likely wouldn't have told you about the error.
Now to get back to this:


I think I see what you're saying, maybe...and maybe not...

If curr_instrs is pointer to a 128-character string type, wouldn't
curr_instrs+num_symbols then point to a location offset from
curr_instrs by (num_symbols*128 bytes)? And if so, what's
the point of cast (char *) if qsort() already works by sorting
some specified number of sequences of some specified
number of character bytes?

I haven't seen the full context of your code (or if I have, I've
forgotten it). Your original code had

(void*)curr_instrs + num_symbols

which is illegal, because you can't perform pointer arithmetic on
void* (the cast applies to "curr_instrs", not to "curr_instrs +
num_symbols"). Pointer arithmetic, as you probably know, is scaled by
the size of the pointed-to type.

Are you using gcc? If so, it supports arithmetic on void* as an
extension; it acts like arithmetic on char*. (IMHO, this extension is
a bad idea.) By casting curr_instrs to void*, you cause the "+
num_symbols" to denote an offset of num_symbols *bytes*. I had
guessed that that's what you wanted, but apparently it isn't.

I think what you *really* wanted was for the addition to be scaled by
sizeof *curr_instrs (128 bytes?). If so, you probably meant to use

(void*)(curr_instrs + num_symbols)

which should work. But since the argument will be implicitly
converted to void* anyway, all you need is

curr_instrs + num_symbols

In other words, all you need to do is drop the cast. This avoids
depending on a compiler-specific extension *and* corrects a bug. It's
also a very nice demonstration of why unnecessary casts should be
avoided.
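Putting the fix together, here is a sketch of sorting the two parts of one buffer in place. It assumes the instr_strs typedef from the thread and uses a strcmp()-based comparator as a stand-in for Bill's sort_alpha_list, whose body was never posted. The second call is Bill's with the cast removed, so "+ num_symbols" is scaled by sizeof *curr_instrs; sizeof *curr_instrs also replaces the bare 128.

```c
#include <stdlib.h>
#include <string.h>

typedef char instr_strs[128];

/* Hypothetical stand-in for Bill's comparator: each element is one
   whole 128-char string, so compare them as C strings. */
static int sort_alpha_list(const void *a, const void *b)
{
    const char *sa = a;
    const char *sb = b;
    return strcmp(sa, sb);
}

/* Sort the first num_symbols strings and the num_no_symbols strings
   that follow them as two independent lists, in place, within one
   contiguous (for example, realloc()ed) buffer. */
void sort_two_parts(instr_strs *curr_instrs,
                    size_t num_symbols, size_t num_no_symbols)
{
    /* Top part, starting at element 0. */
    qsort(curr_instrs, num_symbols, sizeof *curr_instrs, sort_alpha_list);

    /* Bottom part: curr_instrs + num_symbols is scaled by
       sizeof *curr_instrs (128 bytes), so it points at element
       num_symbols.  No cast: it converts to void* implicitly. */
    qsort(curr_instrs + num_symbols, num_no_symbols,
          sizeof *curr_instrs, sort_alpha_list);
}
```

With this, the extra buffer and copy loop are unnecessary: the second qsort() works directly on the bottom part of the reallocated block.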
 
Bill Reid

Keith Thompson said:
A little, but there are still a bunch of identifiers whose
declarations I haven't seen.
Exactly why I didn't want to post any code in the first place,
just wanted to ask a verbal question (see Subject). I'm calling
on all types of custom libraries for this data downloading function,
and the more you see, the more you won't recognize...
I will make one comment: Don't cast the result of malloc() or
realloc(). See section 7 of the comp.lang.c FAQ,
<http://www.c-faq.com/>, particularly questions 7.7b.
OK, I think I've heard some type of debate about this, I thought
(based on the DOCUMENTATION THAT CAME WITH MY
FRIGGIN' DEVELOPMENT PACKAGE) that was what you
were supposed to do; it has seemed to work OK...
[...]

OK, so this should be completely legal and flawless:

/* sort the symbol list alphabetically */
qsort((void *)curr_instrs,num_symbols,128,sort_alpha_list);

then...

/* sort the no-symbol list alphabetically */
qsort((void *)curr_instrs+num_symbols,num_no_symbols,128,sort_alpha_list);

Um, no.
By "legal and flawless" I DID mean "100% guaranteed functional",
not "pleasing to thine eyes"...

The code isn't "100% guaranteed functional". You're performing
arithmetic on a void*. That's not allowed in standard C.
Yes, sort of, I recognized my mistake after I last hit "send", and
way below, you hit on the actual error I made...
Ok, but I find it more difficult to read without the whitespace.
Whenever you post code here, you can expect comments on its style.
You're under no obligation to pay attention.


Ok, but why 128 rather than 127, or 100, or 256? That's a rhetorical
question; you don't need to answer it, but ideally your code should.
(And yes, it's a style issue.)
I have my reasons, the most important of which I would think would
be obvious, and a secondary reason which should also be both apparent
and not really important at the same time...
Ok, it's your code, but I'm quite surprised that defining symbolic
constants would cause more problems than it would solve.
As I've alluded, it partly depends on the scope. What I've found
the hard way is that you should really know EXACTLY what you
need AT THE MOST LOCAL LEVEL OF SCOPE. And I
found I kept forgetting what my global defines were, and I would
use an inappropriate one where I knew EXACTLY what I needed
RIGHT THERE. Among other problems...

In this case, maybe a sizeof() would be better...
If someone else needs to maintain your code (and "someone else" could
be you a year from now), it's not going to be obvious that the 128 in
this function corresponds to the 128 (or 127) in another function, but
the 128 in that function over there is just coincidental. There's a
good discussion at <http://c-faq.com/~scs/cclass/notes/sx9b.html>.
Everything is local to a single 1000-line function for this data downloading
operation; any (many) calls out to my custom libraries don't care about
string lengths because they do a strlen() on the passed string pointer
on entry.
The first argument to qsort is:

(void *)curr_instrs + num_symbols

You can't do pointer arithmetic on a void* value. (Some compilers may
allow it; if you're using gcc, try "-ansi -pedantic -Wall -W", or
replace "-ansi" with "-std=c99").
Then how does qsort() do it? I'm assuming now that it must just
use pointer arithmetic internally, because it doesn't seem to want or
recognize my typedef of a 128-character string:

typedef char instr_strs[128];
instr_strs *curr_instrs;

qsort behaves in a manner consistent with its specification. That's
all you really need to know. It needn't even be implemented in C, and
if it is, it's free to use compiler-specific extensions.

But if it's implemented in standard C (which is entirely possible), it
presumably would convert the void* arguments to char* before
performing arithmetic on them. (Since char* and void* have the same
representation, the conversion doesn't cost anything at run time.)
Yeah, apparently it only processes a char* at a time, and the
declaration of void* just prevents somebody from stupidly passing
the wrong starting point, or something...
That qsort() call isn't the one with the problem.
Exactly. It was the one I was going to add that had a problem...
Incidentally, a piece of code is either correct or not. The number of
times it "works" really doesn't prove anything. If your compiler
accepts some non-portable code, it's probably going to keep working
the same way indefinitely -- but it will fail the first time you
compile it with a different compiler, or with the same compiler and
different options. Correctness is not statistical.
Try telling that to a third-grade teacher grading tests...
One pitfall of C is that there are a lot of errors that your compiler
isn't required to tell you about. Many things invoke undefined
behavior; they may appear to work, but the language doesn't guarantee
anything. Other things may be compiler-specific extensions. The
language requires a conforming implementation to issue a diagnostic
message for many of these -- but many compilers (including gcc) are
not conforming in their default mode. Typically you can use
command-line options to enable a conforming mode and provide
additional warnings.
Most importantly in my case, a compiler is not a mind-reader...
Yes, it works, but it's not necessary.

Are you sure? Somehow, it seems like I tried it without the cast, and
got an error, but if that actually happened, it was years, hell, decades
ago.

Like most people, I'm a victim of experience: I just keep doing what
works...
As a general rule, casts
should be avoided unless they're actually required.

The question here would be: does qsort() require it? Here's the
documentation:

Syntax

#include <stdlib.h>
void qsort(void *base, size_t nelem, size_t width,
           int (_USERENTRY *fcmp)(const void *, const void *));

and the example from the documentation:

int sort_function( const void *a, const void *b);
char list[5][4] = { "cat", "car", "cab", "cap", "can" };

int main(void)
{
    int x;

    qsort((void *)list, 5, sizeof(list[0]), sort_function);
    for (x = 0; x < 5; x++)
        printf("%s\n", list[x]);
    return 0;
}

int sort_function( const void *a, const void *b)
{
    return( strcmp((char *)a,(char *)b) );
}

Unlike some example documentation that I can think of, THAT one
actually works as advertised...but doesn't mean that the cast is
required...
A cast is, among
other things, a promise to the compiler that you know what you're
doing, and will often inhibit warnings and error messages. In this
case, the argument will be implicitly converted to void* without the
cast (assuming you have a visible prototype for qsort() -- i.e., you
haven't forgotten the "#include <stdlib.h>".) The code is perfectly
correct either way, but the form with the cast is more "brittle". If
the cast had specified the wrong type, for example, the compiler
likely wouldn't have told you about the error.
Well, OK, I just compiled it without the cast, and it came
up clean. Then I immediately pasted the cast back in place, since
this is "production code", and from a perfectly practical standpoint,
the void* cast is 100% functional IN THIS CASE, so I'm loath
to mess anything up...
I haven't seen the full context of your code (or if I have, I've
forgotten it). Your original code had

(void*)curr_instrs + num_symbols

which is illegal, because you can't perform pointer arithmetic on
void* (the cast applies to "curr_instrs", not to "curr_instrs +
num_symbols"). Pointer arithmetic, as you probably know, is scaled by
the size of the pointed-to type.
Actually, OPERATOR PRECEDENCE!!!! Yes, I know now this is
wrong...
Are you using gcc? If so, it supports arithmetic on void* as an
extension; it acts like arithmetic on char*. (IMHO, this extension is
a bad idea.) By casting curr_instrs to void*, you cause the "+
num_symbols" to denote an offset of num_symbols *bytes*. I had
guessed that that's what you wanted, but apparently it isn't.
Nope, 128-character strings...
I think what you *really* wanted was for the addition to be scaled by
sizeof *curr_instrs (128 bytes?). If so, you probably meant to use

(void*)(curr_instrs + num_symbols)

which should work.
EXACTLY!

But since the argument will be implicitly
converted to void* anyway, all you need is

curr_instrs + num_symbols

In other words, all you need to do is drop the cast. This avoids
depending on a compiler-specific extension *and* corrects a bug. It's
also a very nice demonstration of why unnecessary casts should be
avoided.
OK, I WILL try that to replace this nonsense, along with the unneeded
malloc and free for the no-symbols list:

/* put bottom of instruments list into no-symbols list */
swap_idx=0;
no_symbol_idx=num_symbols;
while(no_symbol_idx<num_instrs) {
    strcpy(curr_no_symbls[swap_idx],
           curr_instrs[no_symbol_idx]);
    no_symbol_idx++;
    swap_idx++;
}

/* sort the no-symbol list alphabetically */
qsort((void *)curr_no_symbls,num_no_symbls,128,sort_alpha_list);

Hopefully everything will go well at 6pm EST when it downloads
the data...
 
Keith Thompson

Bill Reid said:
OK, I think I've heard some type of debate about this, I thought
(based on the DOCUMENTATION THAT CAME WITH MY
FRIGGIN' DEVELOPMENT PACKAGE) that was what you
were supposed to do; it has seemed to work OK...

Then the DOCUMENTATION THAT CAME WITH YOUR FRIGGIN' DEVELOPMENT
PACKAGE is advising you to do something that's unnecessary and
potentially dangerous. (Unless it's intended to be called from C++,
which doesn't do implicit conversions to and from void* as freely as C
does, but that's a different language.)

[...]
As I've alluded, it partly depends on the scope. What I've found
the hard way is that you should really know EXACTLY what you
need AT THE MOST LOCAL LEVEL OF SCOPE. And I
found I kept forgetting what my global defines were, and I would
use an inappropriate one where I knew EXACTLY what I needed
RIGHT THERE. Among other problems...

Yes, one problem with macros is that they're not scoped.

If you want an integer constant (within the range of type int), there
is a trick you can use in C to limit it to the scope you want:

enum { WHATEVER = 128 };

It's arguably an abuse of the "enum" feature (you're doing it for the
sake of the constant, and not actually using the type), but it does
work, and it's not an uncommon idiom.
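A small illustration of the scoping difference, with hypothetical function names: each function defines its own WHATEVER inside its body, and neither is visible outside it, whereas a #define ignores block scope and stays in effect from its definition to the end of the file.

```c
#include <stddef.h>

/* An enumeration constant obeys block scope, so these two WHATEVERs
   never collide. */
size_t local_len(void)
{
    enum { WHATEVER = 128 };
    char buf[WHATEVER];
    return sizeof buf;
}

size_t other_len(void)
{
    enum { WHATEVER = 64 };   /* same name, different scope: no clash */
    char buf[WHATEVER];
    return sizeof buf;
}
```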

Or you can use a macro and be careful about how you use it.
In this case, maybe a sizeof() would be better...

Probably so.
If someone else needs to maintain your code (and "someone else" could
be you a year from now), it's not going to be obvious that the 128 in
this function corresponds to the 128 (or 127) in another function, but
the 128 in that function over there is just coincidental. There's a
good discussion at <http://c-faq.com/~scs/cclass/notes/sx9b.html>.
Everything is local to a single 1000-line function for this data downloading
operation; any (many) calls out to my custom libraries don't care about
string lengths because they do a strlen() on the passed string pointer
on entry.
The first argument to qsort is:

(void *)curr_instrs + num_symbols

You can't do pointer arithmetic on a void* value. (Some compilers may
allow it; if you're using gcc, try "-ansi -pedantic -Wall -W", or
replace "-ansi" with "-std=c99").

Then how does qsort() do it? I'm assuming now that it must just
use pointer arithmetic internally, because it doesn't seem to want or
recognize my typedef of a 128-character string:

typedef char instr_strs[128];
instr_strs *curr_instrs;

qsort behaves in a manner consistent with its specification. That's
all you really need to know. It needn't even be implemented in C, and
if it is, it's free to use compiler-specific extensions.

But if it's implemented in standard C (which is entirely possible), it
presumably would convert the void* arguments to char* before
performing arithmetic on them. (Since char* and void* have the same
representation, the conversion doesn't cost anything at run time.)
Yeah, apparently it only processes a char* at a time, and the
declaration of void* just prevents somebody from stupidly passing
the wrong starting point, or something...

void* is a generic pointer type. In fact, it's *the* generic pointer
type (pointer-to-object, actually; you can't portably use it for
pointers to functions). That's why qsort() uses it. (Earlier
versions of qsort(), before the 1989 ANSI standard, probably would
have used char*.)

I'm not sure what you mean by "it only processes a char* at a time".
qsort() works with whatever size of data you tell it to. It likely
uses memcpy() or something similar to copy data around within the
array.

[...]
Try telling that to a third-grade teacher grading tests...

I'm not sure I see the point.

[...]
Are you sure? Somehow, it seems like I tried it without the cast, and
got an error, but if that actually happened, it was years, hell, decades
ago.

Yes, I'm sure.
Like most people, I'm a victim of experience: I just keep doing what
works...
As a general rule, casts
should be avoided unless they're actually required.

The question here would be: does qsort() require it? Here's the
documentation:

Syntax

#include <stdlib.h>
void qsort(void *base, size_t nelem, size_t width,
int (_USERENTRY *fcmp)(const void *, const void *));

and the example from the documentation:

int sort_function( const void *a, const void *b);
char list[5][4] = { "cat", "car", "cab", "cap", "can" };

int main(void)
{
int x;

qsort((void *)list, 5, sizeof(list[0]), sort_function);
for (x = 0; x < 5; x++)
printf("%s\n", list[x]);
return 0;
}

int sort_function( const void *a, const void *b)
{
return( strcmp((char *)a,(char *)b) );
}

Unlike some example documentation that I can think of, THAT one
actually works as advertised...but doesn't mean that the cast is
required...

Yes, it works with a cast. It also works without a cast, and there's
just no reason to use one.

What you quoted above is not *the* documentation for qsort(). You'll
find that in the C standard, and it doesn't say anything about casting
arguments.
Well, OK, I just compiled it without the cast, and it came
up clean. Then I immediately pasted the cast back in place, since
this is "production code", and from a perfectly practical standpoint,
the void* cast is 100% functional IN THIS CASE, so I'm loath
to mess anything up...

Sure, if it already works, any change you make has a chance of
breaking something. But keep this in mind for any new code you write,
and when tracking down bugs in existing code. And if you're fixing a
piece of code anyway, you might as well remove any unnecessary casts
while you're at it; it will make the code more robust in the long run.

[...]
Actually, SEQUENCE POINTS!!!! Yes, I know now this is
wrong...

No, sequence points aren't involved. It's just a matter of operator
precedence (how an expression is parsed, and which operations apply to
which operands).

[snip]
 
B

Barry Schwarz

snip

...you wouldn't want to dereference an address, right.

It's a very common thing to do. How else do you get the value at that
address? All subscripts involve an implied dereference.



Remove del for email
 
B

Bill Reid

Keith Thompson said:
Then the DOCUMENTATION THAT CAME WITH YOUR FRIGGIN' DEVELOPMENT
PACKAGE is advising you to do something that's unnecessary and
potentially dangerous. (Unless it's intended to be called from C++,
which doesn't do implicit conversions to and from void* as freely as C
does, but that's a different language.)
Mmmmm, well it's actually a C++ package (with a lot of "Object Pascal"
crap lying around apparently just in a vain attempt to create a Microsoft
style monopoly--three guesses who made it), and I do call back and
forth between C and C++ and D----i, so maybe I DO want to keep
the "unneeded" casts...

To its credit, it seems to always issue warnings for any declarations
not in scope, so 7.7b has little to no practical relevance...
Yes, one problem with macros is that they're not scoped.

If you want an integer constant (within the range of type int), there
is a trick you can use in C to limit it to the scope you want:

enum { WHATEVER = 128 };

It's arguably an abuse of the "enum" feature (you're doing it for the
sake of the constant, and not actually using the type), but it does
work, and it's not an uncommon idiom.
I'm not sure about using that particular trick, but I will say I did a
major overhaul of my code a few years back where I ditched about
80% of my defines and replaced them with enums and have saved
tremendous amounts of wasted effort as a result.
Or you can use a macro and be careful about how you use it.
The real point is always that you always have to be careful and
there is no magic trick that will completely relieve you of the duty
to know what the hell you are doing.
Probably so.
No offense but that is sooooo "old school" and "Mickey Mouse"...it
might have impressed me in 1975 writing a "hello world!" program in
my diapers, but I have much bigger fish to fry these days...I try to use
what tools are available in the best way possible, defines still have a
place in my code and always will, but I'm not kidding when I say
I got sick and tired of dealing with them, with one very important
exception; this is from the top of my c_inclds.h file that is included
in every C program I write:

#ifndef c_incldsH
#define c_incldsH

/* boolean boo-yah */
#define TRUE 1 /* what about negative logic? */
#define FALSE 0 /* not to mention situational ethics... */

After that there are about another 50 defines, including line length
maxs and crap like that, that I'd just as soon flush down the bit-crapper
than ever use again...
Again, I may become a "victim" of the "documentation"...but what're
ya goin' to do? As I've said, if it gets the job done flawlessly after
being compiled, I don't care what anything does...
Yeah, but I can do everything as "correctly" as possible and will
still have portability issues, so again, what're ya goin' to do?
Correctness is not statistical.

I'm not sure I see the point.
In all walks of life, and in so much of my own work, everything is
"graded". Some things are measurably "better" than others, you know,
like Japanese cars are better than American cars, because, you know,
they actually use this thing called "statistical quality control" and
other disciplines, while Americans don't so much, even though it was
invented here...

I value speed and flawless execution in computer programs, and
have implemented a methodology for some level of portability, modularity,
and maintainability, but those are secondary concerns...
How about if I call it from C++ like you mentioned about malloc()?
I believe I actually do call malloc() in some xxx.cpp files...
What you quoted above is not *the* documentation for qsort(). You'll
find that in the C standard, and it doesn't say anything about casting
arguments.
Again, might be the C++ thing, or an urban legend or something...
Sure, if it already works, any change you make has a chance of
breaking something. But keep this in mind for any new code you write,
and when tracking down bugs in existing code. And if you're fixing a
piece of code anyway, you might as well remove any unnecessary casts
while you're at it; it will make the code more robust in the long run.
Unless I call it from C++?
No, sequence points aren't involved. It's just a matter of operator
precedence (how an expression is parsed, and which operations apply to
which operands).
Oh, I thought that was "sequence points", but yeah, what I wrote
wouldn't work right.

Oh, while I've got you here, here's another issue I noticed that I'm
not sure about concerning realloc(). Here's the NON-documentation:

Syntax

#include <stdlib.h>
void *realloc(void *block, size_t size);

....

If block is a NULL pointer, realloc works just like malloc.

....

I read this years ago, and thought "Great, I don't necessarily have to
malloc something first, I can use realloc in a loop and the first pass
through the loop it'll just be like malloc."

Problem is, it didn't seem to work out that way, and I'm not sure
what I did wrong, but I think I tried a number of things, such as
explicitly initializing my memory pointer to NULL, and always got
an error...is it actually possible to use realloc() to act like malloc
with a NULL pointer?
 
K

Keith Thompson

Bill Reid said:
[...]
Mmmmm, well it's actually a C++ package (with a lot of "Object Pascal"
crap lying around apparently just in a vain attempt to create a Microsoft
style monopoly--three guesses who made it), and I do call back and
forth between C and C++ and D----i, so maybe I DO want to keep
the "unneeded" casts...

If you have a genuine need to compile the same code as both C and C++,
that's a valid reason to cast the result of the *alloc() functions.

Very very few people have such a genuine need. We can count the ones
we've seen here on the fingers of P.J. Plauger's right hand (and even
that's overkill).

C++ provides mechanisms for interfacing to C code. Unless you're
providing a library to be used with either C or C++ code, you're
probably better off picking a language for each piece of your program
and using the appropriate compiler for it.

[...]
How about if I call it from C++ like you mentioned about malloc()?
I believe I actually do call malloc() in some xxx.cpp files...

Why? C++ has "new" and "delete". But in any case, C++ is a different
language, and comp.lang.c++ is down the hall on the left, just past the
water cooler.

[...]
Again, might be the C++ thing, or an urban legend or something...

The Solaris man page has similar wording.

[...]
Oh, while I've got you here, here's another issue I noticed that I'm
not sure about concerning realloc(). Here's the NON-documentation:

Syntax

#include <stdlib.h>
void *realloc(void *block, size_t size);

...

If block is a NULL pointer, realloc works just like malloc.

...

I read this years ago, and thought "Great, I don't necessarily have to
malloc something first, I can use realloc in a loop and the first pass
through the loop it'll just be like malloc."

Yes. If it doesn't work that way, your implementation is broken.
(But that's an unlikely bug, since the behavior is clearly documented
in the standard.)
Problem is, it didn't seem to work out that way, and I'm not sure
what I did wrong, but I think I tried a number of things, such as
explicitly initializing my memory pointer to NULL, and always got
an error...is it actually possible to use realloc() to act like malloc
with a NULL pointer?

Yes. I can't guess why you were unable to get it to work.
 
B

Bill Reid

Keith Thompson said:
Bill Reid said:
[...]
Mmmmm, well it's actually a C++ package (with a lot of "Object Pascal"
crap lying around apparently just in a vain attempt to create a Microsoft
style monopoly--three guesses who made it), and I do call back and
forth between C and C++ and D----i, so maybe I DO want to keep
the "unneeded" casts...

If you have a genuine need to compile the same code as both C and C++,
that's a valid reason to cast the result of the *alloc() functions.
Nope, I don't think I ever do that. However, I do occasionally
call malloc in a xxx.cpp file, which is compiled by C++...
Very very few people have such a genuine need.

Yes, hard to imagine what the point of that would be, 'cept maybe
even greater programming confusion than I have!
We can count the ones
we've seen here on the fingers of P.J. Plauger's right hand (and even
that's overkill).

C++ provides mechanisms for interfacing to C code. Unless you're
providing a library to be used with either C or C++ code, you're
probably better off picking a language for each piece of your program
and using the appropriate compiler for it.
The only libraries I provide are for myself, and as you note it is
generally fairly painless to call into C++ object files from C and vice
versa.
Why? C++ has "new" and "delete".

Good question, maybe there was a good reason, maybe not, but
since I'm not looking at that particular code right now, it probably
had to do with keeping certain data structures as similar as possible
when used in C++ as they are when used in C, and something about
"new" just "scared" me...
But in any case, C++ is a different
language, and comp.lang.c++ is down the hall on the left, just past the
water cooler.
Well, I didn't bring it up, but my code base is about 50/50...
The Solaris man page has similar wording.
Well, the Solaris man page would be just the old ucb man page,
right? In any event, I am highly displeased with this particular
development package, and high on my list of specific displeasures is the
documentation.
It is in some cases wrong, many cases stupidly written, incomplete,
and just plain difficult to use. So I'm not at all surprised that they
included an unnecessary cast in the example, but at least the
example works, as I said...
Yes. If it doesn't work that way, your implementation is broken.
(But that's an unlikely bug, since the behavior is clearly documented
in the standard.)
I would think it unlikely it is broken, the package is irritatingly bad
in many ways but seems to generally put out clean functioning programs
after fighting the "tools", but who knows. I may have just done something
stupid, wouldn't be the first time...
Yes. I can't guess why you were unable to get it to work.
Maybe I'll try it again. I made the changes to my data downloading
code yesterday, including deleting the "unnecessary casts", ran some tests,
everything worked fine, put it "into production", 6:15pm EST rolled
around and it did its thing apparently flawlessly, only about three
milliseconds quicker...
 
B

Bill Reid

Barry Schwarz said:
It's a very common thing to do. How else do you get the value at that
address? All subscripts involve an implied dereference.
OK, you were talking about dereferencing an address one element
past the end of the block, I thought you were talking about something like
saving the pointer, then trying to use it again after another realloc().
That WOULD be a recipe for disaster, right?

So I'm not sure what distinction you're trying to make about
subscript "implied" dereferencing. Isn't "address+size" equivalent to
"address[size]"? Again, the only problem in doing anything with a
dereference of that address is that you're one element past the
end of the block...but that might actually work for you if you're Russian...
 
F

Flash Gordon

Bill said:
Barry Schwarz said:
It's a very common thing to do. How else do you get the value at that
address? All subscripts involve an implied dereference.
OK, you were talking about dereferencing an address one element
past the end of the block, I thought you were talking about something like
saving the pointer, then trying to use it again after another realloc().
That WOULD be a recipe for disaster, right?

So I'm not sure what distinction you're trying to make about
subscript "implied" dereferencing. Isn't "address+size" equivalent to
"address[size]"?

No. "address[size]" and "*(address+size)" are equivalent. So the first
form does a dereference. "address+size" on the other hand does *not* do
a dereference, implied or otherwise.
Again, the only problem in doing anything with a
dereference of that address is that you're one element past the
end of the block...but that might actually work for you if you're Russian...

Never dereference beyond the end of the block. It is "not allowed" by
the standard, i.e. anything can happen including, unfortunately, what
you happen to expect.
 
B

Bill Reid

Bill Reid said:
I would think it unlikely it is broken, the package is irritatingly bad
in many ways but seems to generally put out clean functioning programs
after fighting the "tools", but who knows. I may have just done something
stupid, wouldn't be the first time...

Maybe I'll try it again.

Oooooh, that was gnarly...

What I forgot was that if I don't malloc() the block first, if I
realloc() in a loop I get a memory access exception. I hate it
when that happens...

Maybe it IS a bug in the compiler, if it wasn't so easy to work
around, I might actually worry about it more. As it is, I did a
search on the compiler maker's web-site for any information
on known bugs, came up with nothing, and left a question on
the discussion forum about it, see if anybody knows anything...
 
B

Barry Schwarz

OK, you were talking about dereferencing an address one element
past the end of the block, I thought you were talking about something like
saving the pointer, then trying to use it again after another realloc().
That WOULD be a recipe for disaster, right?

The point I was trying to make was: After the successful allocation,
you could dereference any address in the range address to
address+size-1. While it is legal to compute the value address+size
it is not legal to dereference it.

After calling realloc, any address based on the "before" location is
probably invalid. The only time it would be valid is if:

The address returned from realloc was the same as the address
passed to the function in argument 1 and

The offset into the area (address of interest - starting address
of area) <= size argument passed to realloc.
So I'm not sure what distinction you're trying to make about
subscript "implied" dereferencing. Isn't "address+size" equivalent to

You asked why someone would want to dereference an address. I tried
to give an example of why it is a very common thing to do.
"address[size]"? Again, the only problem in doing anything with a

In my discussion, I used the phrase address+size in its non-C
arithmetic meaning. In C, the meaning is equivalent only for pointers
where the size of the object pointed to is 1.

In C address[size] is defined to be *(address+size), remembering that
pointer arithmetic includes implied scaling by the size of the object
pointed to.
dereference of that address is that you're one element past the
end of the block...but that might actually work for you if you're Russian...

The "problem" is that dereferencing the address invokes undefined
behavior, even before you attempt to do something with the object that
may be retrieved from that address.


 
H

Herbert Rosenau

What I forgot was that if I don't malloc() the block first, if I
realloc() in a loop I get a memory access exception. I hate it
when that happens...

void *p = NULL;   /* we have no memory yet */
void *temp;       /* holds realloc's result until we know it succeeded */

size_t size = 0;  /* we calculate the size in the loop before we call
                     realloc */
....
for (....) {
    ....
    if ((temp = realloc(p, size)) == NULL) {
        /* realloc failed; p still points to the old block */
        free(p);
        return NULL; /* or some other error code */
    }
    p = temp;
    ....
}
free(p);
free(p);

will always work, unless your implementation is really broken. In that
case, throw your compiler into the trash and get another one.
Maybe it IS a bug in the compiler, if it wasn't so easy to work
around, I might actually worry about it more. As it is, I did a
search on the compiler maker's web-site for any information
on known bugs, came up with nothing, and left a question on
the discussion forum about it, see if anybody knows anything...

I would say you have forgotten to initialise the pointer passed to
realloc to NULL, which signals that there is currently nothing to
reallocate, so realloc simply acts as malloc.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de, the home of German eComStation
eComStation 1.2 in German is here!
 
B

Bill Reid

Herbert Rosenau said:
I would say you have forgotten to initialise the pointer passed to
realloc to NULL, which signals that there is currently nothing to
reallocate, so realloc simply acts as malloc.
You are correct sir!

I made yet a further fool of myself and posted the question on
the message board for the compiler. Sure enough, I "forgot" that
local pointers are not initialized, therefore are not NULL and
could be anything, so realloc() tries to allocate memory in
god-knows-where.

Of course, I could have sworn that I explicitly set the pointer
to NULL as the FIRST thing I tried to fix the problem when
it first cropped up years ago, but I must have mucked that up
somehow.

As to your code, I'm not sure why you do the two-step
with "*p" and "*temp"...it seems like all is required is to set
*p=NULL when declared, at least that's what I did, and
it worked just fine. Am I (again!) missing something?
 
B

Bill Reid

Barry Schwarz said:
Russian...

The "problem" is that dereferencing the address invokes undefined
behavior, even before you attempt to do something with the object that
may be retrieved from that address.
Oh really. Well, throw that onto the giant pile of stuff I did not
know about C programming...

In any event, I did solve a bunch of "problems" in my code as a result
of asking these stupid questions. My code basically runs exactly as it
did before, would compile just as cleanly on any compiler as before, but
it now no longer has certain "problems"!

Thanks guys!
 
H

Herbert Rosenau

On Wed, 16 Aug 2006 01:01:06 UTC, "Bill Reid"

Everybody embarrasses themselves as best they can. :) You will learn
that, most of the time, when you try to blame the compiler, you end up
embarrassing yourself instead.
As to your code, I'm not sure why you do the two-step
with "*p" and "*temp"...it seems like all is required is to set
*p=NULL when declared, at least that's what I did, and
it worked just fine. Am I (again!) missing something?

p is the pointer holding the address of the memory block. Because realloc
can fail (returning NULL), you need a second pointer to receive the
result of realloc until you know that realloc has returned a (new) memory
address.

When realloc fails, you need either to keep working with the memory (p)
you already have or to clean up (free(p)). If you assign realloc's result
straight to p, a failure overwrites p with NULL and you get a memory leak,
because you have lost the address of the memory you had already allocated.

Another tip:

You should initialise every variable when you define it, so that if you
use it before assigning a known valid value, it fails in a predictable
way. So initialise data to 0 (or to a value you can easily identify as
invalid) and pointers to NULL. Then learn how to use a debugger: set a
breakpoint immediately before the failure occurs, analyse the data found
there and, when everything seems OK, step a single step forward and
analyse again until the failure occurs.

 
