resizing an array, is there a better way?

M

Malcolm

Arthur J. O'Dwyer said:
But your original complaint was about 'sizeof *small' versus
'sizeof(int)'; you preferred the latter, whereas it's the former
that is actually composed of English words. (It is also more
useful, since it describes the algorithm on a higher level --- the
code is allocating space for some number of 'small' elements.
'int' has nothing to do with it, except for the coincidence that
'small' happens to be a pointer to 'int'.)

In short: Your position is completely bogus. C is not English;
yet even if we pretend that it /is/, 'sizeof *small' is /still/
more descriptive!
If it is a choice beween sizeof x and sizeof( int ) then I agree there is
not much to choose between them.
However "sizeof *small" means that we dereference small (but not really) and
take the size of the object obtained. That's not a very intuitive concept.

Now sure, but itself use of the construct won't make a well-written program
unreadable. However the cumulative effect of using many such constructs will
render the code hard to read. Similarly a French mot juste doesn't make
English too hard to read, but larding and English sentence with foriegn
words does eventually make it unreadable.
 
F

Flash Gordon

If it is a choice beween sizeof x and sizeof( int ) then I agree there
is not much to choose between them.
However "sizeof *small" means that we dereference small (but not
really) and take the size of the object obtained. That's not a very
intuitive concept.

I would disagree. I read *small as meaning the object small points to.
Therefor sizeof *small reads, to me, as the size of the object small
points to. If you use sizeof( int ) then I have to check that small is a
pointer to int because I've seen code where this type of thing has been
wrong.
Now sure, but itself use of the construct won't make a well-written
program unreadable. However the cumulative effect of using many such
constructs will render the code hard to read. Similarly a French mot
juste doesn't make English too hard to read, but larding and English
sentence with foriegn words does eventually make it unreadable.

Again, I would disagree. Using English words in amongst French makes it
harder for a French speaker to understand, even if they also understand
English. Similarly, using non idiomatic C makes it harder for a C
programmer to read the program, even if the C programmer knows the
language from which the constructs are derived. I know this because I
have to work with code which is BASIC written in C and, despite being
both a Basic programmer and a C programmer I still find it harder to
read.
 
M

Michael Wojcik

In practise realloc() will usually be obliged to perform a copy.

That's a dubious assumption. There are allocation schemes which
over-allocate areas (typically rounding them up to a power of two);
realloc under such a scheme might well be able to simply extend the
"in use" portion of the block in many cases.

In some implementations, allocated blocks might be scattered through
a large virtual memory space such that they can be trivially extended.

In short, you *don't know* what realloc does.
Big O
analysis and underlying efficiency won't change much, at most you might get
a few percentage points improvement because realloc() doesn't always call
memcpy(), and if memcpy() is slightly faster than native C.

The memcpy implementations I've seen have in fact been much faster
than naive copy loops. They make use of copy operations provided
by the CPU; they unroll loops; they pre-warm cache lines; and so
forth.
 
K

Keith Thompson

That's a dubious assumption. There are allocation schemes which
over-allocate areas (typically rounding them up to a power of two);
realloc under such a scheme might well be able to simply extend the
"in use" portion of the block in many cases.

In some implementations, allocated blocks might be scattered through
a large virtual memory space such that they can be trivially extended.

In short, you *don't know* what realloc does.

I just did an experiment. I wrote a small program that initially sets
a pointer to the result of malloc(1), then repeatedly sets to the
result of realloc with successively larger values (1, 2, 3, ...).

When I iterate from 1 to 1000000, the pointer value only changes a few
times for small values. On one implementation, the value changes only
when realloc() is called with a size of 13; on another, it changes at
9, 17, 25, and 33. For all other sizes, all the way up to a megabyte,
it just extends the existing region of memory; presumably no copying
is done.

If I add a call to malloc(1) before each realloc() (with no
corresponding free()), the pointer value changes much more often on
one implementation, but not on the other. Probably the small
allocation is (at least sometimes) done just after the existing block,
preventing it from being expanded; on the other implementation, it
looks like the small allocation happens to be done just *before* the
existing block, allowing it to expand.

All this is, of course, extremely system-specific. Michael is quite
correct: you *don't know* what realloc does.

Here's the program. One caveat: the "ptr != prev" comparison probably
invokes undefined behavior when realloc() moves the memory block and
ptr becomes invalid.

It takes a command-line argument to specify the number of iterations,
defaulting to 1024. The "-d" option tells it to do the dummy malloc()
call. There's little or no error checking.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
void *ptr = malloc(1);
void *prev = ptr;
size_t bytes;
size_t max = 1024;
void *dummy;
int malloc_dummy = 0;
int i;

for (i = 1; i < argc; i++) {
if (strcmp(argv, "-d") == 0) {
malloc_dummy = 1;
}
else if (strcmp(argv, "-h") == 0) {
printf("Usage: %s [-h] [-d] [count\n");
exit(EXIT_FAILURE);
}
else {
max = atol(argv);
}
}
printf("Iterating from 1 to %ld\n", (long)max);

printf("malloc(1) --> [%p]\n", ptr);

for (bytes = 1; bytes <= max; bytes ++) {
if (malloc_dummy) {
dummy = malloc(1);
}
ptr = realloc(ptr, bytes);
if (ptr != prev) {
printf("realloc(%ld) --> [%p]\n", (long)bytes, ptr);
prev = ptr;
}
}
return 0;
}
 
J

Jens.Toerring

I just did an experiment. I wrote a small program that initially sets
a pointer to the result of malloc(1), then repeatedly sets to the
result of realloc with successively larger values (1, 2, 3, ...).
When I iterate from 1 to 1000000, the pointer value only changes a few
times for small values. On one implementation, the value changes only
when realloc() is called with a size of 13; on another, it changes at
9, 17, 25, and 33. For all other sizes, all the way up to a megabyte,
it just extends the existing region of memory; presumably no copying
is done.
If I add a call to malloc(1) before each realloc() (with no
corresponding free()), the pointer value changes much more often on
one implementation, but not on the other. Probably the small
allocation is (at least sometimes) done just after the existing block,
preventing it from being expanded; on the other implementation, it
looks like the small allocation happens to be done just *before* the
existing block, allowing it to expand.
All this is, of course, extremely system-specific. Michael is quite
correct: you *don't know* what realloc does.

System-specific or not, would you be prepared to tell which implemen-
tations you were testing that with?

Using your program on gcc (on Linux) exhibits the interesting property
that as the number of bytes you realloc() increases the less often it
needs to move the block. And when it gets slightly above 128 kB it does
not move the block anymore - that's probably due to, as I learned just
recently, its switching the method for memory allocation above 128 kB
(from sbrk() to mmap()).
Here's the program. One caveat: the "ptr != prev" comparison probably
invokes undefined behavior when realloc() moves the memory block and
ptr becomes invalid.

As far as I understand the constraints, comparing pointers for equality
or inequality is absolutely ok, only trying to establish a relation like
that one is larger than the other gets you into muddy waters.

Regards, Jens
 
K

Keith Thompson

System-specific or not, would you be prepared to tell which implemen-
tations you were testing that with?

Cygwin under Windows XP and Solaris 8. On Cygwin, the behavior was
unaffected by the extra malloc() call; on Solaris, it caused the
realloc() to move the block more often.

Actually, I should have stated it more strongly. It's not just
system-specific; it can depend strongly on the current state of the
heap (assuming there is such a thing), which can depend on what else
the program has been doing, and quite possibly on the phase of the
moon.

[...]
As far as I understand the constraints, comparing pointers for equality
or inequality is absolutely ok, only trying to establish a relation like
that one is larger than the other gets you into muddy waters.

If the realloc() call moves the block of memory, the old pointer value
becomes invalid, just like a pointer that's been free()d.. Any
attempt to refer to that value will invoke undefined behavior. In
nearly all real-world implementations, an equality comparison will
just compare the bits, and it will work as expected -- but an
implementation could, for example, load the value into an address
register for the comparison, and this could trigger a validity check.
 
C

CBFalconer

.... snip ...


System-specific or not, would you be prepared to tell which
implementations you were testing that with?

Using your program on gcc (on Linux) exhibits the interesting
property that as the number of bytes you realloc() increases the
less often it needs to move the block. And when it gets slightly
above 128 kB it does not move the block anymore - that's probably
due to, as I learned just recently, its switching the method for
memory allocation above 128 kB (from sbrk() to mmap()).


As far as I understand the constraints, comparing pointers for
equality or inequality is absolutely ok, only trying to
establish a relation like that one is larger than the other
gets you into muddy waters.

I suggest you might have more interesting results making a
constant size allocation between each realloc. If you tie down
the malloc system (more later) you will be testing whatever basic
method the system uses to provide memory, such as sbrk().

On systems with the usual linear addressing and where comparisons
between pointers can be made, and relying on sbrk() for the
fundamental memory supply, you could use my nmalloc system. It
is, apart from the above, written in standard C except that the
GNU flavor of variadic macros are used. Without the debugger
energized, the macros all do nothing, but the problem is that
non-gcc compilers don't realize that :-(

That system goes to lengths to avoid moving data on a realloc, and
under the conditions Keith describes would end up simply extending
the sbrk secured block after a very short time. The early few
movements are due to using small blocks left free after
initialization, and would depend on that initialization code. The
malldbg module in nmalloc will let you watch this, without using
the variadic macro debuggery mentioned above. There is a manual
for the malldbg calls in info format, together with an example
usage and test program. I believe these calls to be compatible
with most Unix systems.

See <http://cbfalconer.home.att.net/download/nmalloc.zip>

--
"I'm a war president. I make decisions here in the Oval Office
in foreign policy matters with war on my mind." - Bush.
"Churchill and Bush can both be considered wartime leaders, just
as Secretariat and Mr Ed were both horses." - James Rhodes.
"If I knew then what I know today, I would still have invaded
Iraq. It was the right decision" - G.W. Bush, 2004-08-02
 
K

Keith Thompson

CBFalconer said:
I suggest you might have more interesting results making a
constant size allocation between each realloc. If you tie down
the malloc system (more later) you will be testing whatever basic
method the system uses to provide memory, such as sbrk().

Yes, my program has an option to do just that. (On Solaris, it caused
numerous reallocations; on Cygwin, it didn't.)
 
S

someone else

thanks Malcolm

I read most of the post (the ones that where still on my server...).
I noticed that someone was complaining about me using sizeof(small) rather
then sizeof(mystruct).
in my opinion, creating readable code is crucial, since it reduces the time
and cost of maintaining or updating the code, unfortunately there's no
common rule of how to get your code to be readable, so I use my personal
judgement, it all depends on the specific code. however, I think that the
code I wrote was fairly readable.


Malcolm said:
someone else said:
i'm looking for good ways to resize a structure. for example:


typedef struct mystruct {
...
}

mystruct small[1024];
...

/*need to resize */

mystruct big[2048];
for(int i=0;i<1024;i++)
big = small;
...

is there a more efficiant way? this could be very ineficiant, since it
creates a new array, and then copies all the values from the old array, is
there a way to keep the old array, with its variables, and just add new
elements to it?

The only thing you can do is

mystruct big[2048];
int current_size;

Set current_size to the number of elements that are actual entries.

You might also want to look at the functions malloc() and realloc(). Most C
programmers would achieve the above by.

mystruct *small = malloc( sizeof(mystruct) * 1024);

to resize

mystruct *big = realloc( small, sizeof(mystruct) * 2048);

However this is not really much improvement, because internally realloc()
usually allocates a new area of memory and then copies.

If you absolutely need an array which can be extended quickly, your only
option is to use a more complicated structure. Something like

struct extensiblearray
{
int N chunks;
int chunksize;
mystruct **chunklist;
}:

The you have functions readextensible(), writeextensible() and
resizeextensible() which manipulate it transparently. This is a lot of fuss,
and time you save on extension will to some extent be paid for by the acces
overhead, but it will mean that you can change the size of the array without
moving all the elements in memory.

I tried working out how to do this, but I had a hard time. lack of practice
with pointer to pointer I guess.
could you please show an example? I don't care if it is a working example or
not, as long as it demonstrates how to do it (and is readable :).

thanks in advance
someone else
 
M

Malcolm

someone else said:
thanks Malcolm


I tried working out how to do this, but I had a hard time. lack of practice
with pointer to pointer I guess.
could you please show an example? I don't care if it is a working example or
not, as long as it demonstrates how to do it (and is readable :).

mystruct readextensible( struct extensiblearray *ea, int i)
{
int index;
int offset;

assert(i < ea->chunksize * ea->Nchunks);

index = i / ea->chunksize;
offset = i % ea->chunksize;

return ea->chunklist[index][offset];
}

void mystruct writextensible(mystruct val, int i)
{
int index;
int offset;

index = i / ea->chunsize;
offset = i % ea->chunksize;

if(i < ea->chunksize * ea->Nchunks)
{
ea->chunklist[index][offset] = val;
}
else
{
int j;
int newsize = i / ea-.chunksize + 1;

ea->chunklist = realloc(ea->chunklist, newsize * sizeof(mystruct *));
if(!ea-<chunklist)
{
/* out of memory */
}
for(j=ea->Nchunks; j<newsize;j++)
{
ea->chunklist = malloc(ea->Nchunks * sizeof(mystruct));
if(!ea->chunklist)
{
/* out of memory */
}
ea->chunklist[index][offset] = val;
}
}
}


This one just extends the array as you try to write out of bounds. it should
be easy to alter so that you have to extend explicitly.

To initialise, choose a value for chunksize which is reasonable and set
chunklist and Nchunks to 0. Alternatively pass in the chunksize and initial
size and do the allocations in your setup code.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,045
Latest member
DRCM

Latest Threads

Top