Container library (continued)

jacob navia · Mar 16, 2010

Data allocation strategies for containers can vary wildly, depending on
the specific container and on the application. Environment
considerations play also a major role: if there is enough RAM many
things can be handled QUITE differently than when there isn't.

It is impossible to find a strategy that can be always good in all
situations, so naturally, we need an object with (roughly) 3 function
pointers:

(1) MALLOC
(2) FREE
(3) REALLOC

This table of functions will be used by a container for
allocating/releasing memory. It will default to the standard C
functions, but can be changed so that a GC is used, for instance. In
that case we would have

MALLOC --> GC_malloc
FREE --> no operation
REALLOC --> GC_realloc.

Now, should this object be a global part of the library, i.e. a single
allocation object for the whole library, or should each container class
(hash tables, lists, dictionaries) have one, or should each individual
container have one?

If we have a single global object, changing the behavior of our
containers is very easy, we have a single object to change and we are
done. Obviously, this is VERY global and forces the user to have always
the same allocation strategy for all containers.

If we have a per class of container design, we have more flexibility, we
could use the GC for hash tables but not for lists, etc. The price to
pay is increased difficulty to change the behavior of all objects... To
change from the default object to the GC, for instance, we would have to
go through all container classes. True, the library could provide a
function to do that, but if we add a container we would have to modify
that function again and again.

If we have an allocation object per individual container we have the
maximum flexibility but changing the allocation strategy becomes QUITE
complicated.

Personally, I think that is easier to understand the first option:
having a single object that allocates/frees memory. It is rare that we
would want to use a GC, say, and at the same time malloc/free at the
same time.

Obviously I am not sure, hence this message. What do you think?

jacob

Seebs · Mar 16, 2010

Now, should this object be a global part of the library, i.e. a single
allocation object for the whole library, or should each container class
(hash tables, lists, dictionaries) have one, or should each individual
container have one?
Hmm.

Personally, I think that is easier to understand the first option:
having a single object that allocates/frees memory. It is rare that we
would want to use a GC, say, and at the same time malloc/free at the
same time.

Obviously I am not sure, hence this message. What do you think?

I would say probably per-container. It sounds like the individual
containers are going to end up performing roughly the role that a
less-library-able container implementation would assign to an individual
struct type. I am not sure, though.

See, what I would normally do is expect that there'd be an object type,
and those objects would have their own alloc/free routines. But the
container itself needs allocation too. Depending, it might make sense
to have a single global "container objects allocator", but then each
container can be given the alloc/free routines for the specific things
it contains.

So, if I'm doing a container of foos and a container of bars, both of
them use container_alloc() for allocating internal data structures,
but one uses foo_alloc() to allocate space for contained things, and
the other uses bar_alloc(). And in all cases, the default is to just
use malloc.

-s

Andrew Poelstra · Mar 17, 2010

Data allocation strategies for containers can vary wildly, depending on
the specific container and on the application. Environment
considerations play also a major role: if there is enough RAM many
things can be handled QUITE differently than when there isn't.

It is impossible to find a strategy that can be always good in all
situations, so naturally, we need an object with (roughly) 3 function
pointers:

(1) MALLOC
(2) FREE
(3) REALLOC

This table of functions will be used by a container for
allocating/releasing memory. It will default to the standard C
functions, but can be changed so that a GC is used, for instance. In
that case we would have

MALLOC --> GC_malloc
FREE --> no operation
REALLOC --> GC_realloc.

Now, should this object be a global part of the library, i.e. a single
allocation object for the whole library, or should each container class
(hash tables, lists, dictionaries) have one, or should each individual
container have one?

I like the third option.

I started out defending the second, but when I thought about common use
cases, it makes more sense to select an allocator when allocating an
object; otherwise, if I needed to find out what allocator was in use, I'd
have to trace the code backwards to wherever that was set.

The reason I don't like the first option is that is restricts extensibility.
If I used your libraries, but decided I didn't like your hash table (IME
hash tables have the most to gain by /not/ being generalized) I would be in
the position of handing one allocator to my hash table, and one allocator
to all N of your classes.

To me that feels yucky, to have some classes sharing allocators and some
classes needing their own.

ng2010 · Mar 17, 2010

So sad to be dying and seeing youth tackling problems already solved in
my own youth.

jacob navia · Mar 17, 2010

Andrew Poelstra a écrit :

The reason I don't like the first option is that is restricts extensibility.
If I used your libraries, but decided I didn't like your hash table (IME
hash tables have the most to gain by /not/ being generalized) I would be in
the position of handing one allocator to my hash table, and one allocator
to all N of your classes.

Mmm this is a good point. I did not think about that.

Obviously, the solution is (if we retain a single global object) that each class contains a pointer
to a specific allocator for the class. If that pointer is NULL (default situation), we use the
global object. If it is not NULL, it is assumed that it points to a similar table of functions
(malloc/free/realloc) that the user has set up for this class.

Within this schema, it is still easy to change the global behavior of all objects, but subclasssing
with a custom allocator is still easy and is NOT affected by any changes in the global behavior.

Thanks for your contribution.
jacob

jacob navia · Mar 17, 2010

ng2010 a écrit :

So sad to be dying and seeing youth tackling problems already solved in
my own youth.

You misunderstand. I am not trying to invent a new way of doing those things. I am trying to invent
a generalized standard way of doing that in C. This has never been done before for C.

Nick Keighley · Mar 17, 2010

So sad to be dying and seeing youth tackling problems already solved in
my own youth.

and the solution was? I'm not convinced there *is* a single right
answer for all situations.

Nick Keighley · Mar 17, 2010

Data allocation strategies for containers can vary wildly, depending on
the specific container and on the application. Environment
considerations play also a major role: if there is enough RAM many
things can be handled QUITE differently than when there isn't.

It is impossible to find a strategy that can be always good in all
situations, so naturally, we need an object with (roughly) 3 function
pointers:

(1) MALLOC
(2) FREE
(3) REALLOC

This table of functions will be used by a container for
allocating/releasing memory. It will default to the standard C
functions, but can be changed so that a GC is used, for instance. In
that case we would have

MALLOC --> GC_malloc
FREE --> no operation
REALLOC --> GC_realloc.

Now, should this object be a global part of the library, i.e. a single
allocation object for the whole library, or should each container class
(hash tables, lists, dictionaries) have one, or should each individual
container have one?

If we have a single global object, changing the behavior of our
containers is very easy, we have a single object to change and we are
done. Obviously, this is VERY global and forces the user to have always
the same allocation strategy for all containers.

If we have a per class of container design, we have more flexibility, we
could use the GC for hash tables but not for lists, etc. The price to
pay is increased difficulty to change the behavior of all objects... To
change from the default object to the GC, for instance, we would have to
go through all container classes. True, the library could provide a
function to do that, but if we add a container we would have to modify
that function again and again.

If we have an allocation object per individual container we have the
maximum flexibility but changing the allocation strategy becomes QUITE
complicated.

Personally, I think that is easier to understand the first option:
having a single object that allocates/frees memory. It is rare that we
would want to use a GC, say, and at the same time malloc/free at the
same time.

Obviously I am not sure, hence this message. What do you think?

both? That is a global allocation object that is the default allocator
that can be over-ridden on a per container basis.

If I have a number of small conatiners and a single gigantic container
maybe only gigantic container needs a special memory allocator. I
could imagine containers needing special memory (eg. for DMA).

Also the STL does it this way...

Dr Malcolm McLean · Mar 17, 2010

Now, should this object be a global part of the library, i.e. a single

allocation object for the whole library, or should each container class
(hash tables, lists, dictionaries) have one, or should each individual
container have one?

Most of the time user won't really care abput the memory allocation
strategy wants. So the default should be to use some sensible scheme
(probably stdlib malloc) and to kep this away from the user.
However you can provide a function to set the defaults for all
containers. It should apply only to containers created after the call.
Then you can also provide special contructors that take memory
management function pointers as arguments. Most of the time these
won't be used, but they provide fine control to those who need it.

Andrew Poelstra · Mar 17, 2010

Andrew Poelstra a écrit :

Mmm this is a good point. I did not think about that.

Obviously, the solution is (if we retain a single global object) that each class contains a pointer
to a specific allocator for the class. If that pointer is NULL (default situation), we use the
global object. If it is not NULL, it is assumed that it points to a similar table of functions
(malloc/free/realloc) that the user has set up for this class.

Within this schema, it is still easy to change the global behavior of all objects, but subclasssing
with a custom allocator is still easy and is NOT affected by any changes in the global behavior.

Thanks for your contribution.
jacob

I like this (and Nick suggested it as well downthread). Malcolm
suggested having separate constructors for people who want their
own memory management. The regular constructor would just use
malloc().

Seebs · Mar 17, 2010

You misunderstand. I am not trying to invent a new way of doing
those things. I am trying to invent a generalized standard way
of doing that in C. This has never been done before for C.

It's been done repeatedly, except for the "standard" part.

I honestly don't think that generalizing to "container" is going to be
useful in C. In a language that doesn't have meaningful inheritance,
it's very unusual for this kind of approach to get widespread adoption.
I think you'd do better to make a list library and a hash library,
for instance.

-s

jacob navia · Mar 17, 2010

Seebs a écrit :

It's been done repeatedly, except for the "standard" part.

I honestly don't think that generalizing to "container" is going to be
useful in C. In a language that doesn't have meaningful inheritance,
it's very unusual for this kind of approach to get widespread adoption.

I do not know of any other attempt in C.

Inheritance is not necessary at all. The code works without it.

Seebs · Mar 17, 2010

Seebs a écrit :

I do not know of any other attempt in C.

I don't know of another generalization to container in C. I know of
many lists and vectors and the like.

Inheritance is not necessary at all. The code works without it.

But without inheritance, "container" is not a useful level of abstraction.

The thing that makes "container" useful is that it provides for commonality
between "list" and "hash". But that's only useful when list and hash are
the types you actually work with, but because of inheritance, you can treat
them both as containers.

Without inheritance, either you have to call things "container" regardless
of which they are, and thus have access to an API that only makes sense for
a list when you're using a hash, and an API that only makes sense for a hash
when you're using a list, or you lose the ability to write a function which
just takes a "container".

I have often used list libraries. I'd use hash libraries if I needed a hash.
In neither case would I use a "container" library in C -- it's the wrong
level of abstraction for C, so far as I can tell. I've never in any language
wanted "a container". In OO languages, it's useful that the specific things
I request are also generically "containers". In C, it's not useful, because
I can't do anything with it.

-s

Andrew Poelstra · Mar 17, 2010

I don't know of another generalization to container in C. I know of
many lists and vectors and the like.

But without inheritance, "container" is not a useful level of abstraction.

The thing that makes "container" useful is that it provides for commonality
between "list" and "hash". But that's only useful when list and hash are
the types you actually work with, but because of inheritance, you can treat
them both as containers.

Without inheritance, either you have to call things "container" regardless
of which they are, and thus have access to an API that only makes sense for
a list when you're using a hash, and an API that only makes sense for a hash
when you're using a list, or you lose the ability to write a function which
just takes a "container".

I have often used list libraries. I'd use hash libraries if I needed a hash.
In neither case would I use a "container" library in C -- it's the wrong
level of abstraction for C, so far as I can tell. I've never in any language
wanted "a container". In OO languages, it's useful that the specific things
I request are also generically "containers". In C, it's not useful, because
I can't do anything with it.

You can do OO in C. I think most people don't because they have
it drilled into their heads that "C is not an object-oriented"
language. Which to be fair, it isn't, but that doesn't mean that
there are reasonably-elegant[1] ways to do OO.

What Jacob has described doing (for a couple of projects) is
making the first element of his structs a vptr table, and interally
casting all his containers to this vtpr table, then calling the
appropriate "method" from there.

So that's inheritance. But from the user's perspective, it requires
casting each list/vector/hash/whatever to a Container * before
passing it to a function.

eg.
int data = 5;
List mylist = new_list();
append((Container *) mylist, &data);

Which is kinda ugly, but the cast can be hidden by a macro around
the function, and then you're golden:

#define append(a, b) (append((Container *)a, b)

append(mylist, &data);

I think this is a nice interface, has appropriate data-hiding, and
provides genericity. If he published the vptr structure, it would
also be extensible by other libraries.

[1] And "Reasonably elegent" often means "more elegant than C++"

Seebs · Mar 17, 2010

You can do OO in C. I think most people don't because they have
it drilled into their heads that "C is not an object-oriented"
language. Which to be fair, it isn't, but that doesn't mean that
there are reasonably-elegant[1] ways to do OO.

Oh, certainly. I just don't think that it's a good enough fit to reward
the kind of hierarchy the "container" library implies.

I think this is a nice interface, has appropriate data-hiding, and
provides genericity. If he published the vptr structure, it would
also be extensible by other libraries.

True. But I think there's a limit -- every library needs to know what could
be in the vptr structure, and you can't easily add or remove things. I
suppose some kind of type indicator could help.

But at this point, you end up with things that have a fair bit of overhead,
and most of the time when I see lists used in C, they're used directly
so they don't have to pay the overhead of function calls at all, let alone
indirection through function pointers...

[1] And "Reasonably elegent" often means "more elegant than C++"

-s

ImpalerCore · Mar 17, 2010

Data allocation strategies for containers can vary wildly, depending on
the specific container and on the application. Environment
considerations play also a major role: if there is enough RAM many
things can be handled QUITE differently than when there isn't.

It is impossible to find a strategy that can be always good in all
situations, so naturally, we need an object with (roughly) 3 function
pointers:

(1) MALLOC
(2) FREE
(3) REALLOC

This table of functions will be used by a container for
allocating/releasing memory. It will default to the standard C
functions, but can be changed so that a GC is used, for instance. In
that case we would have

MALLOC --> GC_malloc
FREE --> no operation
REALLOC --> GC_realloc.

Now, should this object be a global part of the library, i.e. a single
allocation object for the whole library, or should each container class
(hash tables, lists, dictionaries) have one, or should each individual
container have one?

If we have a single global object, changing the behavior of our
containers is very easy, we have a single object to change and we are
done. Obviously, this is VERY global and forces the user to have always
the same allocation strategy for all containers.

If we have a per class of container design, we have more flexibility, we
could use the GC for hash tables but not for lists, etc. The price to
pay is increased difficulty to change the behavior of all objects... To
change from the default object to the GC, for instance, we would have to
go through all container classes. True, the library could provide a
function to do that, but if we add a container we would have to modify
that function again and again.

If we have an allocation object per individual container we have the
maximum flexibility but changing the allocation strategy becomes QUITE
complicated.

Personally, I think that is easier to understand the first option:
having a single object that allocates/frees memory. It is rare that we
would want to use a GC, say, and at the same time malloc/free at the
same time.

Obviously I am not sure, hence this message. What do you think?

jacob

One option is to just use malloc, wait for user complaints and see
what areas in performance they are complaining about and address
them. If the interface is easy enough to use and there are enough
examples to lead them in how to use your library, there will be
interest in improving the performance issues, which can be argued and
fleshed out as a community.

I simply use malloc for my stuff since I have a very small frame of
reference (just myself) for my container code and it works for me.
I'd release it that way if I decided to jump in the pool, as I
personally would like to get a broader user experience (if anyone
would be willing to use it) to make an more informed decision between
the trade-offs of complexity and efficiency before making that kind of
choice (and hopefully you'd have a following of willing test subjects
to try out your changes in "real world" applications).

Again, you likely have a more robust background in this stuff than I.
I've never used a garbage collector in C or even C++, and haven't been
pushed to the point where efficiency was that critical to require
writing special allocators. All I've done is create a wrapper that
allows me to inject faults after specified number of malloc, calloc,
or realloc function calls to test my internal container allocation
error handling.

I don't intend this to discourage the allocator discussion at all, but
just to point out that if I'm not using the library, I have no vantage
point to make any informed suggestion. While I think that the
semantics of memory allocation is important, I just feel it's not
important to a library that no one is committed to using *yet*.

Best regards,
John D.

ImpalerCore · Mar 17, 2010

You can do OO in C. I think most people don't because they have
it drilled into their heads that "C is not an object-oriented"
language. Which to be fair, it isn't, but that doesn't mean that
there are reasonably-elegant[1] ways to do OO.

Click to expand...

Oh, certainly. I just don't think that it's a good enough fit to reward
the kind of hierarchy the "container" library implies.

I think this is a nice interface, has appropriate data-hiding, and
provides genericity. If he published the vptr structure, it would
also be extensible by other libraries.

Click to expand...

True. But I think there's a limit -- every library needs to know what could
be in the vptr structure, and you can't easily add or remove things. I
suppose some kind of type indicator could help.

But at this point, you end up with things that have a fair bit of overhead,
and most of the time when I see lists used in C, they're used directly
so they don't have to pay the overhead of function calls at all, let alone
indirection through function pointers...

Imo, there's probably a few people willing to allow the overhead if

1. It's easy to use correctly.
2. Rolling their own version is significantly more difficult and
expensive.
3. It's free.
4. The performance overhead is not detrimental to the application.

For example, I would gladly use a correct but slowish hash-table to
build a working demo of an application just to verify the proof of
concept. At the very least, it would allow developers to prototype a
design faster and fine tune with more type-specific/efficient data
structures if need be.

Best regards,
John D.

[1] And "Reasonably elegent" often means "more elegant than C++"

Click to expand...

-s

Ian Collins · Mar 17, 2010

I don't know of another generalization to container in C. I know of
many lists and vectors and the like.

But without inheritance, "container" is not a useful level of abstraction.

I don't think it's inheritance that makes "container" a useful level of
abstraction. The most important language feature for containers is
support for generics. For example, the containers part of the C++
standard library makes very little use of inheritance but relies heavily
on templates (hence the old name of standard template library).

The thing that makes "container" useful is that it provides for commonality
between "list" and "hash". But that's only useful when list and hash are
the types you actually work with, but because of inheritance, you can treat
them both as containers.

Or because of generics support.

Without inheritance, either you have to call things "container" regardless
of which they are, and thus have access to an API that only makes sense for
a list when you're using a hash, and an API that only makes sense for a hash
when you're using a list, or you lose the ability to write a function which
just takes a "container".

Again, generics enables you to use the common subset of container
interfaces.

So the interface can be done in C, just not very cleanly.

Ian Collins · Mar 17, 2010

Data allocation strategies for containers can vary wildly, depending on
the specific container and on the application. Environment
considerations play also a major role: if there is enough RAM many
things can be handled QUITE differently than when there isn't.

It is impossible to find a strategy that can be always good in all
situations, so naturally, we need an object with (roughly) 3 function
pointers:

(1) MALLOC
(2) FREE
(3) REALLOC

This table of functions will be used by a container for
allocating/releasing memory. It will default to the standard C
functions, but can be changed so that a GC is used, for instance. In
that case we would have

MALLOC --> GC_malloc
FREE --> no operation
REALLOC --> GC_realloc.

Now, should this object be a global part of the library, i.e. a single
allocation object for the whole library, or should each container class
(hash tables, lists, dictionaries) have one, or should each individual
container have one?

If possible, provide both. That's the was C++ does it and it works
well. It's uncommon to provide a specific allocator to a container
instance, but when this is required, it is very useful.

Nick Keighley · Mar 18, 2010

It's been done repeatedly, except for the "standard" part.

I honestly don't think that generalizing to "container" is going to be
useful in C. In a language that doesn't have meaningful inheritance,
it's very unusual for this kind of approach to get widespread adoption.
I think you'd do better to make a list library and a hash library,
for instance.

C++ doesn't use inheritance in its container classes. Having seen
inheritance based conatiner classes I tend to think its the wrong
mechanism.

Container library (continued)	21	Dec 25, 2009
Observing a container	2	Dec 10, 2013
How can I upload a tar.bz2 file to OpenStack swift object storage container using the Python swift client?	1	Mar 22, 2024
Engineering a list container. Part 1.	71	Dec 7, 2013
Container library, second attempt	0	Oct 4, 2009
Improving memory consumption in the container library	6	Oct 9, 2009
Container Interfaces	17	May 9, 2010
Container library progress	61	Nov 9, 2009

Container library (continued)

jacob navia

Seebs

Andrew Poelstra

ng2010

jacob navia

jacob navia

Nick Keighley

Nick Keighley

Dr Malcolm McLean

Andrew Poelstra

Seebs

jacob navia

Seebs

Andrew Poelstra

Seebs

ImpalerCore

ImpalerCore

Ian Collins

Ian Collins

Nick Keighley

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads