Small Variant Records in C

BartC

Marco Devillers said:
Well, the current problem is that I allocate *cough* 15M chars. So,
apart from that I didn't do some of the algorithms very bright
*cough*, that works out to
15 M chars
32 bytes for a char,
48 bytes for a cons cell,

OK, I'll ask the obvious question: what's in the other 31 (or 28) bytes of
the memory used by a single char?
 
Kaz Kylheku

Well, the current problem is that I allocate *cough* 15M chars. So,
apart from that I didn't do some of the algorithms very bright
*cough*, that works out to
15 M chars
32 bytes for a char,
48 bytes for a cons cell,
a factor 2 overhead for GC, gives...
2.4GB of character data and 15mins for a self-compile. See my
problem ;-).

Stop heap-allocating types like characters and small numbers that
can fit into a machine word with a couple of tag bits to spare.
Your garbage collector should just be able to recognize a character
value and skip it.

In TXR, I used two bits to distinguish a pointer to a heap descriptor from
several other cases: fixnum integer, character and a static/ephemeral C string.

So on a system with 32 bit pointers, a character is 32 bits wide, and not heap
allocated. Since a wchar_t is often 32 bits wide also, there is no waste for
characters. (And a 16 bit wchar_t is broken anyway because it doesn't properly
cover all of Unicode.) For strings, I used arrays of wchar_t,
null-terminated, so strings interoperate smoothly with C interfaces.

Heap objects have descriptors that are all the same size: four pointers wide.
This is because the type is a union of various structures (union obj).
The same size simplifies the garbage collector which allocates heaps
as arrays of this type.

Most types make full use of the descriptor, and in some cases I used bitfields
to make sure that this four pointer limit is not exceeded. (For instance, for
bignums, I used the MPI library which wanted to use an entire int to encode the
sign of a number!) The cons cell wastes a field since it has only three
fields. But even on a 64 bit system, it still fits into 32 bytes (8 * 4),
less than your 48. On a 32 bit system, it's down to 16.

Since the tag for a heap pointer is the bits 00, values that are pointers
don't have to be masked in any way to be used. If a check shows that a
value is a (non-null) pointer, you can dereference it directly to access the
structure (since the type "val" is a typedef for "obj_t *"). The object nil
gets to be represented by a C null pointer, which helps to program in
a Lispy style:

val func(val list)
{
    if (list) { /* list is not empty */ }
}

I wrote up some notes here: http://www.kylheku.com/cgit/txr/tree/HACKING

It may be worthwhile to look into some implementations of dynamic languages
to see what they do. Corman Lisp's documentation nicely describes how types
are represented; it's worth having a look. IIRC it uses a three-bit field.
 
Marco Devillers

OK, I'll ask the obvious question: what's in the other 31 (or 28) bytes of
the memory used by a single char?

By memory: A tag stating it's an opaque series of bits (a 0), the size
of the series of bits, the record and the type info (it's a char), the
value of the char. Each field aligned at 64 bits/8 bytes, four fields,
so 32 bytes.

It's a wasteful scheme, I knew that. But at that point I was only
interested in a bootstrap, didn't give it a lot of thought, and ended
up at least two orders of magnitude above the imagined memory
consumption. Guess I should have done the numbers beforehand.

Thing is, I am not even sure I want to go all the way into extreme bit
manipulation to end up with, say, 16-bit char representations. So the
thing wastes a lot, but so do Java/Python chars. And in this case,
the real problem -given memory in abundance- is that I do too much low-
level stuff in what is supposed to be a high-level language. I'd rather
aim at a slow/robust high-level language à la Python with good
bindings to C than at a high-level language with low-level
features which will never achieve C speed anyway.

We live and learn, cheers, Marco
 
Eric Sosman

By memory: A tag stating it's an opaque series of bits (a 0), the size
of the series of bits, the record and the type info (it's a char), the
value of the char. Each field aligned at 64 bits/8 bytes, four fields,
so 32 bytes.

Guggh. You don't need to "optimize" by playing games with type
punning structures; you need to re-think your representation.
[...] And in this case,
the real problem -given memory in abundance- [...]

Can you octuple your cache when you octuple your memory use?
 
Marco Devillers

     Can you octuple your cache when you octuple your memory use?

Come again? I hope I can 'octuple' a lot of things if I know what
it means.

I guess you are referring to the above wasteful scheme? I know I
need to design something which aims at the current hybrid of 32bit/
64bit processor models such that I can hopefully get away with packing
RTTI information into 32 or 64 bits headers instead of representing
each field with 64 bits. And that also supposes some understanding of
how, say, on Intel x64, C instructions and data representations are
usually compiled to specific data and microinstructions and the
resulting memory/cache behavior of programs, instead of the logically
correct but not very world aware representation I have now.

My current thoughts are that I estimate I need something with a
cell size which is either a multiple of 32 or 64, I currently have no
idea of how either choice affects runtime behavior on Intel x64, and I
think I need a VM which is designed as a minimal series of convenience
macros so I can easily compile to that. And then note that to me the
biggest problem is not the representation, but the compiler which
doesn't use texts but uses lists of chars, as well as some other poor
choices I made internally.

If you have specific thoughts on 64 bit VMs, please state them!

Ah well, cheers, Marco

PS. Note that this is just a project I once started to learn some
stuff and implement some new and funny ideas given what I read. It is
by no means a production compiler or a research project. Consequently,
some design choices I made are goofy and nice, some are plain horrible.
 
BartC

Marco Devillers said:
By memory: A tag stating it's an opaque series of bits (a 0), the size
of the series of bits, the record and the type info (it's a char), the
value of the char. Each field aligned at 64 bits/8 bytes, four fields,
so 32 bytes.

It's a wasteful scheme, I knew that. But at that point I was only
interested in a bootstrap, didn't give it a lot of thought, and ended
up at least two orders of magnitude above the imagined memory
consumption. Guess I should have done the numbers beforehand.

So you have an array of 15 million characters, but each one is stored inside
a variant type.

That's always going to be wasteful: 32 bytes each here, 12 bytes if I did
the same, unless you use a fiddly representation that uses spare bits. This
is where a packed array should be considered, then each element can be 1, 2
or 4 bytes depending on how much you need unicode.

But if this array represents a string, then definitely implement those
first.

(I'm assuming your 15 million characters *are* together in a collection, but
you also mention C code being generated that takes 15 minutes to compile.
I'm not sure how that would come about, unless you represent your
15-million-element data by initialising such an array of variants as 15
million lines of C! That shouldn't be necessary, unless each element has a
different initial value.)
 
Marco Devillers

So you have an array of 15 million characters, but each one is stored inside
a variant type.

Nope, I have a total of 15 million characters of data, distributed over a
few thousand lists of chars, which is probably the result of
flattening the names in namespaces. I.e., a function foo in namespace
bar becomes a non-unique list ['b','a','r','.','f','o','o']. So it's
more a problem of the compiler than of the representation - I need
native texts. If you would do the same in Haskell, you'd end up with
a similar wasteful memory consumption. It's just exacerbated by the
current representation scheme too, so I want a bit more bitpacking and
native string support in the VM.

The 15 mins is the result of all those list operations implemented
directly in my language, instead of calling C, and the memory hog.
I.e., everything scales worse than linearly, and probably
exponentially, with the string operations. (Most list operations are
at least linear in the length of the list, and performance also
degrades linearly with memory consumption in GCed languages. And I am
not sure whether I should add, or multiply, all these costs at the
moment.)

As I said, it won't compile in 5 secs if I implement native texts, but
it will become more bearable than 15 mins. (I am hoping for something
around 45 secs at the moment.)

Cheers, Marco
 
Eric Sosman

Come again? I hope I can 'octuple' a lot of things if I know what
it means.

Double = times two, triple (treble) = times three, ...
octuple = times eight. My point being that it is often possible
to add RAM to a system, up to some limit, but less frequently
possible to expand a system's caches. The lowest-level cache can
sometimes be grown (if so, usually by doubling, once), but higher-
level caches almost never. The only way to grow the on-chip cache
is to get a new CPU chip.
I guess you are referring to the above wasteful scheme?
Yes.

I know I
need to design something which aims at the current hybrid of 32bit/
64bit processor models such that I can hopefully get away with packing
RTTI information into 32 or 64 bits headers instead of representing
each field with 64 bits. And that also supposes some understanding of
how, say, on Intel x64, C instructions and data representations are
usually compiled to specific data and microinstructions and the
resulting memory/cache behavior of programs, instead of the logically
correct but not very world aware representation I have now.

No. Forgive the mixed metaphor, but you've gone down into the
weeds before taking the fifty-thousand-foot view. At this stage you
do *not* need to aim at any particular family of processors, you do
*not* need to pay heed to instruction sets, you do *not* need to think
about microinstructions or pipeline architecture or any such matters.
And the only thing you need to know about memory and caches (again,
at this stage) is that memory is slow and cache is limited. This
means that bloat has a speed penalty, sometimes a dramatic penalty.
It's not worth while to save 1K out of a data structure with six
instances, but it probably *is* worth while to save 8B out of a data
structure you will create by the hundreds of thousands.

More mixed metaphors: You're driving a car that hasn't been tuned
since the Clinton administration, on under-inflated tires, with an empty
boat trailer dragging behind, and you're fond of jackrabbit starts.
Your fuel economy, of course, is dismal -- and it absolutely does not
matter one tiny little bit what brand of motor oil you use. Cure your
bad driving habits, unhitch the trailer, pump up the tires, give the
jalopy some maintenance, and *then* you can start thinking about
which oils are slipperier under what conditions.

Tricks of the kinds Kaz mentions are the super-fancy motor oil,
to be used after you've already wrung the utmost out of "normal"
methods and find you still need more zip. They will not solve your
problem with the eight sacks of concrete mix in your trunk.
PS. Note that this is just a project I once started to learn some
stuff and implement some new and funny ideas given what I read. It is
by no means a production compiler or a research project. Consequently,
some design choices I made are goofy and nice, some are plain horrible.

There's nothing wrong with learning by exploration; in fact, it's
got a lot to recommend it. But this doesn't mean you should ignore
good sense altogether!
 
Marco Devillers

     There's nothing wrong with learning by exploration; in fact, it's
got a lot to recommend it.  But this doesn't mean you should ignore
good sense altogether!

Good sense arrives after a bootstrap! Anyway, we got a bit too much
sidetracked.

Thanks for the help, cya'll later, Marco
 
Joe keane

How important is it to use the minimum number (of bytes, usually,
rather than bits)?

supposing that you *know* the subclasses

e.g.

struct Foo {
    int foo_common;
};

struct RedFoo {
    struct Foo base;
    char redbit;
};

struct GreenFoo {
    struct Foo base;
    double greenvalue;
};

struct bluefooextra { ... };

struct BlueFoo {
    struct Foo base;
    struct bluefooextra *blueextra;
};

[and there aren't any purple or orange foos]

struct AllFoos {
    union {
        struct RedFoo af_r;
        struct GreenFoo af_g;
        struct BlueFoo af_b;
    } y;
};

typedef struct AllFoos *CanHoldAnyFoo;

int f(...)
{
    ...
    CanHoldAnyFoo x;

    x = malloc(len * sizeof(struct AllFoos));
    if (x == NULL)
        do_something();
    ...
}

then it may be useful.
 
Tim Rentsch

Seebs said:
You can't necessarily. You really have to use the union type in most
cases or the compiler will assume the types can't alias each other.

I believe the Standard expects otherwise. That's why 6.5.2.3 p5
(and now p6) gives the requirement that "a declaration of the
completed type is visible" at the point of access. If the union
type had to be used to access the contained structs there would be
no reason to include such a statement. Indeed, that's what makes
this "a special guarantee" at some level - because accesses to
structs _not_ through a union must be understood to potentially
alias each other, being as they are both present in the same union
type.
 
