Advanced data structures


Dann Corbit

Kaz Kylheku wrote:
The reason the solutions are huge in C++ templates is that the C++ template
language is extremely clumsy. It's driven by declarations, and can fail to do
even some obvious inference. Classes have to be used in order to emulate
type-valued functions. If you want a function f:U, V -> T where U, V and T are
types, rather than values, then you can't just write a function. You have to
write a class, where the return value T is hacked as some member. And you
can't write a body of algorithmic code to compute that member; it all has to be
done in terms of templates: the class member which returns the function's
type-value is either just declared literally, or computed using calls
to other templates, which is very limited in terms of expressivity.
Things like if/else decisions all have to be coded as template classes.
Then you have to write reams of specializations for that class for various
instantiations of U and V. Oh yeah, and the C++ template system sometimes
doesn't even know when something is intended to be a type, so you have to remind
it with ``typename''. The whole thing is the avoidance of giving the
programmer what is really needed: namely a compile-time meta-programming
language, in which you have functions and variables whose values are types and
pieces of syntax.

I don't really know the Haskell language, though it does sound
interesting.

I will download and read that cited paper.
 

Kaz Kylheku

Dann Corbit said:
I don't really know the Haskell language, though it does sound
interesting.

I will download and read that cited paper.

That's a terrible starting point for Haskell, describing the workings of a
Template extension, but good luck! :)
 

bartc

Richard Heathfield said:
Anyone who is not comfortable with 0-based indexing needs to buy a new
mattress. The contortions you have to go through to use other bases just
melt away when you switch to 0.

I think there are more contortions with starting from zero; you need to deal
with 0, N-1 and N. With 1-based, you deal with 1 and N, and only sometimes N-1.
(This is for counting or enumerating things; for measuring, then starting
from zero is better.)
For its power to communicate algorithmic ideas readably, I would choose C
not just over MIX, but over Java, C++, Li*p... everything. Have you a more
apt language in mind?

Pseudo-code will do me. That is, 1-based pseudo-code.

And printed to make nice use of bold and italics. Printed C code always used
to look anaemic because of the thin Courier font used (but perhaps that has
changed recently, with stuff being on-line).
 

Ian Collins

jacob said:
Dann Corbit wrote:

Yes. The problem with C++ is in other parts of the language and
in its enormous complexity.

Complexity which, as we have discussed before, you don't have to use.
In C++, just like in C, you don't have to pay for what you don't use.

As I posted a couple of days ago, any given problem will require a fixed
amount of complexity to solve. The programmer has the choice of
implementing the complexity himself, or leaving it to the compiler writer!
Templates are a good idea. You can
accomplish pretty much the same thing with a preprocessed text
file in C that takes a template file and adapts it to the specific
data type given.

Which is what early C++ implementations did until they grew up.
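
A minimal sketch of that preprocessed-template idea using nothing but the C
preprocessor (DEFINE_STACK, Stack_int and so on are illustrative names, not
anything from this thread):

#include <stdlib.h>

/* Expand a type-specific stack for T; token pasting gives each
   instantiation its own struct and function names. */
#define DEFINE_STACK(T)                                        \
    typedef struct { T *data; size_t len, cap; } Stack_##T;   \
    static int Stack_##T##_push(Stack_##T *s, T value)        \
    {                                                          \
        if (s->len == s->cap) {                                \
            size_t ncap = s->cap ? 2 * s->cap : 8;             \
            T *p = realloc(s->data, ncap * sizeof(T));         \
            if (p == NULL)                                     \
                return 0;                                      \
            s->data = p;                                       \
            s->cap = ncap;                                     \
        }                                                      \
        s->data[s->len++] = value;                             \
        return 1;                                              \
    }

DEFINE_STACK(int)      /* defines Stack_int and Stack_int_push */
DEFINE_STACK(double)   /* defines Stack_double and Stack_double_push */

Usage is along the lines of Stack_int s = {0}; Stack_int_push(&s, 42); --
clumsy next to real templates, but it is the same substitution-per-type idea
being described above.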
 

jacob navia

Kaz Kylheku wrote:
The reason the solutions are huge in C++ templates is that the C++ template
language is extremely clumsy. It's driven by declarations, and can fail to do
even some obvious inference. Classes have to be used in order to emulate
type-valued functions. If you want a function f:U, V -> T where U, V and T are
types, rather than values, then you can't just write a function. You have to
write a class, where the return value T is hacked as some member. And you
can't write a body of algorithmic code to compute that member; it all has to be
done in terms of templates: the class member which returns the function's
type-value is either just declared literally, or computed using calls
to other templates, which is very limited in terms of expressivity.
Things like if/else decisions all have to be coded as template classes.
Then you have to write reams of specializations for that class for various
instantiations of U and V. Oh yeah, and the C++ template system sometimes
doesn't even know when something is intended to be a type, so you have to remind
it with ``typename''. The whole thing is the avoidance of giving the
programmer what is really needed: namely a compile-time meta-programming
language, in which you have functions and variables whose values are types and
pieces of syntax.

I developed a compiler that does exactly that. You have a library of
entry points into the compilation environment, where you can specify
templates in C. There are two paradigms:


1) Event oriented:
The compiler generates a stream of compilation events that you can
subclass, in a similar way to GUI event-oriented programming, where
the GUI generates events that you subclass.
The types of events are:
begin/end compilation
begin/end function
begin/end statement
Syntax error (here you add syntax as you wish)
etc (many others omitted)

2) Template oriented:
You define a compile-time function that resides in some dynamic
library, and you call it from your program. For instance, you
define a function "list" that will generate a list type, and
you call it with
list<int>
That function will generate a type definition.

All code generation is based on a text interface: in all cases, the
generated code is a text stream. That text can contain further templates
that will provoke other events. While an event is active, however, the
system will never generate an event for the one that is currently
running.


Now, this would need an army of people to do that, and I am alone.

The C++ template system is the work of people who are so gifted that they don't
think they have to study the computer science which came before them.

Look at, say, Template Haskell. There we don't have the huge algorithmic
solution sets of STL and Boost. (Much of what C++ templates do doesn't
even need the TH extension, just the straight language).
Speaking of which, I'm now revisiting the TH paper
(http://www.haskell.org/th/papers/meta-haskell.ps) and have noticed it has some
juicy things to say about C++ templates, in section 10.1, which is mostly
praise for C++ templates, but also this:

``It is (now) widely recognized that this type-system computation language
[of C++ templates] is simply an extraordinarily baroque functional language,
full of /ad hoc/ coding tricks and conventions.''

Bingo!

In my system you *can* do assignments, since everything is programmed in
C. You can carry context from one event to the next, for instance just
to gather statistics, etc.

jacob
 

gwowen

  ``It is (now) widely recognized that this type-system computation language
  [of C++ templates] is simply an extraordinarily baroque functional language,
  full of /ad hoc/ coding tricks and conventions.''

Bingo!

The problem you identify with C++ templates is that they're now being
routinely used for things which the initial design never imagined. As
originally envisaged, they were type-safety-enhanced macros,
similar to those which Jacob describes. Then people discovered you could use
them as a metaprogramming language, and all the baroque tricks that
you mention appeared, and people began to utilise the full power,
which required contorting their clunky syntax beyond the comprehension
of mere mortals. It's a baroque functional language because it was
never designed to be a functional language -- it was an (unhappy)
accident.
 

Nick Keighley

  ``It is (now) widely recognized that this type-system computation language
  [of C++ templates] is simply an extraordinarily baroque functional language,
  full of /ad hoc/ coding tricks and conventions.''

The problem you identify with C++ templates is that they're now being
routinely used for things which the initial design never imagined.  As
originally envisaged, they were type-safety-enhanced macros,
similar to those which Jacob describes.  Then people discovered you could use
them as a metaprogramming language, and all the baroque tricks that
you mention appeared, and people began to utilise the full power,
which required contorting their clunky syntax beyond the comprehension
of mere mortals.  It's a baroque functional language because it was
never designed to be a functional language -- it was an (unhappy)
accident.

I'm reading Alexandrescu's "Modern C++ Design" at the moment. I'm not
sure if it's brilliant, or crazed. A lot of what he says makes
incredible sense (exactly how your "class" works depends on a set of
policies, which are small "classes" that decide things [how memory is
allocated, whether you can copy the thing, etc.]) and then there's
machinery with things like "partial template specialization" and
"template templates". Still, I'm sure I'll have a merry Christmas.
 

Nick Keighley

I'm interested in why you'd pick zero-based indexing as something people
would specifically have problems with in C.

I personally find zero-based indexing fairly natural; other languages with
1-based indexing such as Fortran and Cobol used to bug me, especially after
I had started programming in C and "knew there was a better way" :)

why do we have to be handcuffed to any particular base?
zero-based indexing certainly makes more sense mathematically.

I don't see why. It makes the index calculation slightly easier, but
shouldn't that be the compiler writer's concern, not ours?
 

Nick Keighley

Bruce Cook wrote:


It's a pity K&R2 abandoned 0 based chapter numbers!

I've read computer science books that used zero-based chapter
numbering (bit of a pose I thought...)
 

Nick Keighley

Because of real life considerations with real life
processors.

I think real life considerations with er... real life should be taken
into consideration. Some things map naturally onto non-zero based
indexing. At least Pascal agrees with me...

Processors weren't always as powerful and flexible as now
with arbitrary "for free" indexing.

Starting at zero offset simply saves calculations/additions or even a
look-up to calculate the delta value (index base) in order to
address the correct memory range.

I'd have thought the cost would be at compile time. Just add one (or
whatever) to the base address.

[Incidentally, I don't expect to win a "zero-indexing is bad" argument in
clc, but it's fun trying]
 

jacob navia

Nick Keighley wrote:
I'd have thought the cost would be at compile time. Just add one (or
whatever) to the base address.

[Incidentally, I don't expect to win a "zero-indexing is bad" argument in
clc, but it's fun trying]

The programming language APL had a system variable QuadIO, for Index Origin, that
would be either 1 or 0. It was systematically added to all array indices, and allowed
you to start array indices with either 1 or 0, as you wish.
 

gwowen

The programming language APL had a system variable QuadIO, for Index Origin, that
would be either 1 or 0. It was systematically added to all array indices, and allowed
you to start array indices with either 1 or 0, as you wish.

And, of course, Fortran allows the user to define an arbitrary range
for the indexing of an array/matrix, which can be quite useful (for
example, if you're exploiting symmetry in your algorithm it's sometimes
helpful to declare an array "real arr(-100:100,201)", which is a
201x201 array indexed from -100 to +100 on the first co-ordinate, and
1:201 on the second...).

I'm not sure it helps all that much, though.
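
A similar effect can be had in C, per access, with a macro that folds the
lower bounds back into 0-based storage; a minimal sketch (ARR and the bounds
here are illustrative, not from the thread):

#include <stdio.h>

/* Emulate Fortran-style arr(-100:100, 1:201): the macro folds the
   lower bounds back into the underlying 0-based C array. */
#define ARR(a, i, j) ((a)[(i) + 100][(j) - 1])   /* i: -100..100, j: 1..201 */

static double a[201][201];

int main(void)
{
    ARR(a, -100, 1) = 1.0;     /* "corner" element, storage a[0][0]     */
    ARR(a, 0, 101)  = 2.0;     /* centre element, storage a[100][100]   */

    printf("%g %g\n", a[0][0], a[100][100]);
    return 0;
}

Because the +100 and -1 are compile-time constants, they disappear into the
address arithmetic.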
 

Keith Thompson

jacob navia said:
Nick Keighley wrote:
I'd have thought the cost would be at compile time. Just add one (or
whatever) to the base address.

[Incidentally, I don't expect to win a "zero-indexing is bad" argument in
clc, but it's fun trying]

The programming language APL had a system variable QuadIO, for Index
Origin, that would be either 1 or 0. It was systematically added to
all array indices, and allowed you to start array indices with
either 1 or 0, as you wish.

Perl also has (had) a system variable, set to 0 by default,
that determines the index of the first element in an array.
It's hardly ever used, and its use is strongly discouraged.
(Perl arrays are 0-based.)

This isn't an argument for or against 0-based or 1-based indexing;
controlling it via a language-defined global variable was a bad idea.
 

Stefan Ram

jacob navia said:
"Advanced data structures" from Peter Brass.

From a technical point of view, there is no means to tell
how "advanced" a data structure is; it depends on the
reader, whether he

- deems a data structure to be "advanced",

- deems this data structure not to be "advanced" or

- is unable or unwilling to assess whether a
given data structure is "advanced".

A more technical and objective title might be
»graph processing in C«, since the book seems to deal with
lists and trees (IIRC), which are special kinds of graphs.
 

Kaz Kylheku

From a technical point of view, there is no means to tell
how "advanced" a data structure is; it depends on the
reader, whether he

We can safely understand ``advanced'' to mean ``anything more
complicated than a linked list, unbalanced binary search tree, or
directly addressed array''.
 

Flash Gordon

Nick said:
I think real life considerations with er... real life should be taken
into consideration. Some things map naturally onto non-zero based
indexing. At least Pascal agrees with me...

Zero-based indexing is good. It's simple, easy to use, works nicely when
you need to do odd things such as manual index calculations, and it's easy
to implement efficiently. I found it intuitive and obvious when I
first used it.

One-based indexing is good; it's easy to understand, etc. I found it
intuitive and obvious when I first used it.

Indexing on whatever you want is good (when it's determined per array
and not by a global option); it means you can do whatever is intuitive
for the problem you are trying to solve. I've used enumerated types as
indices in another language when it made sense.
I'd have thought the cost would be at compile time. Just add one (or
whatever) to the base address.

Now consider a 2D array, or 3D or 4D; it starts getting more complex.
[Incidentally, I don't expect to win a "zero-indexing is bad" argument in
clc, but it's fun trying]

C uses 0-based indexing because it makes for simple pointer arithmetic, and
what you write is what you get. For higher-level abstraction, other things
are better.
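
A minimal sketch of that "what you write is what you get" point: a[i] is
defined as *(a + i), so a 0-based index is literally the pointer offset,
while a 1-based convention has to carry a -1 somewhere.

#include <stdio.h>

int main(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int *p = a;

    /* a[i] is defined as *(a + i): the index *is* the pointer offset */
    printf("%d %d\n", a[2], *(p + 2));        /* both print 30 */

    /* with a 1-based convention the -1 has to appear somewhere */
    int i = 3;                                /* "third element", 1-based */
    printf("%d\n", *(p + i - 1));             /* prints 30 */
    return 0;
}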
 

bartc

Flash Gordon said:
Nick Keighley wrote:

[1-based array indexing]
Now consider a 2D array, or 3D or 4D; it starts getting more complex.

Yeah, I use 4D arrays every day...

If these are static (i.e. flat) arrays, then the address offsets can still be
done at compile time without any runtime cost.

With dynamic arrays, it's a bit more fiddly but I believe the -1 adjustment
can again be eliminated, either by each array pointer pointing at a dummy
[0] element (this is in the implementation, so can be made to work), or
building the offset into the machine addressing mode.

But anyway a decrement operation or two, when all these dereferences are
happening to get at the array element, probably wouldn't be a huge cost.
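
A minimal sketch of the dummy-[0]-element version for a one-dimensional
dynamic array (alloc_1based is an illustrative name, not an API from the
thread):

#include <stdio.h>
#include <stdlib.h>

/* Allocate n doubles usable as v[1] .. v[n]; element 0 is a dummy that
   is simply never used, so no -1 adjustment is needed on access. */
static double *alloc_1based(size_t n)
{
    return malloc((n + 1) * sizeof(double));
}

int main(void)
{
    size_t n = 10;
    double *v = alloc_1based(n);
    if (v == NULL)
        return 1;

    for (size_t i = 1; i <= n; i++)
        v[i] = (double)i;

    printf("%g\n", v[n]);   /* prints 10 */
    free(v);
    return 0;
}

The price is one wasted element per array, which is essentially the
objection raised further down for arrays of large structs.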
 

Flash Gordon

bartc said:
Flash Gordon said:
Nick Keighley wrote:

[1-based array indexing]
Now consider a 2D array, or 3D or 4D; it starts getting more complex.

Yeah, I use 4D arrays every day...

I've used far more complex structures in real production code which was
being developed for a paying customer.
If these are static (i.e. flat) arrays, then the address offsets can
still be done at compile time without any runtime cost.

a, b, c and d are all calculated with complex expressions, possibly
including function calls...
arr[a][b][c][d] = something;
How are you going to calculate the address without extra runtime cost?
It may be small, but it does exist.
With dynamic arrays, it's a bit more fiddly but I believe the -1
adjustment can again be eliminated, either by each array pointer
pointing at a dummy [0] element (this is in the implementation, so can
be made to work),

There are all sorts of problems this can lead to, especially in C, which
would have to be dealt with. If it's an array of a large struct (yes,
I've done this for good reason) that dummy is pretty big, and you can't
have other things in that dummy[0] space or you might get pointers to
different objects which compare equal! I foresee all sorts of unforeseen
problems...
or building the offset into the machine addressing mode.

Do you know of any processors which have such addressing modes? Ones
which can be set up without cost given that the size in bytes of the
offset is not the same for all arrays?
But anyway a decrement operation or two, when all these dereferences are
happening to get at the array element, probably wouldn't be a huge cost.

Ah well, sometimes you really don't want any extra costs because the
timings really are tight.
 

bartc

Flash Gordon said:
bartc said:
Flash Gordon said:
Nick Keighley wrote:

[1-based array indexing]
I'd have thought the cost would be at compile time. Just add one (or
whatever) to the base address.

Now consider a 2D array, or 3D or 4D; it starts getting more complex.

Yeah, I use 4D arrays every day...

I've used far more complex structures in real production code which was
being developed for a paying customer.
If these are static (i.e. flat) arrays, then the address offsets can still
be done at compile time without any runtime cost.

a, b, c and d are all calculated with complex expressions, possibly
including function calls...
arr[a][b][c][d] = something;
How are you going to calculate the address without extra runtime cost? It
may be small, but it does exist.


I've put that line (using int ****arr) into gcc 3.4.5 for x86, and it gave me
a four-instruction sequence.

Then I tried it with arr[a-1][b-1][c-1][d-1] = something (the sort of thing
you might do to convert 1-based to zero-based); gcc gave me the same 4
instructions, but the middle two now had a -4 offset in their addressing
modes; two extra bytes I think.

If a, b, c, d are actually that complex, it's possible the -1 can be absorbed
into one or two of them (for example, if a was i+1 then it reduces to just
i).
With dynamic arrays, it's a bit more fiddly but I believe the -1
adjustment can again be eliminated, either by each array pointer pointing
at a dummy [0] element (this is in the implementation, so can be made to
work),

There are all sorts of problems this can lead to, especially in C, which
would have to be dealt with. If it's an array of a large struct (yes, I've
done this for good reason) that dummy is pretty big, and you can't have
other things in that dummy[0] space or you might get pointers to different
objects which compare equal! I foresee all sorts of unforeseen problems...

You're thinking of C as used at the programmer level. I was thinking of the
implementation side where you only have to worry about a specific
architecture.
Do you know of any processors which have such addressing modes? Ones which
can be set up without cost given that the size in bytes of the offset is
not the same for all arrays?

Just about every processor I've ever used can apply an offset to an indirect
address mode. Smaller processors may involve a tangible cost, but then you
probably wouldn't be using 4-dimensional dynamic arrays on those, or you'd
just spend ten minutes converting your code to a zero base.
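
Roughly the kind of code being compared (illustrative only); because the -1
adjustments are integer constants, the compiler can fold them into the
addressing displacements rather than emitting separate subtractions:

/* The two accesses below differ only by constant adjustments, which a
   compiler can fold into the address calculation (as the gcc experiment
   above showed: same instruction count, different displacements). */
void store(int ****arr, int a, int b, int c, int d, int value)
{
    arr[a][b][c][d] = value;                      /* 0-based indices    */
    arr[a - 1][b - 1][c - 1][d - 1] = value;      /* 1-based, adjusted  */
}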
 

Bruce Cook

Richard wrote:

[...]
Because of real life considerations with real life
processors. Processors weren't always as powerful and flexible as now
with arbitrary "for free" indexing.

Certainly that was the case when I started programming.

What I was thinking of, however, was that with zero-based indexing it all
"fits together" neatly; I find I don't have to deal with off-by-one problems
with arrays.

For example, with character maps (we used to do a lot of this in the early
days):

(A sketch to demonstrate the concept; the INFO* values stand for whatever
classification flags you need.)

enum {                          /* whatever classification flags you need */
    INFO1 = 0x01,
    INFO2 = 0x02,
    INFO3 = 0x04,
    INFO5 = 0x10
};

unsigned info[256] = {
    0,                          /* null */
    INFO1 | INFO2 | INFO3,
    INFO1,
    INFO3,
    INFO5,
    /* ... pretend I've filled the character map up to 255 */
};

unsigned get_characteristics(unsigned char c)
{
    return info[c];
}

If we used 1-based indexing, that would be
    return info[c + 1];

Not a big strain, but with zero-based indexing it just falls into place
neatly.


Bruce
 
