Advanced data structures

Bruce Cook

Keith said:
Perl also has (had) a system variable, set to 0 by default,
that determines the index of the first element in an array.
It's hardly ever used, and its use is strongly discouraged.
(Perl arrays are 0-based.)

This isn't an argument for or against 0-based or 1-based indexing;
controlling it via a language-defined global variable was a bad idea.

Ouch!

That would be particularly fun with libraries.

Bruce
 
bartc

Bruce Cook said:
Richard wrote:
Because of real life considerations with real life
processors. Processors weren't always as powerful and flexible as now
with arbitrary "for free" indexing.

Certainly that was the case when I started programming.

What I was thinking of however was that with zero-based indexing it all
"fits together" neatly, I find I don't have to deal with off-by-one
problems
with arrays.

for example with character maps (we used to do a lot of this in the early
days):

(Sketch for demonstration of the concept only; the INFO macros stand for
whatever characteristic flags you need)

unsigned info[256] = { 0, /* null */
    INFO1 | INFO2 | INFO3,
    INFO1,
    INFO3,
    INFO5,
    /* ... pretend I've filled the character map up to 255 */
};

unsigned get_characteristics(char c)
{
    return info[(unsigned char)c];
}

If we used 1-based indexing that would be

    return info[c + 1];

No real big strain, but with zero-based indexing it just falls into place
neatly.

Some things are better with 0-based, and some with 1-based.

It doesn't have to be a choice; you can just allow both **.

However, if you have to choose one form, 0-based might be better, but only
because you can get 1-based indexing by allocating an extra element and
ignoring element 0 (sketched below). It's more difficult the other way around.

(** the design of some languages seems to demand that one base is the
default or 'natural' base for indexing, even when both (or any) base is
possible.)
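
As a minimal sketch of the extra-element trick mentioned above (the
function name is made up here for illustration):

#include <stdlib.h>

/* Simulate a 1-based array of n ints: allocate n + 1 elements and
   simply never use index 0.  Indices 1..n are then all valid. */
int *make_1_based(size_t n)
{
    return malloc((n + 1) * sizeof(int));   /* caller checks for NULL */
}

After int *a = make_1_based(10); the loop for (i = 1; i <= 10; i++) a[i] = 0;
stays entirely in bounds, at the cost of one unused element.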
 
Michael Foukarakis

Anyone who is not comfortable with 0-based indexing needs to buy a new
mattress. The contortions you have to go through to use other bases just
melt away when you switch to 0.

For its power to communicate algorithmic ideas readably, I would choose
C not just over MIX, but over Java, C++, Li*p... everything. Have you a
more apt language in mind?
Although I am reluctant to use 'equally apt' or 'more apt', I'd say
Erlang or Haskell are very good at expressing algorithms in a neat
fashion. Other high-level languages such as Ruby or Python spring to
mind as well. I think an advantage of C over those is that it's
adequately readable (at least when we're not trying to write an IOCCC
entry...) even as it's significantly "closer" to manipulating the
machine.

On topic, the book is actually very comprehensive, very concise, and
deals with some quite interesting data structures. I liked it. :)
 
gwowen

I have started reading
"Advanced Data Structures" by Peter Brass.
...
I find this book excellent. Everything is written in C, it
is clear, and it explains the (quite difficult) material very well.

I see that I will have to read this before going further in the
container library. This is exactly the book I was looking for.

On a not-entirely-unrelated note, I acquired a copy of Sedgewick's
"Algorithms in C" at the weekend (3 pounds in the Wooden Canal Boat
Society Charity Shop == result). He covers lists, arrays and various
balanced trees. Is there anything in Brass that's not in Sedgewick?
 
jacob navia

gwowen wrote:
On a not-entirely-unrelated note, I acquired a copy of Sedgewick's
"Algorithms in C" at the weekend (3 pounds in the Wooden Canal Boat
Society Charity Shop == result). He covers lists, arrays and various
balanced trees. Is there anything in Brass that's not in Sedgewick?

I will tell you in a few weeks. I bought Sedgewick on Amazon but
it was sold out. I had to wait several weeks until they shipped it,
and it is still not here yet.

jacob
 
Nick

gwowen said:
On a not-entirely-unrelated note, I acquired a copy of Sedgewick's
"Algorithms in C" at the weekend (3 pounds in the Wooden Canal Boat
Society Charity Shop == result). He covers lists, arrays and various
balanced trees. Is there anything in Brass that's not in Sedgewick?

Despite what you see in my sig, it's not mine! Mine's just "Algorithms"
- they're in either Pascal or a Pascal-like pseudocode. I actually
prefer to implement them myself from pseudocode rather than have them
given to me.
 
Flash Gordon

bartc said:
Flash Gordon said:
bartc said:
Nick Keighley wrote:

[1-based array indexing]

I'd have thought the cost would be at compile time. Just add one (or
whatever) to the base address.

Now consider a 2d array, or 3d or 4d; it starts getting more complex.

Yeah, I use 4D arrays every day...

I've used far more complex structures in real production code which
was being developed for a paying customer.
If these are static (ie. flat) arrays, then the address offsets can
still be done at compile time without any runtime cost.

a, b, c and d are all calculated with complex expressions, possibly
including function calls...
arr[a][b][c][d] = something;
How are you going to calculate the address without extra runtime cost?
It may be small, but it does exist.


I've put that line (using int ****arr) into gcc 3.4.5 for x86; it gave
me a 4-instruction sequence.


Not all of the world is x86.
Then I tried it with arr[a-1][b-1][c-1][d-1] = something (the sort of
thing you might do to convert 1-based to zero-based); gcc gave me the
same 4 instructions, but the middle two now had a -4 offset in their
addressing modes; two extra bytes I think.

Now try on a DSP or some other embedded processor. Not everything can do
that trick.
If a,b,c,d are actually that complex, it's possible the -1 can be
absorbed into one or two of them (for example if a was i+1 then it
reduces to just i).

It might be a function call. If you build your software for 0-based
arrays you build it into the calculation; if it is 1-based you can't do
that, and if a function is doing the calculation it might not be
possible for the compiler to optimise the adjustment away.
With dynamic arrays, it's a bit more fiddly but I believe the -1
adjustment can again be eliminated, either by each array pointer
pointing at a dummy [0] element (this is in the implementation, so
can be made to work),

There are all sorts of problems this can lead to, especially in C,
which would have to be dealt with. If it's an array of a large struct
(yes, I've done this for good reason) that dummy is pretty big, and
you can't have other things in that dummy[0] space or you might get
pointers to different objects which compare equal! I foresee all sorts
of unforeseen problems...

You're thinking of C as used at the programmer level. I was thinking of
the implementation side where you only have to worry about a specific
architecture.

I was referring to the problems you can have in the implementation for a
specific architecture.
Just about every processor I've ever used can apply an offset to an
indirect addressing mode.

On some of the ones I've used you can't.
Smaller processors may involve a tangible cost,
but then you probably wouldn't be using 4-dimensional dynamic arrays on
those,

You might be surprised at the complexity of problem which is solved
using small processors, or at least processors which don't have such
fancy addressing modes.
or you just spend ten minutes converting your code to a zero-base.

If the language uses one-based arrays you can't convert it to zero based!

As I said earlier, in a language which allows it I'll use whatever base
fits best. However, there are good reasons for using zero-based arrays
in the language when performance is critical.
 
bartc

Flash Gordon said:
bartc said:
Flash Gordon said:
bartc wrote:
If these are static (ie. flat) arrays, then the address offsets can
still be done at compile time without any runtime cost.

a, b, c and d are all calculated with complex expressions, possibly
including function calls...
arr[a][b][c][d] = something;
How are you going to calculate the address without extra runtime cost?
It may be small, but it does exist.


I've put that line (using int ****arr) into gcc 3.4.5 for x86; it gave me
a 4-instruction sequence.


Not all of the world is x86.
Then I tried it with arr[a-1][b-1][c-1][d-1] = something (the sort of
thing you might do to convert 1-based to zero-based); gcc gave me the
same 4 instructions, but the middle two now had a -4 offset in their
addressing modes; two extra bytes I think.

Now try on a DSP or some other embedded processor. Not everything can do
that trick.


I didn't fully understand the gcc code, but it seemed to find a way of doing
it with just two extra offsets instead of the four one might expect. If this
is just clever logic then it can be applied to whatever address calculations
are needed for any processor.

On the DSP, do you have an example part-number? It might be interesting to
find out just what it is capable of.
It might be a function call. If you build your software for 0-based arrays
you build it into the calculation; if it is 1-based you can't do that, and
if a function is doing the calculation it might not be possible for the
compiler to optimise the adjustment away.

With up to 4 function calls the relative cost of any offset operations falls
even further.
On some of the ones I've used you can't.

Then they might also have trouble with some ordinary array and struct
accesses.
You might be surprised at the complexity of problem which is solved using
small processors, or at least processors which don't have such fancy
addressing modes.


If the language uses one-based arrays you can't convert it to zero based!

OK, I would have expected a compiled language targeted at basic hardware to
have a choice of array lower bounds.

Then it would be no different from many other decisions programmers have to
make regarding the efficiency of their code: dynamic arrays are slower than
static (flat) ones; 32-bit elements can be slower than 16-bit elements; and
a lower bound of 1 can be slower than one of 0, in some cases.
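
(For the static-array case, a minimal sketch of how a constant lower
bound folds away at compile time; the AT macro is made up here:)

/* 1-based indexing of a static 10x20 array via a macro.  The address
   of AT(i, j) is  arr + (i*20 + j) - 21  elements; since the bounds
   are compile-time constants, the -21 folds into a constant
   displacement and costs nothing extra at run time. */
int arr[10][20];
#define AT(i, j) (arr[(i) - 1][(j) - 1])   /* i in 1..10, j in 1..20 */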
 
Flash Gordon

bartc said:
Flash Gordon said:
bartc said:
bartc wrote:
If these are static (ie. flat) arrays, then the address offsets can
still be done at compile time without any runtime cost.

a, b, c and d are all calculated with complex expressions, possibly
including function calls...
arr[a][b][c][d] = something;
How are you going to calculate the address without extra runtime
cost? It may be small, but it does exist.

I've put that line (using int ****arr) into gcc 3.4.5 for x86; it
gave me a 4-instruction sequence.


Not all of the world is x86.
Then I tried it with arr[a-1][b-1][c-1][d-1] = something (the sort of
thing you might do to convert 1-based to zero-based); gcc gave me the
same 4 instructions, but the middle two now had a -4 offset in their
addressing modes; two extra bytes I think.

Now try on a DSP or some other embedded processor. Not everything can
do that trick.


I didn't fully understand the gcc code, but it seemed to find a way of
doing it with just two extra offsets instead of the four one might expect.
If this is just clever logic then it can be applied to whatever address
calculations are needed for any processor.


If the array is not a static array then you can't use the fixed offset
trick without doing extra arithmetic at run time.
On the DSP, do you have an example part-number? It might be interesting
to find out just what it is capable of.

I spent a lot of time programming the TMS320C25 both in assembler and C
some years back, and the TMS320C80, both of which are Texas Instruments
DSPs. I've actually been out of the embedded world for a while though.
With up to 4 function calls the relative cost of any offset operations
falls even further.

On some processors a call is cheap but address arithmetic is expensive.
Then they might also have trouble with some ordinary array and struct
accesses.

One is careful about what one uses where.
OK, I would have expected a compiled language targeted at basic hardware
to have a choice of array lower bounds.

I don't know of any where zero doesn't work well, but one does not
always work well.
Then it would be no different from many other decisions programmers have
to make regarding the efficiency of their code: dynamic arrays are
slower than static (flat) ones; 32-bit elements can be slower than
16-bit elements; and a lower bound of 1 can be slower than one of 0, in
some cases.

I've got no objection to a language allowing each array to be based
differently, or even allowing other things such as an enumerated type
as an array index; it gives you the choice. I've got no major objection
to using one as a base either, I just don't think it would have been the
right choice for C.
 
Richard Bos

jacob navia said:
Ruby also has a way of overloading operators that sometimes looks quite
weird:

[1,2,3] + [3,4,5] --> [4,6,8] ?
[1,2,3] - [3,4,5] --> [-2,-2,-2] ?

No. [1,2,3] - [3,4,5] --> [1,2], and [1,2,3] + [3,4,5] --> [1,2,3,3,4,5]

This is surely not really obvious for people not speaking Ruby
fluently...

It's perfectly obvious if you are slightly familiar with set theory, and
completely opaque otherwise. That's what's wrong with all operator
overloading, in fact: it allows you to create code of arbitrary
impenetrability that looks like it _should_ be obvious, but does
unpredictably complicated things behind the scenes.

Richard
 
Rui Maciel

Stefan said:
It seems to allocate a stack entry

st = (stack_t *) malloc( ...

and then, in the next line

st->base = ...

it seems to use »st« without ever taking into account the
possibility that »st« might be 0?

This might be fine when a program is used by the programmer
himself, who can cope with segment violations or other
strange runtime behavior. But it might not be appropriate
when software is intended to be delivered as a product to a
customer who needs to depend on it.

You may be right that error checking should be performed. Nonetheless, what's
important in real-world source code (validate everything) isn't necessarily
what's important in a technical or academic publication (illustrating concepts
and conveying ideas).

When an author starts to include each and every validation test in a code
example, the example becomes verbose. As a consequence, the reader's attention
is drawn to minor and even irrelevant details, and it becomes a bit more
difficult to get the gist of what the author is trying to communicate, all
while filling page after page with untestable code. After all, who is going to
happily waste their precious time transcribing page after page of source code
examples? To put it another way, if you explicitly include validations in your
examples, getting to the point becomes a little less efficient. And needlessly
so.

There are authors who understood perfectly the need for succinct source code
examples and also placed great value on performing validations thoroughly.
One such author was Richard Stevens. In his series of books he made a point
of providing source code examples which were straight to the point and
unencumbered by those long, tedious series of sanity tests, which are pretty
much irrelevant to the topic being discussed. As a result, his explanations
are clear, straight to the point and very easy to understand. Yet all
validations were kept in the code. The thing is, he achieved that by writing
wrappers for the system calls he used in the examples, which he included in
custom-tailored header files, in effect sweeping the cruft under the rug.
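
(For readers who have not seen those books, a minimal sketch of the
wrapper idea, in the capitalised naming style Stevens favoured; the
exact error handling here is illustrative, not his:)

#include <stdio.h>
#include <stdlib.h>

/* Stevens-style wrapper: same interface as malloc, but the failure
   path is handled once, here, instead of cluttering every example. */
void *Malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    return p;
}

Example code can then call Malloc() unconditionally and stay focused on
the data structure at hand.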

The problem with that approach is that, although it's a blessing for those who
are patiently reading the book from start to finish, not missing a beat, it is
a small source of problems for those who use the book as a reference rather
than as a casual read. The code snippets don't compile, the functions being
used don't actually exist, and some headers are nowhere to be found. Although
that problem is easily overcome by RTFB, it still poses a problem for clueless
newbies.

So, if an author omits a series of irrelevant validations from a book's source
code snippets, that is not a problem, much less a demonstration of how to write
sloppy code. It is, in fact, a great way both to help the reader understand
what the book is actually about and to preserve the book's usefulness as a
reference work. It's far better than either diluting your example in a sea of
if() blocks or relying on clever tricks such as wrappers. Therefore the author
should be commended for omitting the tests, instead of being criticized.

And on a side note, testing the return of a malloc() call is terribly basic
stuff, in the sense that if you fail to understand the need for it even when it
is missing from the examples, then you should probably be reading a book on
basic C programming before delving into this one.


Rui Maciel
 
Stefan Ram

Rui Maciel said:
When an author starts to include each and every validation test in a code
example, the example becomes verbose. As a consequence, the reader's attention
is drawn to minor and even irrelevant details

Yes, but for C programming memory management is not an
irrelevant detail.

If the author had wanted to focus on how to do
it /without manual memory management/ he could have chosen
a language with a garbage collector, like Java, or - if it
had to be C-like - a C-like language with a garbage collector,
like C with the Boehm collector. But he chose to use C.
and unencumbered by those long, tedious series of sanity tests, which are pretty

Taking the possibility of a 0 result of malloc into account
is not what I deem to be a »sanity« test ...
much irrelevant to the topic being discussed.

... nor irrelevant to C programming.

And then the author /does not even discuss/ this problem.
It would not have been such a bad book if he had first
explained how to do proper memory management in C and then why
he would not use it in the rest of the book. But he did not
mention this, which gives the impression that he might not even
be aware of it.
And on a side note, testing the return of a malloc() call is
terribly basic stuff, in the sense that if you fail to
understand the need for it even when it is missing from the
examples, then you should probably be reading a book on basic
C programming before delving into this one.

I agree that the author of "Advanced Data Structures" should
have read a book on basic C programming.
 
Stefan Ram

superpollo said:
Excuse me if the question is trivial ... suppose we are in a hosted
environment with virtual memory management by the OS. Is it true that
malloc *never* returns NULL?

I don't know - maybe someone else can answer this question?

(All I know is that according to ISO/IEC 9899:1999 (E) 7.20.3.3p3
malloc might return a null pointer.)
 
gwowen

  I agree that the author of "Advanced Data Structures" should
  have read a book on basic C programming.

Must every didactic book assume that the reader needs to be educated on
the basics? One might as well complain that "Algorithms in C" doesn't
tell me what a compiler is, or how to plug in my computer. It would
surely have sufficed for the introduction to say "Of course, every
call to malloc() should be checked. These checks have been elided for
the sake of clarity."?

ObTroll: Of course, if malloc() threw an exception, we could abstract
all these sanity checks elsewhere ;)
 
jacob navia

Stefan Ram wrote:
Yes, but for C programming memory management is not an
irrelevant detail.

Yes it is. If malloc fails, the program crashes. So what?

That's a correct behavior for a sample program!
If the author had wanted to focus on how to do
it /without manual memory management/ he could have chosen
a language with a garbage collector, like Java, or - if it
had to be C-like - a C-like language with a garbage collector,
like C with the Boehm collector. But he chose to use C.

You can use C as an implementation language without discussing C
memory management, C error handling, the errno mechanism,
C file handling, etc.

Taking the possibility of a 0 result of malloc into account
is not what I deem to be a »sanity« test ...

Why do you think that a crash is a bad behavior there?
The author is showing how to implement abstract data structures,
not bulletproof C code!
... nor irrelevant to C programming.

And then the author /does not even discuss/ this problem.
It would not have been such a bad book if he had first
explained how to do proper memory management in C and then why
he would not use it in the rest of the book. But he did not
mention this, which gives the impression that he might not even
be aware of it.

What do you know?

And why is so important that he knows C?

When I buy a book about advanced data structures I am NOT interested
in C but in data structures!

And besides, nobody cares. What you fail to understand is that
the subject of the book is NOT "Correct C programming" but
implementation of advanced data structures!

I agree that the author of "Advanced Data Structures" should
have read a book on basic C programming.

You haven't read the book, you do not understand the subject, and the
only thing you are able to do is to repeat your pedantic detail
wisdom again and again.

Those programs are designed to give you an understanding of what is
going on when implementing advanced data structures; you are supposed to
ADAPT them to your needs, adding the error checking and all the other
stuff.

For instance, with keyed trees, the author simplifies key
comparisons, assumes all data is passed by pointer only, and makes
many other simplifications.

Even with those drastic simplifications the programs are hard to follow.
If he had integrated all the necessary things the programs would
be MUCH bigger and the subject matter would be lost in a maze of
details, making it much more difficult to understand what is
going on.


To teach something it is better to make an abstraction that is simple
for the student to grasp and not present every irrelevant detail!
 
Stefan Ram

gwowen said:
surely have sufficed for the introduction to say "Of course, every
call to malloc() should be checked. These checks have been elided for
the sake of clarity."?

In C, memory management is an integral part of the algorithm,
and it is hard. Just ignoring it and then implementing a list
or a tree is trivial (beginner's programming style).

What you suggest sounds to me as if someone were to
implement a factorial as:

int factorial( int arg ){ return arg * factorial( arg - 1 ); }

and then, when someone mentions that »arg« might be 0, someone
else would defend this function by arguing that this check was
elided for the "sake of clarity". But it just is wrong.
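
(The elided check, for comparison, is the base case that makes the
recursion terminate:

int factorial( int arg ){ return arg <= 1 ? 1 : arg * factorial( arg - 1 ); }

)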
 
Ben Bacarisse

superpollo said:
Stefan Ram wrote:

Excuse me if the question is trivial ... suppose we are in a hosted
environment with virtual memory management by the OS. Is it true that
malloc *never* returns NULL?

No. malloc can return NULL even in such cases. Some systems don't do
this by default, but even then the behaviour can often be changed by
mechanisms outside of your C implementation (Linux's overcommit policy
and resource limits, for example).
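
(One way to see this on a POSIX system, assuming setrlimit and
RLIMIT_AS are available: cap the address space, and a large malloc
fails long before physical memory runs out.)

#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

int main(void)
{
    /* Cap this process's address space at roughly 64 MiB. */
    struct rlimit rl;
    rl.rlim_cur = rl.rlim_max = 64ul * 1024 * 1024;
    setrlimit(RLIMIT_AS, &rl);

    /* A 1 GiB request now returns NULL even on systems that
       overcommit by default. */
    void *p = malloc(1024ul * 1024 * 1024);
    printf("malloc returned %p\n", p);
    free(p);
    return 0;
}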
 
gwowen

  In C, memory management is an integral part of the algorithm,

So you say. I don't think it's part of the algorithm at all; I think
it's part of the implementation. I disagree, but I respect that you
differ. However, you have not convinced me. If I want to read about
the complexities of error handling in low-memory situations I will buy
"C Error Handling Unleashed". If I want to read about data
structures, and worry about error handling later, I'll buy a book on
data structures. Personally I don't need to be reminded to check
malloc() in didactic examples, unless the thing being taught is error
handling...

Alternatively, if the lack of error handling upsets you so much,
pretend every malloc call is replaced by:

#include <stdio.h>
#include <stdlib.h>

void *xmalloc(size_t sz)
{
    void *ret = malloc(sz);
    if (ret) return ret;
    fprintf(stderr, "Out of memory\n");
    fflush(NULL);
    abort();
}

Any more complicated error handling than that is likely to be so
dependent on application domain and platform that it would not be
relevant to a general purpose algorithm book. I'd be interested to
see you suggest what the error handling should look like in such a
general-purpose book.
 
Stefan Ram

gwowen said:
I'd be interested to see you suggest what the error handling
should look like in such a general-purpose book.

Yes, this question is exactly what the author of such
a book should discuss! In the case of the book under
discussion, he failed to do so.

When implementing data structures in C, the implementation
needs to have a documented policy towards run-time errors.
Authors of such implementations need to be aware of the
different options. This is not beginner's material. This is
hard. Beginner's material is implementing data structures
and ignoring this question.
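
(For concreteness, one such documented policy, sketched here with
hypothetical names loosely echoing the book's stack: report failure
through the return value and leave the decision to the caller.)

#include <stdlib.h>

typedef struct { int *base; int *top; size_t size; } stack_t;

/* Policy: on allocation failure, release anything acquired so far
   and return NULL; the caller decides whether to abort, retry, or
   degrade gracefully. */
stack_t *stack_create(size_t size)
{
    stack_t *st = malloc(sizeof *st);
    if (st == NULL)
        return NULL;
    st->base = malloc(size * sizeof *st->base);
    if (st->base == NULL) {
        free(st);
        return NULL;
    }
    st->top = st->base;
    st->size = size;
    return st;
}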
 
