Advanced data structures

Eric Sosman · Dec 12, 2009

[...]
If you have another definition of »algorithm« than I do,
feel free to post its wording.

Knuth devotes nine full pages to this topic, and I decline
to type them (besides, that much bulk might stretch the bounds
of the "fair use" doctrine for copyright).

In those nine pages, the only part that seems to come at
all close to your mechanistic definition is the requirement of
"definiteness:" We must know how to carry out each step the
algorithm specifies. An algorithm cannot say "Set N equal to
the smallest even number not expressible as a sum of two primes,"
because we do not know how to find such a number (we do not even
know whether such a number exists). In your example of a system
without a direct means of calculating remainders, an algorithm
cannot just say "take the remainder."

... except that it can! We *do* know how to calculate
remainders without an explicit "remainder" operation, or even
without "divide" and "multiply," if it comes to that. We have
(wait for it) algorithms for doing such calculations, and we
can use them as sub-algorithms in our master algorithm. This
does not, I claim, change the master algorithm in the slightest:
It says "Find the remainder," and we use whatever means we choose
to do so: A built-in remainder operator like `%', or a divide-
multiply-subtract sequence, or even just a subtraction loop:

unsigned int remainder(unsigned int val, unsigned int div)
{
while (val >= div)
val -= div;
return val;
}

I agree that an implementation of Euclid's Algorithm using
this remainder() function will be a different implementation than
one using `%', and will probably have different timing and so on.
But I do not subscribe to the idea that Euclid's Algorithm itself
is in any way dependent on the mechanisms used to carry out the
steps it calls for. It's a musical score; the implementation is
a performance.

jacob navia · Dec 12, 2009

Kaz Kylheku a écrit :

That is false.

Algorithms which use memory reveal it by the calls that they make to
functions which allocate memory.
True

Garbage collection does not eliminate
the allocation calls.

But eliminates the disposal and memory management calls.Yes, they
reveal half of the job, but not all of it.

An algorithm that manages a free list to avoid free/malloc calls is
much more complex that one that doesn't and have just the allocation
calls.

If we can count the allocation calls, we know how much the algorithm
``conses'', and that tells us how much pressure it exerts on the garbage
collector.

But we have no control about the workings of the release part. Unless
of course, we do our own memory management using preallocated arrays
(what is possible in Java but then, we would be writing C in java...)

Keith Thompson · Dec 12, 2009

Especially, one effect of such thinking is to write C code
by applying Â»general algorithmsÂ« for list structures,
ignoring that in C, memory management is an integral part of
such algorithms, while in, say, Java, it is not.

It shouldn't have such an effect for competent programmers.

You can, for example, implement an AVL tree insertion algorithm in C,
C++ Java, Perl, Ada, INTERCAL, x86 assembly language, and so forth.
The resulting code is likely to be very different, particularly
regarding memory management and error handling.

I'd say that the *algorithm* is an abstraction, the part of the
method that doesn't necessarily change from one language to another.
(And if it does change, perhaps because some language provides
some fancy feature that makes the whole process easier, then I'd
say you're implementing a different algorithm in that language.)

There's a lot of handwaving here, and non-procedural languages
mess up the argument in ways that I haven't thought through.

[Stefan, why do you indent your text by two columns? I find
that it makes your articles slightly more difficult to read.
Indentation on Usenet normally denotes quoted text.]

Eric Sosman · Dec 12, 2009

When one considers a family of certain procedural languages
(like C and Java) that share certain features, one can
indeed map some corresponding features to each other and
then say that two different code units in those languages
use an equivalent approach, which might be called »algorithm«.

I only do not deem such an approach to be »language
independent« - as it is sometimes claimed - because it might
not be possible with other types of languages (like
functional or logical languages). Insofar, I say that there
is no "language-independent algorithm", i.e., a general
recipe, that can be translated nearly mechanically into
every programming language.

Okay, fine: As I said at the outset, you're entitled to use
words to mean whatever you please. Just don't expect to be
understood, because others are not likely to have access to your
private dictionary. (Even when you offer excerpts, few people
will be highly motivated to memorize them just for the pleasure
of deciphering your idiosyncratic usage.)

Communication is founded on convention; if you disregard the
conventions you communicate less ecstatically.

Stefan Ram · Dec 12, 2009

Keith Thompson said:
[Stefan, why do you indent your text by two columns? I find
that it makes your articles slightly more difficult to read.
Indentation on Usenet normally denotes quoted text.]

I do indent the main body of natural language text so as to
distinguish it from quotations of formal language text,
while I often do not indent these quotations of formal
language text.

For example, if I would quote Python here, I might do
this as follows (i.e., with a starting indentation of 0)

def factorial(x):
if x > 1:
return x * fakultaet(x - 1)
else:
return 1

The indentation by 0 is necessary for texts of formal
languages, because sometimes indentation is meaningful
in those languages, so I can not always add an offset
in the case of formal languages. For example, in Python
indentation is meaningful. Some older C preprocessors
also might require the »#« to be the first character
of the line, while ISO/IEC 9899:1999 (E) seems to allow
for indentation of this character.

Also, the header lines of a usenet article and the
quotations from other posters are not indented by me.

Thus, for the natural language text body of the post to be
indented differently than formal language text quoted
therein, the natural language text needs to have the offset,
because for those natural languages that I am aware of such
an offset does /not/ change its meaning.

Actually I am often using four different levels of
indentation

0 Certain quotations of formal language texts
including URIs

Certain other quotations, such as quotations
of Usenet posts I reply to

The header lines of my posting, of course

Sometimes, citations for preceding quotations,
when those citations are a formal language
text, like URIs

Rarely, very long lines to avoid overflowing
the right margin

2 Natural language text body, often containing
my own words/wordings

6 Certain quotations of natural language text
and sometimes also of formal language text

6+ My own words are sometimes indented to mark
a hierarchical structure, such as this
paragraph here itself

4 Citations for preceding quotations that were
indented to position 6

Stefan Ram · Dec 12, 2009

Eric Sosman said:
Communication is founded on convention; if you disregard the
conventions you communicate less ecstatically.

Sometimes, some people need to disagree with some of those
conventions. For example, there is a certain convention
allowing one to speak of entities, such as »god«, »absolute
time«, »sadness« or »dark matter«. But some speakers might
like to emphasize that they does not believe there is an actual
entity in the world that corresponds to some of these words.

Antoninus Twink · Dec 12, 2009

There's a lot of handwaving here, and non-procedural languages
mess up the argument in ways that I haven't thought through.

I find no mention of non-procedural languages in the ISO C Standard, so
you have once again violated your idiotic "topicality" constraints.

Hypocrite.

[Stefan, why do you indent your text by two columns? I find
that it makes your articles slightly more difficult to read.
Indentation on Usenet normally denotes quoted text.]

Interesting that you don't challenge your buddy Sossman's eccentric
indentation and annoyingly small line width.

Hypocrite.

Phil Carmody · Dec 12, 2009

Keith Thompson said:
It shouldn't have such an effect for competent programmers.

You can, for example, implement an AVL tree insertion algorithm in C,
C++ Java, Perl, Ada, INTERCAL, x86 assembly language, and so forth.

<humour type="phil's" explanation="in headers">
I don't necessarily agree with your logical inference there. Just
because you can implement it in all those imperative languages doesn't
mean you can implement it in a stack language.

The resulting code is likely to be very different, particularly
regarding memory management and error handling.

And determinism, please! I'm sure you can see what angle I come from.

Phil

Phil Carmody · Dec 12, 2009

Keith Thompson said:
(e-mail address removed)-berlin.de (Stefan Ram) writes: [...]
[Stefan, why do you indent your text by two columns? I find
that it makes your articles slightly more difficult to read.
Indentation on Usenet normally denotes quoted text.]

I'm tempted to dub the style "blockhead quoting".

Phil

Flash Gordon · Dec 12, 2009

Stefan said:
Sometimes, some people need to disagree with some of those
conventions. For example, there is a certain convention
allowing one to speak of entities, such as »god«, »absolute
time«, »sadness« or »dark matter«. But some speakers might
like to emphasize that they does not believe there is an actual
entity in the world that corresponds to some of these words.

Fine, say that algorithms don't exist. There are plenty of words for
things that don't exist, such as bandersnatch. Appropriating the word
for some other purpose just because you believe what it refers to does
not exist only leads to confusion. If you don't believe dark matter
exists, you could just as well use that as your term!

Bruce Cook · Dec 13, 2009

Nick said:
Once you get into "friendly" string manipulation, there are a few
arguments for 1-based indexing, particularly when combined with "zero is
false,
everything else is true". For example, it would be "nicer" in
Javascript to be able to write Â»if(somestring.indexOf("test"))Â« rather
than having to add Â»!=-1Â« at the end. Of course, C using pointers for
this manages to do it quite well with the NULL return from strstr.

Yeah I agree, zero-based indexing is a pain in function results. I think
this is more a problem of utilizing the single value range for signaling in
function returns rather than zero-based indexing itself.

Perl almost gets it right by having the undef & !exists consitions on
scalars but it's usefulness falls down by treating 0 and undef in the same
manner in bool expressions, so you can't just use the same return result to
test function failure & result values.

Try-except steps around this by providing a separate signaling channel & and
a standard error/exception handling mechanism for the language but
introduces it's own set of problem I think.

Bruce

Bruce Cook · Dec 13, 2009

Stefan said:
Eric Sosman said:

I have another definition of Â»algorithmÂ«: An algorithm is
the implementation of some feature using the means of a
specific target language. [...]

Click to expand...

one "algorithm" for finding the GCD of two numbers in C,
another for C++, another for Basic, another for Java, ...

Click to expand...

The GCD requires the mod operation. So, when a language
does not have a mod operation, an implementation of the
mod operation becomes part of the implementation of the
GCD, otherwise it don't.

In Haskell, the GCD is very Â»straightforwardÂ«, as it's
fundamental assertions can be written Â»directlyÂ«:

http://en.literateprograms.org/Euclidean_algorithm_(Haskell)

But in Prolog, a Â»Tricky versionÂ« sometimes is needed:

http://en.literateprograms.org/Euclidean_algorithm_(Prolog)

, which is another means to calculate the GCD.

[...]

In my experience any algorithm I implement I will implement is the same
regardless of the language. I will use the tools the language supplies &
program within the constraints of the language but the algorithm is
recognizably the same.

A programming language simply provides a toolkit to express algorithms,
different languages have different ways of looking at things & different
tools you can use to attack problems but essentially a quicksort in one
language is the same as a quicksort in another (otherwise by definition it's
not a quicksort).

C implementations provide a quicksort library function (other languages may
have this built in as a language primitive). If you use those (and you
should) you're not implementing a quicksort algorithm, you're using a
language tool.

(and just for the pedants I'm using the generic form of "you")

Bruce

Keith Thompson · Dec 13, 2009

Keith Thompson said:
Keith Thompson said:

[Stefan, why do you indent your text by two columns? I find
that it makes your articles slightly more difficult to read.
Indentation on Usenet normally denotes quoted text.]

Click to expand...

I do indent the main body of natural language text so as to
distinguish it from quotations of formal language text,
while I often do not indent these quotations of formal
language text.

[fairly reasonable explanation snipped]

Ok, but since the vast majority of other posters consistently write
original text left-justified, using a different and inconsistent
convention is more likely to cause confusion than to convey whatever
information you're trying to convey.

Python indentation is unlikely to be relevant here in comp.lang.c,
and C code can almost always be indented without loss of meaning.
Even if you'd rather not indent C code, it looks different enough
from plain text that including a block of code between paragraphs
of English won't cause any problems.

See "beautiful new impediments to understanding".

Give it some thought.

Keith Thompson · Dec 13, 2009

Bruce Cook said:
C implementations provide a quicksort library function (other languages may
have this built in as a language primitive). If you use those (and you
should) you're not implementing a quicksort algorithm, you're using a
language tool.

[...]

Quibble: C provides a qsort() library function, which may or may
not be an implementation of the Quicksort algorithm (though the
name was almost certainly inspired by the name of the algorithm).

Kenny McCormack · Dec 13, 2009

Keith Thompson said:
See "beautiful new impediments to understanding".

Give it some thought.

Isn't it funny how easy it is to cause Kiki to lose his precarious sense
of balance?

Bruce Cook · Dec 13, 2009

Keith said:
Quibble: C provides a qsort() library function, which may or may
not be an implementation of the Quicksort algorithm (though the
name was almost certainly inspired by the name of the algorithm).

yep, true.

Curtis Dyer · Dec 13, 2009

I would have to agree, and I actually think PHP would be pretty
good, too. I'm biased, thogh; it was the first programming language
I learned, so has a special place in my heart. (I know, I'm cheesy
and lame.)

C is pretty expressive, but a lot of things end up with a bunch
of overhead that isn't actually the implementation of the
algorithm, but rather, the bookkeeping to allow that
implementation to work.

Although I don't have a large amount of experience with C, it still
seems that, for the most part, you could modularize the bookkeeping
code so that the actual implementation of the algorithm will read
more clearly. Also, as Jacob mentioned, sometimes the explicit
handling of low-level things like memory handling can be relevant in
crafting the actual implementation of an algorithm.

Dann Corbit · Dec 14, 2009

Anyone who is not comfortable with 0-based indexing needs to buy a new
mattress. The contortions you have to go through to use other bases just
melt away when you switch to 0.

For its power to communicate algorithmic ideas readably, I would choose
C not just over MIX, but over Java, C++, Li*p... everything. Have you a
more apt language in mind?

If you use C++ templates, then you can write algorithm generically.
These generic algorithms are still typesafe.

That is the reason that we find huge algorithmic solution sets like STL
and BOOST in C++ but not in C.

You can, of course, accomplish the same thing in C, but it is harder and
less natural.

jacob navia · Dec 14, 2009

Dann Corbit a écrit :

If you use C++ templates, then you can write algorithm generically.
These generic algorithms are still typesafe.

Yes. The problem with C++ is in other parts of the language and
in its enormous complexity. Templates are a good idea. You can
accomplish pretty much the same thing with a preprocessed text
file in C, that takes a template file and adapts it to the specific
data type given.

That is the reason that we find huge algorithmic solution sets like STL
and BOOST in C++ but not in C.

I think that no development is done in C in this field because C has been
abandoned as development language for this type of solutions, not
because C can't do it. The book I proposed has red/black trees, for instance
that make the bassis for several data structures in the STL. They can
be programmed in c, obviously using void pointers but still in a
fairly simple manner.

You can, of course, accomplish the same thing in C, but it is harder and
less natural.

Yes, you can accomplish the same thing in C++, but using that complex language
is harder and less natural than in C.

jacob

Kaz Kylheku · Dec 14, 2009

If you use C++ templates, then you can write algorithm generically.
These generic algorithms are still typesafe.

That is the reason that we find huge algorithmic solution sets like STL
and BOOST in C++ but not in C.

The reason the solutions are huge in C++ templates, is that the C++ template
language is extremely clumsy. It's driven by declarations, and can fails to do
even some obvious inference. Classes have to be used in order to emulate
type-valued functions. If you want a function f:U, V -> T where U, V and T are
types, rather than values, then you can't just write a function. You have to
write a class, where the return value T is hacked as some member. And you
can't write a body of algorithmic code to compute that member; it all has to be
done in terms of templates: the class member which returns the function's
type-value is either just declared literally, or computed using calls
to other templates, which is very limited in terms of expressivity.
Things like if/else decisions all have to be coded as template classes.
Then you have to write reams of specializations for that class for various
instantiations of U and V. Oh yeah, and the C++ template system sometimes
doesn't even know when someting is intended to be a type, so you have to remind
it with ``typename''. The whole thing is the avoidance of giving the
programmer what is really needed: namely a compile-time meta-programming
language, in which you have functions and variables whose values are types and
pieces of syntax.

The C++ template system is the work of people who are so gifted that they don't
think they have to study the computer science which came before them.

Look at, say, Template Haskell. There we don't have the huge algorithmic
solution sets of STL and BOOST. (Much of what C++ templates do doens't
even need the TH extension, just the straight language).
Speaking of which, I'm now revisiting the TH paper
(http://www.haskell.org/th/papers/meta-haskell.ps) and have noticed it has some
juicy things to say about C++ templates, in section 10.1, which is mostly
praise for C++ templates, but also this:

``It is (now) widely recognized that this type-system computation language
[of C++ templates] is simply an extraordinarily baroque functional language,
full of /ad hoc/ coding tricks and conventions.''

Bingo!

Data alignment questin, structures	46	Jan 12, 2013
tools for manipulating (or pre-processing) data structures tosimplify source	59	Oct 23, 2013
How to push data from one HTML page to another	4	Jan 3, 2024
Are there books about C data structures?	42	Aug 23, 2007
Same structures, different names	31	Apr 3, 2011
New To Javascript - Accessing Data	3	Nov 26, 2023
Unpacking the structures	1	Dec 19, 2013
Suggestions about transparent data structures	12	Feb 12, 2009

Advanced data structures

Eric Sosman

jacob navia

Keith Thompson

Eric Sosman

Stefan Ram

Stefan Ram

Antoninus Twink

Phil Carmody

Phil Carmody

Flash Gordon

Bruce Cook

Bruce Cook

Keith Thompson

Keith Thompson

Kenny McCormack

Bruce Cook

Curtis Dyer

Dann Corbit

jacob navia

Kaz Kylheku

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads