Advanced data structures

B

Ben Bacarisse

James Dow Allen said:
[Check out Peter Brass's book. Source code on-line.]

I looked at the source for tries (digital search trees) and was
disappointed to see that he didn't use "traversor structures."
I know Ben Pfaff uses such things, and it seems excellent idea.
Do others use them? Can Brass' examples be recommended if he
*doesn't* use them?

I'll pass on that for the moment (for one thing, I have not read the
book so there may be good reasons to keep the emphasis on other
aspects of the implementation -- it would not be fair to comment on
the basis of the code alone).

However, it seems a shame not to have had someone who knows C really
well read over the code first. There is, of course, cast malloc calls
(almost everyone does this so I suppose that fight is over) but there
is also sizeof(char) and (int)'\0'. In fact lots of expressions that
will promote to int are cast to int, often with extra parentheses.

We also have an idiosyncratic layout that is never consistent. Almost
every possible arrangement of spaces are used with parentheses ([] and
() alike) and operators, for example.

That last is just my being fussy, but there is also a fixed size stack
that can overflow in the trie delete code and comments like:

next_char += 1; /* move to next character */

I'd also prefer to see returns from malloc checked and a rather less
cavalier attitude to integer overflow. There are real cases
where un-trapped overflow leads to a negative hash table index and
where other assumptions about arithmetic leads to similar UB. I could
provoke a segmentation fault from the example code very easily.

If the book is other wise good, all of these are a real shame.
Someone could probably have corrected the code with only few days
work.

<snip>
 
B

Ben Bacarisse

jacob navia said:
Stefan Ram a écrit :
He does that, as he explains in the text, because they are replaced
in the next section by a heap implementation where he builds an
intermediate layer (as I did for the list container) that allocates
nodes from a linked list of blocks.

I don't see how that solves the problem for a malloc call. Also, if
the free list code is as in the online examples, it too suffers from
un-tested malloc returns so the problem is simply duplicated rather
then solved.
 
E

Eric Sosman

jacob navia said:
What I mean with this sentence is that when reading code about an
algorithm, it is easier to read if the error handling code is not
shown.

I have another definition of »algorithm«: An algorithm is
the implementation of some feature using the means of a
specific target language. [...]

You're free to use words in your own way ("Who's to be
master?" -- H.D.), but your definition of "algorithm" is at
variance with the accepted meaning. That's likely to create
confusion and raise barriers to communication, just as if
you made an appointment to meet someone at noon, using your
own private definition of "noon" as "when the Moon rises."

But back to "algorithm:" Under your definition, there's
one "algorithm" for finding the GCD of two numbers in C,
another for C++, another for Basic, another for Java, ...
As a competent programmer, it seems you must learn a new
"algorithm" for every programming language, perhaps even for
every framework. That seems an awfully heavy cognitive burden.

In fact, under your definition "Euclid's Algorithm" is
an empty fiction, and ought to be something like "Euclid's
Recipe" or "Euclid's Way Of Going About It." There is
clearly an implementation-independent Something about this
GCD-finding approach; what do you call that i-i S, since
you use "algorithm" for something else?

Please answer before "noon." ;-)
 
S

Stefan Ram

Eric Sosman said:
I have another definition of »algorithm«: An algorithm is
the implementation of some feature using the means of a
specific target language. [...]
one "algorithm" for finding the GCD of two numbers in C,
another for C++, another for Basic, another for Java, ...

The GCD requires the mod operation. So, when a language
does not have a mod operation, an implementation of the
mod operation becomes part of the implementation of the
GCD, otherwise it don't.

In Haskell, the GCD is very »straightforward«, as it's
fundamental assertions can be written »directly«:

http://en.literateprograms.org/Euclidean_algorithm_(Haskell)

But in Prolog, a »Tricky version« sometimes is needed:

http://en.literateprograms.org/Euclidean_algorithm_(Prolog)

, which is another means to calculate the GCD.

The language

http://en.wikipedia.org/wiki/Brainfuck

has yet another set of elementary operations, so someone
who knows how to calculate the GCD using the elementary
operations of C might not be able to calculate it using
the elementary operations of this language »Brain...«.

If you have another definition of »algorithm« than I do,
feel free to post its wording.
 
J

jacob navia

Stefan Ram a écrit :
If you have another definition of »algorithm« than I do,
feel free to post its wording.

www.dictionary.com:

alâ‹…goâ‹…rithm
–noun
A set of rules for solving a problem in a finite number of steps, as for
finding the greatest common divisor.

Origin:
1890–95; var. of algorism, by assoc. with Gk arithmós number. See
arithmetic

Note that no programming language (or language at all) is mentioned.
An algorithm is a set of rules to do something mathematically. The only
condition is that the number of steps should be finite.
 
J

jacob navia

Stefan Ram a écrit :
I would not actually issue from a library function in the
sense of a side-effect with access to external files or
consoles or so, but just return 0.

I call the established error function for the container where the error
happens. Depending on the severity of the error, that function
(that the user of the library can replace with a function of
his/her own) the library exists, shows an error message or just returns.
If the library error function returns, then I just pass the error
to the calling function.

jacob
 
S

Stefan Ram

In Haskell, the GCD is very »straightforward«, as it's
fundamental assertions can be written »directly«:

»it's« should be »its« instead.
But in Prolog, a »Tricky version« sometimes is needed:

And the most obvious example: In a language, where the GCD
can be calculated with the operator »o«, the algorithm for
the GCD of x and y is just »x o y«.
 
S

Stefan Ram

jacob navia said:
alâgoârithm
ânoun
A set of rules for solving a problem in a finite number of steps, as for
finding the greatest common divisor.
Note that no programming language (or language at all) is mentioned.
An algorithm is a set of rules to do something mathematically. The only
condition is that the number of steps should be finite.

Those »steps« must assume that some »elementary operations«
are already given. Therefore, the steps depend on those
elementary operations.

You can not give any algorithm without this assumption
(of a given set of elementary operations) (just try it).
 
P

Phil Carmody

Those »steps« must assume that some »elementary operations«
are already given. Therefore, the steps depend on those
elementary operations.

You can not give any algorithm without this assumption
(of a given set of elementary operations) (just try it).

Yes, but those steps do not need to assume that those
elementary operations are native to the language, merely
implementable in that language.

Phil
 
S

Stefan Ram

Phil Carmody said:
Yes, but those steps do not need to assume that those
elementary operations are native to the language, merely
implementable in that language.

For a calculation to be actually executable in a language,
it is not sufficient for those operations to be merely
"implementable". They have to be /actually implemented/. Thus,
their implementation is a necessary part of the implementation
of the calculation.

In Haskell, the calculation the GCD of x and y just seems to be:

gcd x y

http://www.zvon.org/other/haskell/Outputprelude/gcd_f.html

. In C, it might be written instead:

int gcd( int a, int b )
{ int c; while( a ){ c = a; a = b % a; b = c; }return b; }

(...)

gcd( x, y )

. We see now that in C, where there is no elementary »gcd«
operation as in Haskell, we have to write its implementation
using a loop and other elementary operations (such as »%«).
Thus, this implementation becomes part of the code necessary
to perform the calculation in C.

Therefore, a loop with a remainder operation is a part of
the implementation in C, but not part of the implementation
in Haskell.

So there is not much common between those two
implementations to extract a »general course of action« to
calculate the GCD in an arbitrary language when we do not
know its set of elementary operations.
 
B

Ben Bacarisse

For a calculation to be actually executable in a language,
it is not sufficient for those operations to be merely
"implementable". They have to be /actually implemented/. Thus,
their implementation is a necessary part of the implementation
of the calculation.

In Haskell, the calculation the GCD of x and y just seems to be:

gcd x y

http://www.zvon.org/other/haskell/Outputprelude/gcd_f.html

. In C, it might be written instead:

int gcd( int a, int b )
{ int c; while( a ){ c = a; a = b % a; b = c; }return b; }

(...)

gcd( x, y )

But these are different algorithms in both senses -- i.e. both Eric's
and your use of the term -- so the example is not a good one. Eric's
question is what, if anything, you call the idea shared by these
two functions (one Haskell, the other C):

igcd a 0 = a
igcd a b = igcd b (a `rem` b)

int igcd(int a, int b)
{
return b ? igcd(b, a % b) : a;
}

Most people are happy to say that they embody some common thing -- a
recursive GCD algorithm -- despite the huge differences in form and,
indeed, semantics. Do you not bother to name or talk about this
commonality?

<snip>
 
P

Phil Carmody

For a calculation to be actually executable in a language,
it is not sufficient for those operations to be merely
"implementable". They have to be /actually implemented/.

You're obviously not a theorist. Actual executability is pretty
darn irrelevant when it comes to theoretical algorithmics.
Thus,
their implementation is a necessary part of the implementation
of the calculation.

As that depends on what is false to theoretists, it's just as false.
In Haskell, the calculation the GCD of x and y just seems to be:

gcd x y

No. _A_ calculation of the GCD is that. There's nothing preventing
an explicit Euclidean algorithm.

Phil
 
S

Stefan Ram

Ben Bacarisse said:
igcd a 0 = a
igcd a b = igcd b (a `rem` b)

int igcd(int a, int b)
{
return b ? igcd(b, a % b) : a;
}
Most people are happy to say that they embody some common thing -- a
recursive GCD algorithm -- despite the huge differences in form and,
indeed, semantics. Do you not bother to name or talk about this
commonality?

I agree that they share something. It is the exploitation of
the same mathematical theorem.

I just do not like the idea of this theorem being useful to
calculate the GCD in each and every programming language,
because, for example:

- when the language already does have a GCD, it is not
required, and

- when the language does not have a primitive remainder
operation, there might be better means to perform
the calculation (possibly exploiting other mathematical
theorems).
 
B

Ben Bacarisse

I agree that they share something. It is the exploitation of
the same mathematical theorem.

That's fair enough, but I won't be using your terminology any time
soon and I think you'll have to accept that most people call this
commonality "an algorithm" not "the exploitation of a mathematical
theorem".
I just do not like the idea of this theorem being useful to
calculate the GCD in each and every programming language,
because, for example:

Yes, but I don't think anyone said that. If you have recursion and a
mod operation in language X, the above "algorithm" can be represented
in X. No one has said it is good to so or the best way to calculate
GCD or any such thing.
 
S

Stefan Ram

The key point is that the algorithm specifies
what the steps must do rather than how they do what they must do.

The wording »steps« seems to be adequate for imperative
languages. In functional or logical languages, there are no
»sequences of steps«. Insofar, I agree, that, yes, you can
define »algorithm« that way, but it seems tailored somewhat
towards imperative languages.
 
S

Seebs

In ruby, you have built-in hash tables, sets, and many other containers.
You have to use the implementation that the language offers. Obviously
you *could* write your own, but all performance considerations would
be difficult to measure in an interpreter.

Yeah. On the other hand, there's usually not much REASON to use other
implementations.
This is surely not really obvious for people not speaking ruby
fluently...

Not especially, no.
But exactly THAT makes you AWARE of that overhead, that is an inherent
part of the algorithm's cost!

Yes. I use Ruby for things where I'm pretty sure modern computers are
Fast Enough. Then I'm pretty much okay with ignoring the overhead in most
cases, and for most of what I write, the cost of the time it would take me
to implement it in a less expressive language far exceeds the cost of the
computer Ruby will need to get it done faster anyway. :)
Hiding "details" can be OK, but when designing an algorithm and
evaluating its cost, those details do matter. In ruby or java
memory management costs are hidden by the language, and you are
forced to use the OO paradigm.

Often, the details matter much less than you might expect. I did an
iterative fractal design once, and spent quite a while paying attention
to those low-level details, finally managing to get about a 40% increase.
Then I eliminated the 0-length lines from the end of one segment to the
beginning of the next, and by about depth 5-6, I had eliminated more
than half of the points drawn, yielding a >50% increase, and >1000% for
the levels of depth where that pushed the data set outside what you could
keep in memory at a time...

-s
 
F

Flash Gordon

Stefan said:
»it's« should be »its« instead.


And the most obvious example: In a language, where the GCD
can be calculated with the operator »o«, the algorithm for
the GCD of x and y is just »x o y«.

That is no more an algorithm for GCD than the word "algorithm" is a
definition of the word "algorithm". At least, it is not an algorithm in
the sense the word is commonly used in English, computer science or
mathematics. Consider that that it is common to talk about whether the C
library function qsort actually uses the quicksort algorithm. It is also
possible to talk about which algorithm a Haskal implementation used to
calculate the GCD when you use its GCD operator (without needing to
worry about the language Haskal has been implemented in). You can also
ask if there is an algorithm to solve the halting problem without
needing to specify the language you would implement it in.
 
S

Stefan Ram

Eric Sosman said:
In fact, under your definition "Euclid's Algorithm" is
an empty fiction, and ought to be something like "Euclid's
Recipe" or "Euclid's Way Of Going About It." There is
clearly an implementation-independent Something about this
GCD-finding approach; what do you call that i-i S, since
you use "algorithm" for something else?

When one considers a family of certain procedural languages
(like C and Java) that share certain features, one can
indeed map some corresponding features to each other and
then say that two different code units in those languages
use an equivalent approach, which might be called »algorithm«.

I only do not deem such an approach to be »language
independent« - as it is sometimes claimed - because it might
not be possible with other types of languages (like
functional or logical languages). Insofar, I say that there
is no "language-independent algorithm", i.e., a general
recipe, that can be translated nearly mechanically into
every programming language.
 
K

Kaz Kylheku

For instance if we design an algorithm without explicit free, using a
garbage collector (and with JAVA you do not have any other choice) the
cost of an algorithm that uses memory sparingly and another that
needlessly thrashes memory is not visible in the code you write.

That is false.

Algorithms which use memory reveal it by the calls that they make to
functions which allocate memory. Garbage collection does not eliminate
the allocation calls.

If we can count the allocation calls, we know how much the algorithm
``conses'', and that tells us how much pressure it exerts on the garbage
collector.
 
S

Stefan Ram

I only do not deem such an approach to be »language
independent« - as it is sometimes claimed - because it might
not be possible with other types of languages (like
functional or logical languages). Insofar, I say that there
is no "language-independent algorithm", i.e., a general
recipe, that can be translated nearly mechanically into
every programming language.

Especially, one effect of such thinking is to write C code
by applying »general algorithms« for list structures,
ignoring that in C, memory management is an integral part of
such algorithms, while in, say, Java, it is not.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top