Experiment: functional concepts in C

  • Thread starter Ertugrul Söylemez

Ertugrul Söylemez

Richard said:
Ertugrul Söylemez said:
jacob navia said:
You are completely WRONG.

You could use some improvement in your reasoning. =)
A small program has many advantages:

[a lot of irrelevant crap]

I'm not coding like a pig. Cleaning up properly is not bloat, and
that's all I'm talking about.

However, even if I were talking about bloat, your reasoning is still
nonsense. You're programming in C after all. To get maximum speed,
you need to program in Assembler.

No you don't. More often than not a modern compiler will pick better
instruction sequences than you will. Sure, you might revert to
assembler in some corner cases where it makes sense and you know
certain tricks or have certain needs that the compiler cannot be hinted
at.

Programming in C means giving up performance. Yes, of course the
compiler will optimize better than you, but to get the _best_ code there
is only one option: Assembler. In fact to get the fastest executable
file, even assembler won't be enough. The linker may choose nonoptimal
arrangements and make the file unnecessarily big.

Of course the differences are negligible, but Jacob seems to care about
them. And if he says that he doesn't care about them, he implicitly
admits that he might have got his priorities wrong.

In one instance yes. Not in millions of iterations. Are you a student?

No, I'm not a student, but I prefer correct code. In no realistic
scenario will freeing memory make a noticeable difference. I have
benchmarked it. Allocating _and_ freeing five million (!) blocks of 256
bytes takes 1.6 seconds on my system. No inner loop allocates five
million blocks of memory with each iteration.
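For what it's worth, a minimal sketch of the kind of benchmark meant here,
using standard C only. The block size and count match the figures above;
the timing is plain clock() and allocator behaviour varies, so treat any
numbers as rough:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NBLOCKS 5000000UL   /* five million blocks */
#define BLOCKSZ 256         /* of 256 bytes each */

int main(void)
{
    void **blocks = malloc(NBLOCKS * sizeof *blocks);
    unsigned long i;
    clock_t start, end;

    if (blocks == NULL)
        return EXIT_FAILURE;

    start = clock();
    for (i = 0; i < NBLOCKS; i++)
        blocks[i] = malloc(BLOCKSZ);   /* no per-block checks: timing only */
    for (i = 0; i < NBLOCKS; i++)
        free(blocks[i]);               /* free(NULL) is harmless anyway */
    end = clock();

    printf("alloc+free of %lu blocks: %.2f s\n",
           NBLOCKS, (double)(end - start) / CLOCKS_PER_SEC);

    free(blocks);
    return 0;
}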


Greets
Ertugrul
 

Dimiter "malkia" Stanev

Chris said:
[...]
A very simple region allocator. Unfortunately, you are not aligning
things properly. Perhaps you should do something like:
________________________________________________________
#include <assert.h>
#include <stddef.h>


#define BLOCK_SIZE 8192U
#define DOUBLE_PER_BLOCK (BLOCK_SIZE / sizeof(double))

I forgot to mention that `BLOCK_SIZE' should be evenly divisible by
`sizeof(double)'.

Alignment is much needed, unless your program deals specifically with,
say, only 8-bit strings (parsers, for example).
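To make the alignment point concrete, here is a hypothetical sketch of the
usual fix. The names are invented for the example and are not taken from
Chris's allocator: every request is rounded up to a worst-case alignment
derived from a union of the basic types, so offsets handed out are always
multiples of it.

#include <stdio.h>
#include <stddef.h>

union align { long l; double d; void *p; void (*fp)(void); };

#define ALIGNMENT sizeof(union align)
#define ROUND_UP(n) (((n) + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT)

struct region {
    unsigned char *base;   /* start of the block */
    size_t size;           /* total bytes in the block */
    size_t used;           /* always a multiple of ALIGNMENT */
};

static void *region_alloc(struct region *r, size_t n)
{
    size_t need = ROUND_UP(n);
    void *p;

    if (need > r->size - r->used)
        return NULL;                  /* block exhausted */
    p = r->base + r->used;
    r->used += need;
    return p;                         /* offset is a multiple of ALIGNMENT */
}

int main(void)
{
    /* Backing storage declared as the union so it starts out aligned. */
    static union align storage[8192 / sizeof(union align)];
    struct region r = { (unsigned char *)storage, sizeof storage, 0 };

    double *d = region_alloc(&r, sizeof *d);
    int *i = region_alloc(&r, sizeof *i);

    if (d != NULL && i != NULL) {
        *d = 3.14;
        *i = 42;
        printf("%f %d\n", *d, *i);
    }
    return 0;
}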
 

Seebs

Programming in C means giving up performance.

This has not been adequately demonstrated.
Yes, of course the
compiler will optimize better than you, but to get the _best_ code there
is only one option: Assembler.

This is not clear.

First off, it's quite possible that the compiler will in fact generate the
best possible code for a given hunk of code. At that point, you have two
options for best.

Secondly, in real-world testing, we have consistently seen that assembler
tends to underperform C, often by large margins. There's a number of
reasons for this.

1. The sorts of people who think they can beat compilers usually can't.
2. The assembly programmer is likely to spend so much time debugging stuff
that would never have gone wrong in C that insufficient time is available
for optimization.
3. Optimizations which alter the algorithm or structure of code are easy
to perform in C, and extremely difficult in assembler, so the assembly
programmer is likely to end up with a hyper-optimized bubblesort while the C
programmer ends up with a mediocre quicksort.

In short: What you say could probably be true in an abstract case, or in a
hypothetical alternative universe scenario where we have infinite developer
time and effort available. For real programs, it is fairly rare for assembler
to beat C.
No, I'm not a student, but I prefer correct code. In no realistic
scenario will freeing memory make a noticeable difference.

This is simply untrue.
I have
benchmarked it. Allocating _and_ freeing five million (!) blocks of 256
bytes takes 1.6 seconds on my system. No inner loop allocates five
million blocks of memory with each iteration.

A benchmark which doesn't ask the right question is useless.

I've had programs take noticeable time to exit due to cleanup. I regularly
run jobs which run the same programs ten thousand times or more. A tenth
of a second change in performance, ten thousand times, is plenty noticeable.

We're just talking about the question of the final cleanup before an exit,
where it is common for developers on POSIX systems to take advantage of
the documented, guaranteed, standard feature that memory is deallocated on
exit. It really does make sense to rely on that for many programs.
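The shape of that pattern might look something like the following sketch.
The table type and the DEBUG_TEARDOWN switch are invented for the example
(strdup is POSIX, which fits the context here): the structures can be torn
down, but the full teardown only happens when explicitly asked for, say
when hunting leaks under a checker; otherwise the program just exits.

#include <stdlib.h>
#include <string.h>

struct table {
    char **rows;
    size_t nrows;
};

static void table_free(struct table *t)
{
    size_t i;
    for (i = 0; i < t->nrows; i++)
        free(t->rows[i]);
    free(t->rows);
}

int main(void)
{
    struct table t;
    size_t i;

    t.nrows = 1000;
    t.rows = malloc(t.nrows * sizeof *t.rows);
    if (t.rows == NULL)
        return EXIT_FAILURE;
    for (i = 0; i < t.nrows; i++)
        t.rows[i] = strdup("some row data");

    /* ... real work with t ... */

    /* Full teardown only on request; otherwise exit and let the
       system reclaim the whole address space. */
    if (getenv("DEBUG_TEARDOWN") != NULL)
        table_free(&t);

    return 0;
}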

-s
 

Ertugrul Söylemez

Seebs said:
This has not been adequately demonstrated.


This is not clear.

First off, it's quite possible that the compiler will in fact generate
the best possible code for a given hunk of code. At that point, you
have two options for best.

Secondly, in real-world testing, we have consistently seen that
assembler tends to underperform C, often by large margins. There's a
number of reasons for this.

1. The sorts of people who think they can beat compilers usually can't.
2. The assembly programmer is likely to spend so much time debugging stuff
that would never have gone wrong in C that insufficient time is available
for optimization.
3. Optimizations which alter the algorithm or structure of code are easy
to perform in C, and extremely difficult in assembler, so the assembly
programmer is likely to end up with a hyper-optimized bubblesort while the C
programmer ends up with a mediocre quicksort.

In short: What you say could probably be true in an abstract case, or
in a hypothetical alternative universe scenario where we have infinite
developer time and effort available. For real programs, it is fairly
rare for assembler to beat C.

I don't disagree for a second with any of this, but that wasn't my point. How
could Jacob possibly ever be satisfied? There is a theoretical optimal
implementation, and the only language you can write that in is
Assembler.

To get to the point: IMO there are better places to waste your time.
It really doesn't matter whether an IDE responds a nanosecond faster.
If we were talking about seconds, it would be different, but we're
talking about nanoseconds here.

This is simply untrue.

Of course your applications may be different. At least in my
applications a slight increase in shutdown time is negligible.

A benchmark which doesn't ask the right question is useless.

It doesn't ask for specific timings or specific applications. The
question is just whether it takes horribly long to free millions of
buffers. The answer is clearly no.

I've had programs take noticeable time to exit due to cleanup. I
regularly run jobs which run the same programs ten thousand times or
more. A tenth of a second change in performance, ten thousand times,
is plenty noticeable.

We're just talking about the question of the final cleanup before an
exit, where it is common for developers on POSIX systems to take
advantage of the documented, guaranteed, standard feature that memory
is deallocated on exit. It really does make sense to rely on that for
many programs.

Well, maybe running a program ten thousand times is not the right
solution. You could run the program once and let it perform ten
thousand operations instead.

The Unix philosophy is great in general, but there are places where it
doesn't work so well.


Greets
Ertugrul
 

Oliver Jackson

I don't disagree for a second with any of this, but that wasn't my point.  How
could Jacob possibly ever be satisfied?  There is a theoretical optimal
implementation, and the only language you can write that in is
Assembler.

To get to the point:  IMO there are better places to waste your time.
It really doesn't matter whether an IDE responds a nanosecond faster.
If we were talking about seconds, it would be different, but we're
talking about nanoseconds here.



Of course your applications may be different.  At least in my
applications a slight increase in shutdown time is negligible.



It doesn't ask for specific timings or specific applications.  The
question is just whether it takes horribly long to free millions of
buffers.  The answer is clearly no.



Well, maybe running a program ten thousand times is not the right
solution.  You could run the program once and let it perform ten
thousand operations instead.

The Unix philosophy is great in general, but there are places where it
doesn't work so well.

That's rich. You insist on performing completely unnecessary
operations, and when it results in unsatisfactory performance, it's
clearly the OS's fault.
nightmare = unsafePerformIO (getWrongWife >>= sex)

Sigh...
 

Seebs

I don't disagree for a second with any of this, but that wasn't my point. How
could Jacob possibly ever be satisfied? There is a theoretical optimal
implementation, and the only language you can write that in is
Assembler.
Probably.

To get to the point: IMO there are better places to waste your time.
It really doesn't matter whether an IDE responds a nanosecond faster.
If we were talking about seconds, it would be different, but we're
talking about nanoseconds here.

I think we're talking about milliseconds.

And that's the thing -- we run stuff often enough that milliseconds add up.
Of course your applications may be different. At least in my
applications a slight increase in shutdown time is negligible.

Yeah. I'm looking at things like compilers, shells, and filters -- commands
which are going to be run very often.
It doesn't ask for specific timings or specific applications. The
question is just whether it takes horribly long to free millions of
buffers. The answer is clearly no.

The question is whether it takes long enough that you should consider
bypassing that step, because the cost of performing it is too high. And
the answer is clearly yes.
Well, maybe running a program ten thousand times is not the right
solution. You could run the program once and let it perform ten
thousand operations instead.

That would be a less modular design, which was less clearly provably correct.
Don't like it.
The Unix philosophy is great in general, but there are places where it
doesn't work so well.

It works great in our case -- as long as programmers understand it and
react accordingly. In particular, they should write programs to start up
and shut down reasonably quickly, and that means not spending a lot of
time shuffling internal buffers and data which are never used or reacted to
in any way -- for instance, tearing down elaborate structures which are
about to be disintegrated instantly by the kernel anyway.

As I said, my default plan is to build things so that it's easy to free them,
but if I know there's no particular reason to do so in a given context, I may
well omit that free. There's a lot of static buffers in some of my code,
which will get allocated only if they don't exist, but never freed. That's
not a bug, it's a carefully-considered design decision.
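For the record, the static-buffer pattern described above usually looks
something like this sketch; the function name and size are invented for the
example, not taken from any actual code:

#include <stdlib.h>

#define SCRATCH_SIZE 65536

static char *get_scratch(void)
{
    static char *scratch;            /* allocated on first use only */

    if (scratch == NULL)
        scratch = malloc(SCRATCH_SIZE);
    return scratch;                  /* deliberately never freed */
}

int main(void)
{
    char *buf = get_scratch();
    return buf != NULL ? 0 : EXIT_FAILURE;   /* callers still check for NULL */
}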

-s
 

Seebs

Wait. You cannot generalize.

As long as I use the word "tends", why, yes, yes I can.

In fact, I can generalize pretty much whenever I want; most readers
will realize that there are probably some exceptions. :)
Well, the human always has the advantage.

Empirically, this is often not the case.
If one can't beat the compiler on speed, one certainly can on code
size. A human can always examine the compiler's output and tweak the
code in order to beat the compiler. (I said nothing new.)

It depends on how clever the compiler is being, and how much it's relying
on being able to remember what it did elsewhere. Humans have many advantages,
but they also have some disadvantages.
Not so sure. Another generalization.

True, but a moderately reasonable one; if you're using tools that allow
the sort of abstractions necessary to allow easy refitting of data structures
or algorithms, you're likely relying on those tools for much of the
glue between pieces of code, which means you've lost some of the detailed
control that can make assembler a good choice.
Who writes pure assembler these days except enthusiasts?
Assembler + C is a more logical combination.

Agreed, but outside of kernels, I've seen VERY little explicit assembly that
had good cause to exist.

-s
 

Chris M. Thomasson

Dimiter "malkia" Stanev said:
Chris said:
[...]
A very simple region allocator. Unfortunately, you are not aligning
things properly. Perhaps you should do something like:
________________________________________________________
#include <assert.h>
#include <stddef.h>


#define BLOCK_SIZE 8192U
#define DOUBLE_PER_BLOCK (BLOCK_SIZE / sizeof(double))

I forgot to mention that `BLOCK_SIZE' should be evenly divisible by
`sizeof(double)'.

Alignment is much needed, unless your program deals specifically with,
say, only 8-bit strings (parsers, for example).

Touché!

It is definitely aligned for char.

:^)
 

blmblm

[ sorry about mangling your name -- I'm not up to figuring out
how to include non-ASCII characters .... ]

[ snip ]
Note for example that in Haskell it is common to create an infinite
list, extend it to an infinite list of infinite lists, zip all sublists
with an infinite list of integers, map a function over all elements,
extract a finite portion, split it into sublists, etc. The compiler
translates all these data structures into variables and loops. The
lists go away entirely. The resulting code is often just as fast as the
equivalent (but much longer and much less readable) C code.

Nitpick: How is it possible to create an infinite data structure
on a machine with finite resources? (Maybe "unbounded" or "arbitrarily
long" would be a better term?)

[ snip ]
 

Ertugrul Söylemez

[ sorry about mangling your name -- I'm not up to figuring out
how to include non-ASCII characters .... ]

No problem, even though I could nitpick back on this. =P

Nitpick: How is it possible to create an infinite data structure on a
machine with finite resources? (Maybe "unbounded" or "arbitrarily
long" would be a better term?)

Note that Haskell uses lazy evaluation. The data structure isn't just
arbitrarily long. It is indeed infinite and it works, as long as you
never ask for the last element or the length. In fact you can even
print such a data structure entirely and it will work (but the program
will print forever, of course).

primes = filter isPrime [2..]
main = print (take 10 primes)

You could print 'primes' instead of 'take 10 primes' and it would print
a list of all prime numbers instead of only the first 10 primes.
Anyway, wrong newsgroup. If you're interested, just go to the homepage.
You'll find a lot more information about this there.


Greets
Ertugrul
 

Richard Tobin

Nitpick: How is it possible to create an infinite data structure
on a machine with finite resources?

By having a finite representation of it!

Of course, if you equate the data structure with the representation,
you are stuck.
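In C terms, such a finite representation can be as little as the current
state plus a rule for producing the next element; only the elements
actually demanded ever exist. A made-up sketch, not anyone's actual code:

#include <stdio.h>

struct stream {
    unsigned long next;                     /* the state */
    unsigned long (*step)(struct stream *); /* the rule */
};

static unsigned long count_step(struct stream *s)
{
    return s->next++;
}

int main(void)
{
    struct stream naturals = { 2, count_step };   /* "2, 3, 4, ..." */
    int i;

    /* Consume only as much of the "infinite" structure as we need. */
    for (i = 0; i < 10; i++)
        printf("%lu\n", naturals.step(&naturals));

    return 0;
}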

-- Richard
 

Phil Carmody

Ertugrul Söylemez said:
Programming in C means giving up performance. Yes, of course the
compiler will optimize better than you, but to get the _best_ code there
is only one option: Assembler.

Invalid generalisation. While I was programming fast numeric code
(stuff like FFTs and other transforms) on the register-rich Alpha
and Power architectures, my C was being turned into almost exactly
the assembly language that I would have written myself. In one recent
example, the very first C algorithm I wrote, when passed through
the FSL Power simulator/profiler, showed that I/gcc was retiring
absolutely the maximum number of operations per clock tick, which
means I didn't even need to write a second optimised version - there
was no optimisation possible. (And the algorithm was already optimised
so that the task was being done in the smallest number of operations.)

Cranky instruction sets might benefit from guru internal knowledge
and use of assembler, but register-rich RISCs generally don't have
the same horrific quantity and variety of pitfalls that cranky arch's
do, and there's less room for the human in assembly to beat the
compiler. (Assuming his C coding style suits the architecture.)

Phil
 

blmblm

[ sorry about mangling your name -- I'm not up to figuring out
how to include non-ASCII characters .... ]

No problem, even though I could nitpick back on this. =P

Maybe we're not using "nitpick" to mean the same thing (I think
it has a connotation of "pointing out a potential flaw not already
noticed") .... Not arguing with you here, though.
Nitpick: How is it possible to create an infinite data structure on a
machine with finite resources? (Maybe "unbounded" or "arbitrarily
long" would be a better term?)

Note that Haskell uses lazy evaluation. The data structure isn't just
arbitrarily long. It is indeed infinite and it works, as long as you
never ask for the last element or the length. In fact you can even
print such a data structure entirely and it will work (but the program
will print forever, of course).

primes = filter isPrime [2..]
main = print (take 10 primes)

You could print 'primes' instead of 'take 10 primes' and it would print
a list of all prime numbers instead of only the first 10 primes.
Anyway, wrong newsgroup. If you're interested, just go to the homepage.
You'll find a lot more information about this there.

Hm. Well, all right, if the "infinite list" doesn't actually have
to exist in the computer's memory .... (?) Okay.

As you say, though, off-topic here.
 

Nick Keighley

[ sorry about mangling your name -- I'm not up to figuring out
how to include non-ASCII characters .... ]

[ snip ]
Note for example that in Haskell it is common to create an infinite
list, extend it to an infinite list of infinite lists, zip all sublists
with an infinite list of integers, map a function over all elements,
extract a finite portion, split it into sublists, etc.  The compiler
translates all these data structures into variables and loops.  The
lists go away entirely.  The resulting code is often just as fast as the
equivalent (but much longer and much less readable) C code.

Nitpick:  How is it possible to create an infinite data structure
on a machine with finite resources?  (Maybe "unbounded" or "arbitrarily
long" would be a better term?)

I think such languages work on a "lazy" principle -- they only evaluate
the things that are actually needed, so if you create a couple of
infinite structures you have no problem unless you try to output an
infinite result.

So you might write a program that computes a list of all the primes,
but you only output the first 1000. The compiler is smart enough not
to evaluate all those primes you never look at.
 

ld

[ sorry about mangling your name -- I'm not up to figuring out
how to include non-ASCII characters .... ]
Ertugrul Söylemez wrote:
[ snip ]
Note for example that in Haskell it is common to create an infinite
list, extend it to an infinite list of infinite lists, zip all sublists
with an infinite list of integers, map a function over all elements,
extract a finite portion, split it into sublists, etc.  The compiler
translates all these data structures into variables and loops.  The
lists go away entirely.  The resulting code is often just as fast as the
equivalent (but much longer and much less readable) C code.
Nitpick:  How is it possible to create an infinite data structure
on a machine with finite resources?  (Maybe "unbounded" or "arbitrarily
long" would be a better term?)

I think such languages work on a "lazy" principle -- they only evaluate
the things that are actually needed, so if you create a couple of
infinite structures you have no problem unless you try to output an
infinite result.

You don't have a problem either; it just takes an infinite time. This is
equivalent to an infinite loop which outputs something in C.
So you might write a program that computes a list of all the primes,
but you only output the first 1000. The compiler is smart enough not
to evaluate all those primes you never look at.

I don't know if the compiler is smart, but it follows the language
principle which requires an "outermost reduction". C uses an
"innermost reduction", which evaluates the arguments first and then
calls the function. Haskell does the opposite, which by definition
evaluates only the arguments that are used (when their values are needed).
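A small C illustration of that innermost reduction (an invented example):
both arguments are fully evaluated before choose() is ever entered, even
though only one of the values ends up being used.

#include <stdio.h>

static int loud(const char *label, int value)
{
    printf("evaluating %s\n", label);
    return value;
}

static int choose(int cond, int if_true, int if_false)
{
    return cond ? if_true : if_false;
}

int main(void)
{
    /* Prints both "evaluating then" and "evaluating else": the arguments
       are reduced first (innermost), then choose() runs.  Under outermost
       reduction only the chosen branch would ever be touched. */
    int r = choose(1, loud("then", 10), loud("else", 20));

    printf("result: %d\n", r);
    return 0;
}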

regards,

ld.
 

Richard Bos

jacob navia said:
Then, to read a 4-digit year from characters into an integer you use
several temporary objects: you read the characters into a temporary list
container, which calls a general routine to convert a list container
into a boxed multi-precision integer, which gives a boxed integer that
is converted into an unboxed one... eventually.

...and that is precisely why a generalised container library for C is
exactly the wrong thing to write.
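For contrast, the direct version of that task is only a few lines; this is
an illustrative sketch, not anyone's posted code:

#include <ctype.h>
#include <stdio.h>

/* Four digit characters to an int; returns -1 on malformed input. */
static int parse_year(const char s[4])
{
    int year = 0;
    int i;

    for (i = 0; i < 4; i++) {
        if (!isdigit((unsigned char)s[i]))
            return -1;
        year = year * 10 + (s[i] - '0');
    }
    return year;
}

int main(void)
{
    printf("%d\n", parse_year("2024"));   /* prints 2024 */
    return 0;
}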

Richard
 

Ersek, Laszlo

...and that is precisely why a generalised container library for C is
exactly the wrong thing to write.

Funnily enough, I think this is an overgeneralization. In my narrow view of
things (aka "tunnel vision"), I'd rather not rewrite my light-weight
red-black tree "library" each time I turn to it. And hey, it links to
user-allocated objects through (void *) and takes a comparison function,
so it's generic!
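A guess at the shape of such an interface -- (void *) links plus a
user-supplied comparator -- purely for illustration, not lacos's actual
library:

#include <string.h>

typedef int rbtree_cmp(const void *a, const void *b);

struct rbnode {
    struct rbnode *left, *right, *parent;
    int red;                 /* colour bit */
    void *payload;           /* points at a user-allocated object */
};

struct rbtree {
    struct rbnode *root;
    rbtree_cmp *cmp;         /* supplied by the user at init time */
};

/* Example comparator for payloads that are C strings. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(a, b);
}

int main(void)
{
    struct rbtree t = { NULL, cmp_str };
    return t.cmp("abc", "abd") < 0 ? 0 : 1;   /* negative: "abc" sorts first */
}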

Cheers,
lacos
 

ld

You don't have a problem either; it just takes an infinite time. This is
equivalent to an infinite loop which outputs something in C.

Not exactly.  Try to translate the following pair of functions to C:

  isPrime :: Integral i => i -> Bool
  isPrime n = all (\d -> rem n d /= 0) . takeWhile (\d -> d^2 <= n) $ primes

  primes :: Integral i => [i]
  primes = 2 : filter isPrime [3..]

'primes' is an infinite list of all prime numbers, calculated in a naive
trial division manner, but the trial division uses only prime numbers.
This is where lazy infinite data structures are more powerful than just
infinite loops with finite data structures.


This doesn't contradict what I said. If you try to print primes, it
will take an infinite time, assuming infinite memory, since the CAF
(primes) cannot be garbage collected until the end of the printing,
which never comes. The observable effect will be the same as an
infinite loop printing something. This is why I say that it is not a
problem (or equivalently, it is a problem in both C and Haskell,
depending on your intent).
Note that there are two related, but separate concepts at work here:
Firstly non-strict semantics.  This is what you mentioned:  Function
arguments are not evaluated before the function is called.

As I understand it, non-strict semantics _results_ from the reduction
strategy (and its properties). I am not aware of a strict semantics
that can apply outermost reduction. Therefore, non-strict semantics
means pushing the lambda expressions representing the arguments first
and then calling the function.
But when are
they evaluated then?  This is where the lazy evaluation strategy comes
in, the second concept.  A computation is carried out as it is demanded.

As I understand it (;-),

lazy evaluation
= non-strict semantics + subgraph sharing
= optimal reduction strategy.

and it is not required to explain the behavior of primes.

regards,

ld.
 
