Super-scalar Optimizations


Phrogz

I was looking over the shoulder of a C++ coworker yesterday, when he
was writing a hack to only run certain code once. The C++ code was the
equivalent of the following Ruby code:

already_run = false
while true
  foo = bar if !already_run
  already_run = true
  do_something( )
end

I asked him: "Wouldn't it be slightly faster to nest the boolean
assignment inside the if statement?" I was suggesting the equivalent
of:

already_run = false
while true
  if !already_run
    foo = bar
    already_run = true
  end
  do_something( )
end

His answer was "no", and involved discussion of super-scalar
architectures and the fact that the original way ran the assignment in
parallel to the condition evaluation, and so was in fact faster.


The reason I'm posting is - are there any such considerations in Ruby
(when writing Ruby code, not C/C++ components)?

Or am I correct in assuming that the current state of
compilation/interpretation is such that there is no parallel branching
of statements to worry about?
 

Phil Tomson

Gavin said:
[...]
The reason I'm posting is - are there any such considerations in Ruby
(when writing Ruby code, not C/C++ components)?

Or am I correct in assuming that the current state of
compilation/interpretation is such that there is no parallel branching
of statements to worry about?

I'll take a stab at this, but I'm no expert....

Your cow-worker _may_ be right about the differences between the two code
snippets. I would think that you'd have to know what kind of assembly
code got generated by the two different snippets, though. Perhaps he's
taken a look at the compiled code. It also seems like it could differ a
lot between compilers (g++ vs. VC++).

As far as Ruby code goes, I don't think you would see any difference
because Ruby doesn't get compiled to native code (yet ;-). Though there
may be differences in how the interpreter handles the two different
snippets which could possibly affect speed, it would have nothing to do
with anything deep down in the actual hardware processor.
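
If you want to convince yourself, the standard Benchmark library makes it
easy to time both shapes of the loop from Ruby itself. A rough sketch (the
loop body and iteration count are invented just for illustration):

require 'benchmark'

N   = 1_000_000
bar = 42

Benchmark.bm(12) do |x|
  x.report('modifier if') do
    already_run = false
    foo = nil
    N.times do
      foo = bar if !already_run
      already_run = true
    end
  end

  x.report('nested if') do
    already_run = false
    foo = nil
    N.times do
      if !already_run
        foo = bar
        already_run = true
      end
    end
  end
end

Whatever difference shows up comes from the interpreter doing one extra
local assignment per pass in the first version, not from anything the
processor is doing in parallel.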

Phil
 

Robert Klemme

Phil said:
I'll take a stab at this, but I'm no expert....

Your cow-worker _may_ be right about the differences between the two
code snippets. I would think that you'd have to know what kind of
assembly code got generated by the two different snippets, though.
Perhaps he's taken a look at the compiled code. It also seems like
it could differ a lot between compilers (g++ vs. VC++).

As far as Ruby code goes, I don't think you would see any difference
because Ruby doesn't get compiled to native code (yet ;-). Though
there may be differences in how the interpreter handles the two
different snippets which could possibly affect speed, it would have
nothing to do with anything deep down in the actual hardware
processor.

.... especially as Ruby does no parallelism internally (no native threads).

My 0.02EUR...

robert
 

Devin Mullins

Remembering, vaguely, my comp arch class, I'm pretty sure the co-worker
(or cow-worker, if you really don't like him) was talking not about
threading, but about how some VLIW (very long instruction word, i.e. not
32-bit) machines include a 'predicate' in addition to the instruction
(i.e. MOV FOO, BAR IF EQL ALREADY_RUN, 0 is one instruction). I presume,
then, you guys are not running on x86.

And no, Ruby's too high level, and yeah, not compiled.

And wouldn't it be faster still just to pull foo = bar out of the while
loop? :)

Devin
 

Robert Klemme

Devin said:
Remembering, vaguely, my comp arch class, I'm pretty sure the
co-worker (or cow-worker, if you really don't like him) was talking
not about threading, but about how some VLIW (very long instruction
word, i.e. not 32-bit) machines include a 'predicate' in addition to
the instruction (i.e. MOV FOO, BAR IF EQL ALREADY_RUN, 0 is one
instruction). I presume, then, you guys are not running on x86.

And no, Ruby's too high level, and yeah, not compiled.

And wouldn't it be faster still just to pull foo = bar out of the
while loop? :)

Even more so: what's the point of a loop that is always run only once?

robert
 

Gavin Kistner

Robert said:
Even more so: what's the point of a loop that is always run only once?

Er, the loop doesn't run once, only the initialization code. The loop
runs forever.


Devin said:
And wouldn't it be faster still just to pull foo = bar out of the
while loop? :)

I actually flubbed the example slightly. It should have been:

already_run = false
while true
  do_something( )
  foo = bar if !already_run
  already_run = true
end

Where "do_something()" was actually about 15 lines of code. The
alternative would have been:

do_something( )
foo = bar
while true
  do_something( )
end

which is not very DRY when do_something( ) is a large block of code.

But again, even the programmer himself called it a hack while writing
it; at the time we weren't even sure if setting foo=bar after the
first iteration was the right fix to the problem.
 

Ara.T.Howard

Gavin said:
Er, the loop doesn't run once, only the initialization code. The loop runs
forever.

I actually flubbed the example slightly. It should have been:

already_run = false
while true
  do_something( )
  foo = bar if !already_run
  already_run = true
end

Where "do_something()" was actually about 15 lines of code. The alternative
would have been:

do_something( )
foo = bar
while true
  do_something( )
end

which is not very DRY when do_something( ) is a large block of code.

sure it is - block being the key word here:

do_something = lambda {|arg|
  ... 15 lines of code
  true
}

do_something[arg] and (foo = bar) and loop{ do_something[arg] }


pulling out little blocks of code that are re-used but can't be methods
due to context-sensitive behaviour or because they are just too small is
what lambda abstraction is for ;-)

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 

Ben Giddings

Gavin said:
His answer was "no", and involved discussion of super-scalar
architectures and the fact that the original way ran the assignment in
parallel to the condition evaluation, and so was in fact faster.

Yay! Trying to outsmart a compiler!

This sure seems like premature optimization to me. Was it really slowing
things down to do it the more obvious way? Had that been proven using a
profiler?

Computer code is a language that is meant to be read by both humans and
computers. These days, computers are really smart and their compilers can
look at the code and know what you're trying to do. In Ruby, Matz does
this by looking at context when something could be interpreted different
ways. C/C++ compilers can often spot common control structures and use an
optimized version in the machine code they produce.

Since computers are so smart, these days it makes more sense to write code
that a human can understand. Unless you truly need to clarify things for
the computer (i.e. things run too slow when they're written in the
human-obvious way) don't write for the computer!

Ben
 

Gavin Kistner

Ara said:
sure it is - block being the key word here:

Er, we've wandered far from the original content - the above Ruby
code was simply an illustration of the C++ code in question, because
I can't be bothered to know how to write proper C++ syntax.

Yes, Ruby makes life far cooler than C++.
 

Gavin Kistner

Ben said:
Yay! Trying to outsmart a compiler!

This sure seems like premature optimization to me. Was it really slowing
things down to do it the more obvious way? Had that been proven using a
profiler?

I appreciate your comments, but in the defense of my coworker:
1) As I've stated, using the boolean flag to run the code once was
only a hack to test whether the solution would fix the problem, and

2) No, I doubt that the placement of a single boolean assignment made
any measurable difference either way. My point with this thread
(which has been answered) was simply to find out if Ruby had any
similar things to keep in mind that would flow down to the
instruction pipeline architecture. The placement of that assignment
in the C++ code [...]

Ben said:
Since computers are so smart, these days it makes more sense to write
code that a human can understand. Unless you truly need to clarify
things for the computer (i.e. things run too slow when they're written
in the human-obvious way) don't write for the computer!

FWIW, I don't think that the difference between:

if ( !foo )
{
  bar( );
  foo = true;
}

versus

if ( !foo )
{
  bar( );
}
foo = true;


makes a difference either way in terms of legibility. Being against
premature optimization is fine to a point, but in any programming
project there are numerous basic choices one can make which will
affect performance.
 

Matthias Georgi

Gavin said:
2) No, I doubt that the placement of a single boolean assignment made
any measurable difference either way. My point with this thread
(which has been answered) was simply to find out if Ruby had any
similar things to keep in mind that would flow down to the
instruction pipeline architecture. The placement of that assignment
in the C++ code

I just searched google for super-scalar optimizations regarding
interpreters and found a paper about java bytecode-interpreters:
http://www.csc.uvic.ca/~csc586a/papers/p58-ogata.pdf

It seems that the frequent memory accesses of interpreters prevent
super-scalar processors from branch prediction and parallel execution.

So you may assume that almost no parallel execution happens while a Ruby
script runs.

Besides that, I was always wondering if there are performance issues
with procs. Given that they hold a reference to the C stack, maybe a
proc call would result in some kind of stack restoring. This is also
the reason for the enormous memory consumption of continuations, which
store the whole C stack (about 60 KB).
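
Out of the same curiosity, here is a quick sketch with the standard
Benchmark library that compares a plain method call against calling a
proc (the workload is made up; it only shows where any stack-related
cost of procs would surface):

require 'benchmark'

def bump(x)
  x + 1
end

bump_proc = lambda { |x| x + 1 }

N = 1_000_000
Benchmark.bm(12) do |b|
  b.report('method call') { N.times { |i| bump(i) } }
  b.report('proc call')   { N.times { |i| bump_proc.call(i) } }
end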
 

Steven Jenkins

Ara.T.Howard said:
do_something = lambda {|arg|
  ... 15 lines of code
  true
}

do_something[arg] and (foo = bar) and loop{ do_something[arg] }

pulling out little blocks of code that are re-used but can't be methods
due to context-sensitive behaviour or because they are just too small is
what lambda abstraction is for ;-)

OK, you got me thinking. I've built a Ruby extension to access a system
engineering database we use at work. The vendor provides a C API, which
I've wrapped with SWIG. SWIG is useful, but the generated Ruby methods
aren't very Ruby-like. So I've written a layer on top of the SWIG
methods. Most of the layer methods look like this:

def framelist
  ret, list = CAPIitem_getframelist(self)
  check_result(ret)
  list
end

Object#check_result is a common method that checks for an error
indication, finds what the last error was, and raises the corresponding
exception:

def check_result(res)
  return unless res == Cradle::FALSE
  code, msg = CAPIlast_error()
  raise EXCEPTION[code], msg
end

Is there any reason to prefer, or not, using a lambda for check_result?
The only downside I can see to the current implementation is that the
backtrace for all exceptions ends in check_result. You have to look at
the next level to see where the error actually occurred. (I think it's
almost the same with a lambda, but it doesn't report it's in another
method).
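
For comparison, the lambda version would look something like the sketch
below. It reuses the same Cradle::FALSE, CAPIlast_error and EXCEPTION
lookup from the methods above, so it is only meant to show the shape,
not a drop-in replacement:

# hypothetical sketch: check_result as a lambda held in a constant
CHECK_RESULT = lambda do |res|
  return unless res == Cradle::FALSE   # return leaves the lambda, as in the method
  code, msg = CAPIlast_error()
  raise EXCEPTION[code], msg
end

def framelist
  ret, list = CAPIitem_getframelist(self)
  CHECK_RESULT.call(ret)
  list
end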

Ideas?

Steve
 

Pit Capitain

Steven said:
[...]
Is there any reason to prefer, or not, using a lambda for check_result?
The only downside I can see to the current implementation is that the
backtrace for all exceptions ends in check_result. You have to look at
the next level to see where the error actually occurred. (I think it's
almost the same with a lambda, but it doesn't report it's in another
method).

Ideas?

If the only goal is to get rid of the topmost entry in the backtrace,
you don't need a lambda. Just add a third parameter to Kernel#raise.
Change the last line in check_result to:

raise EXCEPTION[code], msg, caller
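
A tiny self-contained illustration of the effect (the method names and
RuntimeError here are made up; the point is only the third argument to
raise):

def check_result(ok)
  return if ok
  # passing caller makes the backtrace start at whoever called check_result
  raise RuntimeError, "API call failed", caller
end

def framelist
  check_result(false)
end

begin
  framelist
rescue RuntimeError => e
  puts e.backtrace.first   # reports the line inside framelist, not check_result
end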

Regards,
Pit
 

Steven Jenkins

Pit said:
If the only goal is to get rid of the topmost entry in the backtrace,
you don't need a lambda. Just add a third parameter to Kernel#raise.
Change the last line in check_result to:

raise EXCEPTION[code], msg, caller

The only goal at this point is understanding, but this is helpful. Now I
can't think of any reason to prefer the lambda. It's just that Ara was
so enthusiastic about it :-)

Thanks.

Steve
 
