New Altivec-optimized valarray implementation


Glen Low

I have written a new implementation of the std::valarray library that is
optimized to use Altivec (Apple's "Velocity Engine", part of the
PowerPC G4s in most Macintoshes and the announced IBM PPC 970). The
implementation is mostly standard-conforming and is complete.
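For readers less familiar with std::valarray, here is a minimal sketch of the three kinds of operation the benchmarks measure: element-wise arithmetic, transcendentals, and summation. It is written only against the standard interface. The helper names `multiply_add`, `sines` and `total` are hypothetical illustrations, not MacSTL's API.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>

// Hypothetical helpers (not MacSTL's API) covering the three operation kinds
// mentioned in the post. They use only the standard std::valarray interface,
// so any conforming implementation, SIMD-accelerated or not, should accept them.

// Element-wise arithmetic: one expression, no hand-written loop.
std::valarray<float> multiply_add(const std::valarray<float>& a,
                                  const std::valarray<float>& b,
                                  const std::valarray<float>& c) {
    // Binary valarray operators require operands of equal length.
    assert(a.size() == b.size() && b.size() == c.size());
    return a * b + c;
}

// Element-wise transcendental: <valarray> overloads std::sin for valarray.
std::valarray<float> sines(const std::valarray<float>& x) {
    return std::sin(x);
}

// Reduction: valarray::sum() adds every element (x must be non-empty).
float total(const std::valarray<float>& x) {
    assert(x.size() > 0);
    return x.sum();
}
```

Because the standard leaves the evaluation strategy to the implementation, a library can vectorize these operations without any change to the calling code.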

As soon as I get my shingle up on the web (1 or 2 days' time), I'll
post the library and its accompanying docs, which I call "MacSTL".
Would like comments, tests, discussions of it...

Preliminary benchmarks on my Power Mac G4 make it 550% faster on
inlined arithmetic, 1360% faster on inlined transcendentals and 290%
faster on summation than the gcc 3.1 std::valarray. Even the
non-Altivec-optimized inline arithmetic is 50% faster than gcc's, due to
ruthless elimination of extraneous loads and stores by using STL-style
algorithms.
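One widely used way to avoid extra loads and stores in array arithmetic is expression templates. In this technique, `b * c + d` builds a small tree of lightweight nodes rather than temporary arrays, and assignment walks that tree in a single loop. The sketch below is a hypothetical, minimal illustration of that general technique; it is not MacSTL's code, and `Vec`, `BinExpr` and `is_expr` are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical, minimal expression-template sketch (not MacSTL's code).
// "b * c + d" builds BinExpr nodes instead of temporary arrays; assigning the
// tree to a Vec runs ONE loop that loads each input element once and stores
// each result once.

template <class L, class R, class Op>
struct BinExpr {
    const L& lhs;  // references: a nested tree must be consumed in the same statement
    const R& rhs;
    float operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
    std::size_t size() const { return lhs.size(); }
};

struct Add { static float apply(float x, float y) { return x + y; } };
struct Mul { static float apply(float x, float y) { return x * y; } };

struct Vec {
    std::vector<float> data;
    explicit Vec(std::size_t n, float v = 0.0f) : data(n, v) {}
    float operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Evaluate an arbitrary expression tree in a single fused loop.
    template <class E>
    Vec& operator=(const E& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Restrict the operators below to Vec and BinExpr operands only.
template <class T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <class L, class R, class Op>
struct is_expr<BinExpr<L, R, Op>> : std::true_type {};

template <class L, class R,
          class = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
BinExpr<L, R, Add> operator+(const L& l, const R& r) { return {l, r}; }

template <class L, class R,
          class = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
BinExpr<L, R, Mul> operator*(const L& l, const R& r) { return {l, r}; }
```

With `Vec a(4), b(4, 2.0f), c(4, 3.0f), d(4, 1.0f);` the statement `a = b * c + d;` fills `a` in one pass. The trade-off is that nodes hold references, so an expression with nested sub-expressions must be assigned in the statement that builds it; production libraries usually store nested nodes by value to relax this.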

I sense that std::valarray is somewhat of a dead horse out there, but
I believe I can show there's still some life left in that concept. As
for Altivec, it's still up and coming!

P.S. The library also has several STL-influenced concepts for the Mac
(or BSD/PowerPC): std::vectors with Mach copy-on-write semantics,
clean COM wrappers using just std containers, zlib wrapped in std
iostreams...
 

E. Robert Tisdale

Glen said:
> I have written a new implementation of the std::valarray library
> that is optimized to use AltiVec (Apple's "Velocity Engine",

AltiVec(tm) is Motorola's trademark
for the first PowerPC SIMD extension.

http://www.simdtech.org/altivec

> part of the PowerPC G4s in most Macintoshes
> and the announced IBM PPC 970).
> The implementation is mostly standard-conforming and is complete.
> As soon as I get my shingle up on the web (1 or 2 days' time),
> I'll post the library and its accompanying docs,
> which I call "MacSTL".
> Would like comments, tests, discussions of it...
> Preliminary benchmarks on my Power Mac G4 make it 550% faster
> on inlined arithmetic, 1360% faster on inlined transcendentals
> and 290% faster on summation than the gcc 3.1 std::valarray.
> Even the non-Altivec-optimized inline arithmetic
> is 50% faster than gcc's, due to ruthless elimination
> of extraneous loads and stores by using STL-style algorithms.
>
> I sense that std::valarray is somewhat of a dead horse out there, but
> I believe I can show there's still some life left in that concept.
> As for Altivec, it's still up and coming!
>
> P.S. The library also has several STL-influenced concepts for the Mac
> (or BSD/PowerPC): std::vectors with Mach copy-on-write semantics,
> clean COM wrappers using just std containers, zlib wrapped in std
> iostreams...

You might want to post this to the Object Oriented Numerics mailing list:

http://www.oonumerics.org/mailman/listinfo.cgi/oon-list/

I believe that Kent Budge still subscribes to this list
and he may appreciate vindication.

You might also visit the
High Performance Embedded Computing Software Initiative (HPEC-SI)

http://www.hpec-si.org/

They are working on a C++ binding for
the Vector Signal Image Processing Library (VSIPL)

http://www.vsipl.org/

There are several implementations of the VSIPL now
for AltiVec on the PowerPC, and you really should be
benchmarking your implementation of std::valarray against them.

Also, do you plan an implementation for the Power Mac G5?

http://www.apple.com/powermac/
 

Glen Low

> You might want to post this to the Object Oriented Numerics mailing list:
> http://www.oonumerics.org/mailman/listinfo.cgi/oon-list/
>
> I believe that Kent Budge still subscribes to this list
> and he may appreciate vindication.

I will do that once I get back to my own Mac. Yes, I have seen Kent's
rationale at the site, and I'll be linking to it from my own website
once it's up.

> There are several implementations of the VSIPL now
> for AltiVec on the PowerPC, and you really should be
> benchmarking your implementation of std::valarray against them.

I was hoping for a wider audience and thus targeted std::valarray.

> Also, do you plan an implementation for the Power Mac G5?
>
> http://www.apple.com/powermac/

Definitely! I tried to get my library out before the WWDC
announcement, hoping the rumors were true and that I'd get some
additional publicity. Alas, despite several sleepless nights and time
off from my day job, I only finished it on Monday or so, sans docs,
which I'm working on now.

I've taken a look at the new G5 docs on the developer.apple.com site, and
they look good for what I am doing, especially since Altivec code is
bandwidth-sensitive. For example, aligning loops for maximal
performance would dovetail nicely with my inline implementation. I'll
have to download the gcc 3.3 compiler they've provided and see
how it goes.
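"Aligning loops" could mean aligning the loop code itself or aligning the data a vectorized loop touches. Assuming the second reading, a common pattern follows from the fact that AltiVec vector loads and stores operate on 16-byte-aligned blocks (4 floats). The loop peels a scalar head until the data pointer is aligned, runs the vector body in whole blocks, and finishes with a scalar tail. The sketch below is a generic illustration of that pattern, not MacSTL code or Apple's G5 guidance. Plain C++ stands in for the AltiVec intrinsics so it compiles anywhere, and `add_scalar_aligned` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of alignment peeling for a SIMD loop (not MacSTL code).
// Real AltiVec code would use vec_ld / vec_add / vec_st in the body; a plain
// 4-wide inner loop stands in for one vector operation here.

constexpr std::size_t kVectorBytes = 16;                      // one AltiVec register
constexpr std::size_t kLanes = kVectorBytes / sizeof(float);  // 4 floats

// Adds s to every element of x[0..n). Returns how many elements were handled
// by the aligned, block-wise body (handy for checking the head/body/tail split).
std::size_t add_scalar_aligned(float* x, std::size_t n, float s) {
    assert(x != nullptr || n == 0);
    std::size_t i = 0;

    // Head: scalar work until &x[i] sits on a 16-byte boundary.
    while (i < n && reinterpret_cast<std::uintptr_t>(x + i) % kVectorBytes != 0) {
        x[i++] += s;
    }

    // Body: whole 16-byte blocks; each outer iteration models one vector op.
    std::size_t body = 0;
    for (; i + kLanes <= n; i += kLanes, body += kLanes) {
        for (std::size_t k = 0; k < kLanes; ++k) x[i + k] += s;
    }

    // Tail: leftover elements that do not fill a whole block.
    for (; i < n; ++i) x[i] += s;
    return body;
}
```

For example, starting one float past a 16-byte-aligned buffer with 10 elements, the head handles 3 elements, the body one block of 4, and the tail the last 3.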

Cheers,
Glen