inhahe
Can anyone give me pointers/instructions/a template for writing a Python
extension in assembly (or better, HLA)?
inhahe said: Well the problem is that I'm actually not an assembler guru, so I don't know
how to implement a dll in asm or use a c calling convention, although I'm
sure those instructions are available on the web. I was just afraid of
trying to learn that AND making python-specific extensions at the same time.
I thought of making a c extension with embedded asm, but that just seemed
less than ideal. But if somebody thinks that's the Right Way to do it,
that's good enough..
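One route that sidesteps writing a "real" extension module at all, sketched here as an assumption rather than something proposed in the thread: any routine exported from a shared library with the C calling convention can be called from Python through ctypes, whether it was written in C, assembly, or HLA. The snippet below uses the C runtime's strlen as a stand-in for a hand-written routine, and the wrapper name c_strlen is made up for illustration. It assumes a POSIX-like system where the C library can be located.

```python
import ctypes
import ctypes.util

# Stand-in library: the C runtime. A hand-written asm routine would instead
# live in your own .so/.dll and be loaded by its path.
# If find_library returns None, CDLL(None) falls back to the symbols already
# loaded into the process (POSIX behaviour).
_lib = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
_lib.strlen.argtypes = [ctypes.c_char_p]
_lib.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper: call the native routine and return its result."""
    return _lib.strlen(data)


print(c_strlen(b"hello"))
```

The same three steps (load the library, declare argtypes/restype, call) would apply to an assembly routine, as long as it follows the platform's C calling convention.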
You could be right, but here are my reasons.
I need to make something that's very CPU-intensive and as fast as possible.
The faster, the better, and if it's not fast enough it won't even work.
They say that the C++ optimizer can usually optimize better than a person
coding in assembler by hand can, but I just can't believe that, at least for
me, because when I code in assembler, I feel like I can see the best way to
do it and I just can't imagine AI would even be smart enough to do it that
way...
For portability, I'd simply write different asm routines for different
systems. How wide a variety of systems I'd support I don't know. As a bare
minimum, 32-bit x86, 64-bit x86, and one or more of their available forms of
SIMD.
inhahe said: I like to learn what I need, but I have done assembly before, I wrote a
terminal program in assembly for example, with ansi and avatar support. I'm
just not fluent in much other than the language itself, per se.
Perhaps C would be as fast as my asm would, but C would not allow me to use
SIMD, which seems like it would improve my speed a lot, I think my goals are
pretty much what SIMD was made for.
D'Arcy J.M. Cain said:
2. Once the code is functioning, benchmark it and find the
bottlenecks. Replace the problem methods with a C extension. Refactor
(and check your unit tests again) if needed to break out the problem
areas into as small a piece as possible.
3. If it is still slow, embed some assembler where it is slowing down.
Even on the same processor you may have different assemblers depending
on the OS.
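The "benchmark it and find the bottlenecks" step can be done with the standard library's cProfile. This is only a minimal sketch: hot_loop, cheap_setup, main, and profile_report are hypothetical names standing in for real application code.

```python
import cProfile
import io
import pstats


def hot_loop(n):
    # Stand-in for the CPU-intensive algorithm.
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup(n):
    # Stand-in for code that is not worth optimizing.
    return list(range(n))


def main():
    cheap_setup(1000)
    return hot_loop(200_000)


def profile_report(func):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_report(main)
print(report)
```

Whichever functions dominate the report are the candidates to move into C (and only after that, possibly, into assembler).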
Yeah, I don't know much about that. I was figuring perhaps I could limit the
assembler parts / methodology to something I could write generically
enough, and if all else fails write for the other OSes or only support
Windows. Also, I think I should be using SIMD of some sort, and I'm not
sure, but I highly doubt C++ compilers support SIMD.
There's probably only 2 or 3 basic algorithms that will need to have all
that speed.
I won't know if the assembler is faster until I embed it, and if I'm going
to do that I might as well use it.
Although it's true I'd only have to embed it for one system to see (more or
less).
Why wouldn't the compilers support it? It's part of the x86
architecture, isn't it?
Yeah, but I don't know if it uses it by default. My guess is that
whether it does depends on whether the compiler back end, while
optimizing the code, sees data access/computation patterns amenable to
SIMD.
Perhaps you explicitly use them with some extended syntax or something?
compiling for 32 bit architectures, and using SSE instructions is the
default on x86-64 architectures, but you can use -march=(some
architecture with SIMD instructions), -msse, -msse2, -msse3, or
-mfpmath=(one of 387, sse, or sse,387) to get the compiler to use
them.
As long as we're talking about compilers and such... anybody want to
chip in how this works in Python bytecode or what the bytecode
interpreter does? Okay, wait, before anybody says that's
implementation-dependent: does anybody want to chip in what the
CPython implementation does? (or any other implementation they're
familiar with, I guess)
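For the CPython side, the standard dis module shows what a function compiles to. As far as I know, CPython executes these bytecode instructions one at a time in a C evaluation loop, so any SIMD use would come from how the interpreter or an extension module was compiled, not from the bytecode itself. A small sketch (sum_squares is just an illustrative function, and the exact opcode names vary between CPython versions):

```python
import dis


def sum_squares(values):
    total = 0
    for v in values:
        total += v * v
    return total


# Collect the opcode names for inspection, then print the full listing.
opnames = [ins.opname for ins in dis.get_instructions(sum_squares)]
dis.dis(sum_squares)
```

Each multiply and add in the loop shows up as its own generic instruction operating on Python objects, which is why tight numeric loops are usually the part that gets pushed down into C.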