C / asm / long ints

Discussion in 'C Programming' started by fermineutron, Oct 26, 2006.

  1. fermineutron

    fermineutron Guest

    A while back i tried to calculate factorials of large numbers using
    arrays in C, the array encoded integer arithemetic that i wrote in C
    was very slow, it would take almost a second to multiply 2 array
    encoded integers. resently i looked at large precision libraries for C,
    in particular GMP but i was unable to get it to run with my compiler,
    aperantly some header files were not found. I was curious what is the
    best way to interface C and asm so that i could write simmilar library
    in asm and use it in C.

    Any ideas?
     
    fermineutron, Oct 26, 2006
    #1
    1. Advertising

  2. fermineutron said:

    > A while back i tried to calculate factorials of large numbers using
    > arrays in C, the array encoded integer arithemetic that i wrote in C
    > was very slow, it would take almost a second to multiply 2 array
    > encoded integers. resently i looked at large precision libraries for C,
    > in particular GMP but i was unable to get it to run with my compiler,
    > aperantly some header files were not found. I was curious what is the
    > best way to interface C and asm so that i could write simmilar library
    > in asm and use it in C.


    Assembly language is not a magic wand you can wave to turn bad algorithms
    into good ones. If your C routines weren't quick enough, your assembly
    language routines are very unlikely to be quick enough either.

    Choose Better Algorithms. Then implement them in C.

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at above domain (but drop the www, obviously)
     
    Richard Heathfield, Oct 26, 2006
    #2
    1. Advertising

  3. fermineutron

    Guest

    fermineutron wrote:
    > A while back i tried to calculate factorials of large numbers using
    > arrays in C, the array encoded integer arithemetic that i wrote in C
    > was very slow, it would take almost a second to multiply 2 array
    > encoded integers. resently i looked at large precision libraries for C,
    > in particular GMP but i was unable to get it to run with my compiler,
    > aperantly some header files were not found. I was curious what is the
    > best way to interface C and asm so that i could write simmilar library
    > in asm and use it in C.
    >
    > Any ideas?


    It would probably be easier to figure out how to make GMP
    work on your system. And worth the effort.
     
    , Oct 26, 2006
    #3
  4. fermineutron wrote:
    > A while back i tried to calculate factorials of large numbers using
    > arrays in C, the array encoded integer arithemetic that i wrote in C
    > was very slow


    You probably could write assembly to to the manipulations that you do in
    your multiply procedure "a little less slowly", but I suspect what you
    *really* want is something borrowed from higher maths: The FFT multiply.

    http://numbers.computation.free.fr/Constants/Algorithms/fft.html

    Variations on this theme are found in all kinds of programming
    situations where really big numbers are involved, and in your own
    experiments you stumbled upon the itch that is scratched by this approach.

    If approximate solutions to n! would be useful in your application,
    Stirling gave us something in the 18th century that finds all kinds
    of uses in computing today:
    http://mathworld.wolfram.com/StirlingsApproximation.html

    If you study computer science in a university setting, you will see
    Stirling's formula in the first course of discrete maths, and the DFT
    ("Discrete Fourier Transform") will come back to haunt you a few times
    in various algorithm and automata courses. (Or at least it *should*;
    some schools appear to not actually teach Computer Science.)
     
    Jack \Abram\ Off, Oct 27, 2006
    #4
  5. fermineutron

    Nishu Guest

    fermineutron wrote:
    > A while back i tried to calculate factorials of large numbers using
    > arrays in C, the array encoded integer arithemetic that i wrote in C
    > was very slow, it would take almost a second to multiply 2 array
    > encoded integers. resently i looked at large precision libraries for C,
    > in particular GMP but i was unable to get it to run with my compiler,
    > aperantly some header files were not found. I was curious what is the
    > best way to interface C and asm so that i could write simmilar library
    > in asm and use it in C.


    Some compilers support embedded asm but it is waste of time unless you
    understand the underlined processor architecture, its related
    instruction set, how your compiler generates the corresponding asm for
    your optimized C routine and so on. It is big time investment.
    Relatively, less time consuming method with appreciable return is to
    optimize the C routine itself; and there are various techinques to do
    so which are easily available on the net and also in some previous
    threads in this group.

    -Nishu
     
    Nishu, Oct 27, 2006
    #5
  6. "Nishu" <> wrote in message
    news:...
    > fermineutron wrote:


    [...]

    >> I was curious what is the
    >> best way to interface C and asm so that i could write simmilar library
    >> in asm and use it in C.


    [...]

    You make you libraries ABI follow the C calling convention for the processor
    your targeting. For x86 prams are passed on the stack left-to-right, and
    SPARC passed prams mostly in registers', ect...

    Here are two examples of how to use IA-32 assembly language to build a
    library that has a strict C API, and an ABI that follows the C calling
    convention for the IA-32:

    http://appcore.home.comcast.net/

    http://appcore.home.comcast.net/vzoom/refcount/
     
    Chris Thomasson, Oct 27, 2006
    #6
  7. fermineutron

    Richard Bos Guest

    "Jack \"Abram\" Off" <> wrote:

    > fermineutron wrote:
    > > A while back i tried to calculate factorials of large numbers using
    > > arrays in C, the array encoded integer arithemetic that i wrote in C
    > > was very slow

    >
    > You probably could write assembly to to the manipulations that you do in
    > your multiply procedure "a little less slowly", but I suspect what you
    > *really* want is something borrowed from higher maths: The FFT multiply.
    >
    > http://numbers.computation.free.fr/Constants/Algorithms/fft.html


    That's nice and fast, but it's a floating-point process and introduces
    errors. All very well for floating point computations which are already
    imprecise, but if you start out with a nice, exact integer array it's
    probably not what you want.

    Richard
     
    Richard Bos, Oct 27, 2006
    #7
  8. "fermineutron" <> wrote in message
    news:...
    > A while back i tried to calculate factorials of large numbers using
    > arrays in C, the array encoded integer arithemetic that i wrote in C
    > was very slow, it would take almost a second to multiply 2 array
    > encoded integers. resently i looked at large precision libraries for C,
    > in particular GMP but i was unable to get it to run with my compiler,
    > aperantly some header files were not found. I was curious what is the
    > best way to interface C and asm so that i could write simmilar library
    > in asm and use it in C.
    >
    > Any ideas?
    >


    About the only realistic way you're going to get x86 assembly to outperform
    highly optimized C, is to get a true x86 assembly expert, like Terje
    Mathisen, to code it for you.

    Most current C compilers are extremely efficient in generating assembly.
    I've gone to great lengths to outcode C compilers for x86 in a couple
    special situations. The best I could do was almost match the C compiler...
    The problem for you is that the C optimizer can take full advantage of
    extremely complicated situations and special instructions to generate the
    best code. In fact, some C optimizers generate hundreds of trial
    combinations. Most people just can't handle such complexity or convoluted
    situations.

    Are there ways to make your C code faster? Yes.

    1) buy a new computer, a 2Ghz AMD is roughly 1000 times faster than a 500
    Mhz AMD x86 CPU
    2) find a better algorithm, a brute force factorization may take a second or
    two, many times slower than an elliptic curve factorization
    3) switch from, say GCC, to a compiler which is known for more efficient
    code, say OpenWatcom or Digital Mars
    4) completely unroll any loops, this reduces branching which is always
    expensive in assembly
    5) completely unroll any loops, occasionally the loop size can be reduced by
    one, depending on how the loop was coded
    6) precompute as many operations as possible, even extremely large lookup
    tables are much faster than computation
    7) make an attempt to reduce the number of variables used in the
    calculations
    8) replace multiplications and divisions with additions, subtractions,
    bitshifts
    9) don't attempt to access integer data smaller than the largest assembly
    integer type of the CPU (32-bits for 32-bit cpu, 64-bit for...)
    10) play around with a decent number of compiler optimizations, usually a
    small number of them will other the most improvement
    11) although C compilers are very good with optimization, they aren't
    perfect. Forcing the use of a register can improve the code's speed


    Rod Pemberton
     
    Rod Pemberton, Oct 27, 2006
    #8
  9. Rod Pemberton said:

    >
    > "fermineutron" <> wrote in message
    > news:...
    >> A while back i tried to calculate factorials of large numbers using
    >> arrays in C, the array encoded integer arithemetic that i wrote in C
    >> was very slow, it would take almost a second to multiply 2 array
    >> encoded integers.


    <snip>

    > Are there ways to make your C code faster? Yes.
    >
    > 1) buy a new computer, a 2Ghz AMD is roughly 1000 times faster than a 500
    > Mhz AMD x86 CPU


    That ought to be unnecessary.

    > 2) find a better algorithm


    Of all your suggestions, this is the best. When the OP first posted on this
    subject (factorial calculations using bignums), it was immediately evident
    that his algorithms were suspect, since their implementations were very
    very very much slower (by orders of magnitude, IIRC) than my own routines,
    which I wrote with more regard to clarity than to speed.

    <snip>

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at above domain (but drop the www, obviously)
     
    Richard Heathfield, Oct 27, 2006
    #9
  10. fermineutron

    CBFalconer Guest

    Chris Thomasson wrote:
    >

    .... snip ...
    >
    > You make you libraries ABI follow the C calling convention for the
    > processor your targeting. For x86 prams are passed on the stack
    > left-to-right, and SPARC passed prams mostly in registers', ect...


    Is that rather hard on the babies in the prams? Child abuse? :)

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
     
    CBFalconer, Oct 27, 2006
    #10
  11. "CBFalconer" <> wrote in message
    news:...
    > Chris Thomasson wrote:
    >>

    > ... snip ...
    >>
    >> You make you libraries ABI follow the C calling convention for the
    >> processor your targeting. For x86 prams are passed on the stack
    >> left-to-right, and SPARC passed prams mostly in registers', ect...

    >
    > Is that rather hard on the babies in the prams? Child abuse? :)


    Whoa! I meant params of course!


    lol

    ;^)
     
    Chris Thomasson, Oct 27, 2006
    #11
  12. "Richard Heathfield" <> wrote in message
    news:...
    > Rod Pemberton said:
    >
    > >
    > > "fermineutron" <> wrote in message
    > > news:...
    > >> A while back i tried to calculate factorials of large numbers using
    > >> arrays in C, the array encoded integer arithemetic that i wrote in C
    > >> was very slow, it would take almost a second to multiply 2 array
    > >> encoded integers.

    >
    > <snip>
    >
    > > Are there ways to make your C code faster? Yes.
    > >
    > > 1) buy a new computer, a 2Ghz AMD is roughly 1000 times faster than a

    500
    > > Mhz AMD x86 CPU

    >
    > That ought to be unnecessary.
    >
    > > 2) find a better algorithm

    >
    > Of all your suggestions, this is the best. When the OP first posted on

    this
    > subject (factorial calculations using bignums), it was immediately evident
    > that his algorithms were suspect, since their implementations were very
    > very very much slower (by orders of magnitude, IIRC) than my own routines,
    > which I wrote with more regard to clarity than to speed.
    >
    > <snip>
    >


    > > 2) find a better algorithm

    >
    > Of all your suggestions, this is the best.


    I read your post. I originally had the line: "I have to agree with
    Healthfield." but thought better of it... ;)

    A poor algorithm may be the _cause_ of his problems, but, the _best_
    suggestion, IMO, is to continue to use C and not switch to assembly. He
    could spend his lifetime attempting to get his assembly to outperform
    optimized C.

    All of the other methods work and are easy enough to implement too. If he
    applies all of them, except for two that offer the most improvement:
    1) finding a better algorithm
    2) buying a new computer
    he could still easily see a 300-900% improvement in speed.

    One has to ask why he wants large to compute large factorials. The usual
    answer leads indirectly to something which could be used to break
    encryption, or directly to the breaking of some encryption scheme. So, I
    made the assumption that the OP wanted the absolutely fastest implementation
    he could obtain without spending years coding assembly and without spending
    a fortune. The easiest way is to spend some money for faster equipment.
    You can buy an extremely fast PC for leas than $2k USD, or a 16-core Tyan
    Typhoon PSC for 10-16$k USD.


    Rod Pemberton
     
    Rod Pemberton, Oct 27, 2006
    #12
  13. Rod Pemberton said:

    <snip>
    >
    > One has to ask why he wants large to compute large factorials.


    Probably the same reason I do - for the hell of it.

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at above domain (but drop the www, obviously)
     
    Richard Heathfield, Oct 28, 2006
    #13
  14. Rod Pemberton wrote:
    <snip>
    >
    > About the only realistic way you're going to get x86 assembly to outperform
    > highly optimized C, is to get a true x86 assembly expert, like Terje
    > Mathisen, to code it for you.
    >

    <snip>

    Agreed for the most part, but one can often realize speed gains of an
    order of magnitude or more by using special-purpose assembly
    instructions that current-generation compilers generally do not emit.
    A compiler will generally optimize the hell out of loops dealing with
    bits in unsigned integers, but will generally not (yet) understand that
    the entire thing could be perhaps replaced by a popcntbd (POWER5
    population count) or a bsr (Pentium 4 bit scan reverse) depending on
    the loop, etc.

    If such a thing is at the heart of something like a packet classifier
    or image filter, even an assembly non-expert can affect a measurable
    performance gain.


    Mark F. Haigh
     
    Mark F. Haigh, Oct 28, 2006
    #14
  15. "Mark F. Haigh" <> wrote in message
    news:...
    > Rod Pemberton wrote:
    > <snip>
    > >
    > > About the only realistic way you're going to get x86 assembly to

    outperform
    > > highly optimized C, is to get a true x86 assembly expert, like Terje
    > > Mathisen, to code it for you.
    > >

    > <snip>
    >
    > Agreed for the most part, but one can often realize speed gains of an
    > order of magnitude or more by using special-purpose assembly
    > instructions that current-generation compilers generally do not emit.
    > A compiler will generally optimize the hell out of loops dealing with
    > bits in unsigned integers, but will generally not (yet) understand that
    > the entire thing could be perhaps replaced by a popcntbd (POWER5
    > population count) or a bsr (Pentium 4 bit scan reverse) depending on
    > the loop, etc.
    >
    > If such a thing is at the heart of something like a packet classifier
    > or image filter, even an assembly non-expert can affect a measurable
    > performance gain.
    >
    >


    Although I no problem with your point there may be some situations where
    assembly is better, I currently believe that most of the instructions which
    aren't emitted by x86 compilers is because they aren't a good choice for
    speed. Using BSR as an example of a special-purpose instruction which could
    give performance gains was probably a bad example. The timings I have
    access to indicate that BSR can be very slow and it is non-pairable on the
    P4. It's not likely any compiler would use it in a situation requiring
    speed, but they may use it to reduce code size.

    cycle timings for BSR reg,reg:
    386 10+3n
    486 6-103
    P4 7-73


    Rod Pemberton
     
    Rod Pemberton, Oct 28, 2006
    #15
  16. Rod Pemberton wrote:
    > "Mark F. Haigh" <> wrote in message
    > news:...
    > > Rod Pemberton wrote:
    > > <snip>
    > > >
    > > > About the only realistic way you're going to get x86 assembly to

    > outperform
    > > > highly optimized C, is to get a true x86 assembly expert, like Terje
    > > > Mathisen, to code it for you.
    > > >

    > > <snip>
    > >
    > > Agreed for the most part, but one can often realize speed gains of an
    > > order of magnitude or more by using special-purpose assembly
    > > instructions that current-generation compilers generally do not emit.
    > > A compiler will generally optimize the hell out of loops dealing with
    > > bits in unsigned integers, but will generally not (yet) understand that
    > > the entire thing could be perhaps replaced by a popcntbd (POWER5
    > > population count) or a bsr (Pentium 4 bit scan reverse) depending on
    > > the loop, etc.
    > >
    > > If such a thing is at the heart of something like a packet classifier
    > > or image filter, even an assembly non-expert can affect a measurable
    > > performance gain.
    > >
    > >

    >
    > Although I no problem with your point there may be some situations where
    > assembly is better, I currently believe that most of the instructions which
    > aren't emitted by x86 compilers is because they aren't a good choice for
    > speed. Using BSR as an example of a special-purpose instruction which could
    > give performance gains was probably a bad example. The timings I have
    > access to indicate that BSR can be very slow and it is non-pairable on the
    > P4.


    The timings that I have access to indicate that bsr completes in a
    single cycle on the newer Core 2 Intel chips (Family 6, Model 0xF), so
    I will disagree with you on that. Whether or not this is a real
    "Pentium 4" is open to question, I suppose, but that's starting to
    drift off topic.

    I just pulled bsr off the top of my head anyways, as I have personally
    seen it dramatically decrease bitmap scanning times compared to a C
    language bit-scanning loop.

    Mark F. Haigh
     
    Mark F. Haigh, Oct 28, 2006
    #16
  17. fermineutron

    Guest

    Richard Bos wrote:
    > "Jack \"Abram\" Off" <> wrote:
    > > You probably could write assembly to to the manipulations that you do in
    > > your multiply procedure "a little less slowly", but I suspect what you
    > > *really* want is something borrowed from higher maths: The FFT multiply.
    > >
    > > http://numbers.computation.free.fr/Constants/Algorithms/fft.html

    >
    > That's nice and fast, but it's a floating-point process and introduces
    > errors. All very well for floating point computations which are already
    > imprecise, but if you start out with a nice, exact integer array it's
    > probably not what you want.


    You can perform convolutions of integer arrays (i.e.
    arbitrary-precision multiplies) *exactly* with a floating-point FFT as
    long as each integer is stored in a floating-point number with much
    larger precision, and this is precisely what is done in practice by
    most arbitrary-precision arithmetic packages.

    Steven
     
    , Oct 28, 2006
    #17
  18. fermineutron

    Richard Bos Guest

    wrote:

    > Richard Bos wrote:
    > > "Jack \"Abram\" Off" <> wrote:
    > > > You probably could write assembly to to the manipulations that you do in
    > > > your multiply procedure "a little less slowly", but I suspect what you
    > > > *really* want is something borrowed from higher maths: The FFT multiply.
    > > >
    > > > http://numbers.computation.free.fr/Constants/Algorithms/fft.html

    > >
    > > That's nice and fast, but it's a floating-point process and introduces
    > > errors. All very well for floating point computations which are already
    > > imprecise, but if you start out with a nice, exact integer array it's
    > > probably not what you want.

    >
    > You can perform convolutions of integer arrays (i.e.
    > arbitrary-precision multiplies) *exactly* with a floating-point FFT as
    > long as each integer is stored in a floating-point number with much
    > larger precision, and this is precisely what is done in practice by
    > most arbitrary-precision arithmetic packages.


    Myes. But are _much_ larger floating-point numbers really that much
    faster than appropriately handled integers? Even if you don't have these
    much larger FPs yet, and will have to emulate them? Don't get me wrong,
    I see the value of the method, but I'm not so sure of its value to the
    OP, who probably will have the same problems handling extended-precision
    FPs that he has handling extended-size integers.

    Richard
     
    Richard Bos, Oct 31, 2006
    #18
  19. fermineutron

    Richard Bos Guest

    "Chris Thomasson" <> wrote:

    > > fermineutron wrote:

    >
    > >> I was curious what is the
    > >> best way to interface C and asm so that i could write simmilar library
    > >> in asm and use it in C.

    >
    > You make you libraries ABI follow the C calling convention for the processor
    > your targeting. For x86 prams are passed on the stack left-to-right,


    Oh, are they? That'll be news to the ISO C Standard, which allows
    implementations to pull any trick they like to speed up function calls.

    And that's important; if you want an assembly function to speed up the
    program, you really don't want to use the slowest way available to call
    that function.

    Richard
     
    Richard Bos, Oct 31, 2006
    #19
  20. fermineutron wrote:
    > A while back i tried to calculate factorials of large numbers using
    > arrays in C, the array encoded integer arithemetic that i wrote in C
    > was very slow, it would take almost a second to multiply 2 array
    > encoded integers.



    Whoa! Did you just stuff one decimal digit per array element? Or one
    bit? It's hard to make arithmetic that slow.

    Try using an array of unsigned ints, keeping up to sizeof( int ) * 8
    bits per element. The code should run many many times faster. For
    instance if you were storing one decimal digit per element, using
    32-bit ints will make it run at least 400 million times faster.
    Really. You eliminate having to do a mod and divide per loop, and you
    have 2^32/10 times fewer loops.

    Or better yet find a working bignum library, there must be dozens of
    them for C.
     
    Ancient_Hacker, Oct 31, 2006
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    3
    Views:
    574
    Mark P
    Apr 3, 2005
  2. Chris N. Hinds

    unions with long long ints and doubles?

    Chris N. Hinds, Sep 30, 2003, in forum: C Programming
    Replies:
    3
    Views:
    425
    Barry Schwarz
    Oct 2, 2003
  3. Skybuck Flying

    ints ints ints and ints

    Skybuck Flying, Jul 8, 2004, in forum: C Programming
    Replies:
    24
    Views:
    851
    Jack Klein
    Jul 10, 2004
  4. Ray Dillinger

    format specifier for long long ints....

    Ray Dillinger, Mar 26, 2006, in forum: C Programming
    Replies:
    4
    Views:
    495
    Keith Thompson
    Mar 27, 2006
  5. David Geering

    longs, long longs, short short long ints . . . huh?!

    David Geering, Jan 8, 2007, in forum: C Programming
    Replies:
    15
    Views:
    578
    Keith Thompson
    Jan 11, 2007
Loading...

Share This Page