Python and GMP.

  • Thread starter alessiogiovanni.baroni
  • Start date
A

alessiogiovanni.baroni

There are reasons why Python not used the GMP library for implementing
its long type?
 
D

Diez B. Roggisch

There are reasons why Python not used the GMP library for implementing
its long type?

Any reason it should? I don't know GMP (only that it exists), but adding
binary dependencies is always a tricky and in need of careful weighting
thing to do.

Diez
 
B

Benjamin Peterson

There are reasons why Python not used the GMP library for implementing
its long type?

Basically, GMP only becomes faster when the numbers are huge.
 
M

Mark Dickinson

 <alessiogiovanni.baroni <at> gmail.com> writes:




Basically, GMP only becomes faster when the numbers are huge.

That was true for one particular attempt in this direction (by
Victor Stinner); I don't know whether more work would have fixed
this.

The other major issue is licensing: as far as I recall, the
various discussions never came to a conclusion about the legal
implications of using GMP.

Mark
 
P

Paul Rubin

Benjamin Peterson said:
Basically, GMP only becomes faster when the numbers are huge.

In my experience GMP is about 4x faster than Python longs with 1024
bit numbers, a very common cryptographic size.
 
C

casevh

 <alessiogiovanni.baroni <at> gmail.com> writes:




Basically, GMP only becomes faster when the numbers are huge.

Python 3.1 is significantly faster than Python 2.x on 64-bit
platforms. The following times are for multiplication with 2, 30 and
300 decimal digits.

Testing 2 digits. This primarily measures the overhead for call GMP
via an extension module.

$ py25 -m timeit -s "a=long('23'*1);b=long('47'*1)" "c=a*b"
10000000 loops, best of 3: 0.15 usec per loop
$ py25 -m timeit -s "a=int('23'*1);b=int('47'*1)" "c=a*b"
10000000 loops, best of 3: 0.0735 usec per loop
$ py31 -m timeit -s "a=int('23'*1);b=int('47'*1)" "c=a*b"
10000000 loops, best of 3: 0.074 usec per loop
$ py25 -m timeit -s "import gmpy;a=gmpy.mpz('23'*1);b=gmpy.mpz
('47'*1)" "c=a*b"
10000000 loops, best of 3: 0.121 usec per loop


Testing 30 digits. No significant increase in time with GMP.

$ py25 -m timeit -s "a=long('23'*15);b=long('47'*15)" "c=a*b"
1000000 loops, best of 3: 0.343 usec per loop
$ py31 -m timeit -s "a=int('23'*15);b=int('47'*15)" "c=a*b"
10000000 loops, best of 3: 0.142 usec per loop
$ py25 -m timeit -s "import gmpy;a=gmpy.mpz('23'*15);b=gmpy.mpz
('47'*15)" "c=a*b"
10000000 loops, best of 3: 0.125 usec per loop

Testing 300 digits.

$ py25 -m timeit -s "a=long('23'*150);b=long('47'*150)" "c=a*b"
100000 loops, best of 3: 12.5 usec per loop
$ py31 -m timeit -s "a=int('23'*150);b=int('47'*150)" "c=a*b"
100000 loops, best of 3: 3.13 usec per loop
$ py25 -m timeit -s "import gmpy;a=gmpy.mpz('23'*150);b=gmpy.mpz
('47'*150)" "c=a*b"
1000000 loops, best of 3: 0.673 usec per loop

Platform is 64-bit Linux with Core2 Duo processor. gmpy was linked
against MPIR 1.1. MPIR is an LGPLv2 fork of GMP and is significantly
faster than GMP 4.2.x. The newly released GMP 4.3.0 is about 10%
faster yet.

casevh
 
P

Paul Rubin

casevh said:
Python 3.1 is significantly faster than Python 2.x on 64-bit
platforms. The following times are for multiplication with 2, 30 and
300 decimal digits.

Could you test pow(a,b,c) where a,b,c are each 300 decimal digits?
This is an important operation in cryptography, that GMP is carefully
optimized for. Thanks.
 
A

alessiogiovanni.baroni

Could you test pow(a,b,c) where a,b,c are each 300 decimal digits?
This is an important operation in cryptography, that GMP is carefully
optimized for.  Thanks.

Ok, thanks for your answers. I understand the problems of licensing,
but
we could to learn from GMP's source code to improve the Python's int
implementation,
mainly because, GMP is very fast. We could violate the GPL?
 
M

Mark Dickinson

Ok, thanks for your answers. I understand the problems of licensing,
but
we could to learn from GMP's source code to improve the Python's int
implementation,
mainly because, GMP is very fast. We could violate the GPL?

Suggestions for ways to improve Python's int implementation are
very welcome. But note that Python favours portability (and also
readability and maintainability of source code) over speed here:
at least some of GMP's speed comes from using hand-crafted
assembler for common platforms, and that's really a no-go
area for Python.

There's at least one more optimization (to multiplication, as it
happens) that I plan to get in before 3.1. See

http://bugs.python.org/issue3944

Apart from that, I'm not sure there's much snot left to be
optimized out, as they say...

Mark
 
B

bearophileHUGS

casevh:
Testing 2 digits. This primarily measures the overhead for call GMP
via an extension module.
...

Thank you for adding some actual data to the whole discussion :)
If you perform similar benchmarks with Bigints of Java you will see
how much slower they are compared to the Python ones.


Mark Dickinson:
Apart from that, I'm not sure there's much snot left to be optimized out, as they say...<

Using inline ASM in Python sources isn't an option.

Bye,
bearophile
 
M

Mark Dickinson

Using inline ASM in Python sources isn't an option.

Except when it is. :)

There's a tiny amount of inline assembler in the
sources already: see Python/pymath.c and
Python/ceval.c. Not surprisingly, there's some
in the ctypes module as well.

There *are* places where it's very tempting to
add a little (optional, removable, carefully
tested, etc.) assembler to the long implementation:
one main reason that using 30-bit digits for longs
is slower (for some benchmarks) than using 15-bit
digits on 32-bit platforms is that there's no way
to tell C to do a 64-bit by 32-bit division, in cases
where you know (from understanding of the algorithm)
that the quotient fits into 32 bits.

On x86, replacing just two of the divisions
in Objects/longsobject.c by the appropriate 'divl'
inline assembler got me 10% speedups on some
benchmarks.

Mark
 
C

casevh

Could you test pow(a,b,c) where a,b,c are each 300 decimal digits?
This is an important operation in cryptography, that GMP is carefully
optimized for.  Thanks.

$ py25 -m timeit -s "a=long('23'*150);b=long('47'*150);m=long
('79'*150)" "c=pow(a,b,m)"
10 loops, best of 3: 52.7 msec per loop
$ py31 -m timeit -s "a=int('23'*150);b=int('47'*150);m=int('79'*150)"
"c=pow(a,b,m)"
100 loops, best of 3: 8.85 msec per loop
$ py25 -m timeit -s "import gmpy;a=gmpy.mpz('23'*150);b=gmpy.mpz
('47'*150);m=gmpy.mpz('79'*150)" "c=pow(a,b,m)"
1000 loops, best of 3: 1.26 msec per loop

casevh
 
P

Paul Rubin

Ok, thanks for your answers. I understand the problems of licensing,
but we could to learn from GMP's source code to improve the Python's
int implementation, mainly because, GMP is very fast.

GMP is a lot more complicated than Python bigint arithmetic. It's
optimized separately for many different cases and operations. It uses
assembly language for innder loops. And the modular exponential
operation pow(a,b,c) is especially carefully tuned in complicated
ways. Python tries to just have straightforward, reasonably portable
bigints.

I did something like

try:
import gmpy
pow=gmpy.modexp # or whatever it was called
except ImportError:
pass

in an application a while back, so that it would use gmpy if possible
and Python longs otherwise. That worked pretty well.
 
P

Paul Rubin

casevh said:
$ py25 -m timeit -s "a=long('23'*150);b=long('47'*150);m=long
('79'*150)" "c=pow(a,b,m)"
10 loops, best of 3: 52.7 msec per loop
$ py31 -m timeit -s ....
100 loops, best of 3: 8.85 msec per loop
$ py25 -m timeit -s ..."import gmpy ...
1000 loops, best of 3: 1.26 msec per loop

Wow, thanks. gmpy = 40x faster than py2.5. Ouch.
 
C

casevh

Wow, thanks.  gmpy = 40x faster than py2.5.  Ouch.

Remember this is on a 64-bit platform using the latest versions of
MPIR or GMP. The ratio would be less for older versions or 32-bit
versions.

casevh
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,608
Members
45,241
Latest member
Lisa1997

Latest Threads

Top