Shared vs static link performance hit --and Windows?

Discussion in 'Python' started by Christos TZOTZIOY Georgiou, Jul 8, 2003.

  1. Last night I was compiling the latest python snapshot at my home Linux
    system (a K6-III @420 --the extra 20 MHz is overclocking :); then I tried
    building a shared version of the interpreter. I did some speed
    comparisons, and pystone reported ~6090 pystones for the shared and
    ~7680 pystones for the (default) static build.

    This is quite a difference, and while I do know what impact position
    independent code has on libraries, this was the first time I measured a
    difference of 25% in performance...
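
    [Editor's sketch: the size of the gap depends on which build is taken as the baseline. The few lines below only redo the arithmetic on the two approximate pystone figures quoted above; the helper name relative_change is made up for illustration.]

```python
# Approximate pystone figures quoted in the post above.
shared = 6090.0
static = 7680.0


def relative_change(new, base):
    """Return the change from base to new, as a percentage of base."""
    return (new - base) / base * 100.0


# Static build measured against the shared build as baseline.
speedup_of_static = relative_change(static, shared)
# Shared build measured against the static build as baseline.
slowdown_of_shared = -relative_change(shared, static)

print(f"static is {speedup_of_static:.1f}% faster than shared")
print(f"shared is {slowdown_of_shared:.1f}% slower than static")
```

    Taking the shared build as the baseline gives a gap of roughly 26%; taking the static build as the baseline gives roughly 21%.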

    This is not a complaint (I am happy with the static build and its
    speed), but it's a question because I know that in Windows the build is
    "shared" (there's a python23.dll which is about the same as the .so of
    the shared lib), and pystone performance is close to ~6100 pystones.

    I might say something stupid since I am not an "expert" on Windows
    architectures, but would it be feasible (and useful) to build a static
    python.exe on windows? Or is it that there would be no difference
    performance-wise in the final interpreter?

    Any replies are welcome.
    --
    TZOTZIOY, I speak England very best,
    Microsoft Security Alert: the Matrix began as open source.
     
    Christos TZOTZIOY Georgiou, Jul 8, 2003
    #1

  2. Christos> .... I did some speed comparisons, and pystone reported ~6090
    Christos> pystones for the shared and ~7680 pystones for the (default)
    Christos> static build.

    Christos> This is quite a difference, and while I do know what impact
    Christos> position independent code has on libraries, this was the first
    Christos> time I measured a difference of 25% in performance...

    It's possible that the Python interpreter makes (percentage-wise) more calls
    through the indirect links of the position-independent code. We know it
    makes lots of function calls (often to extension modules) to implement its
    functionality, so it's quite possible that the interpreter takes a much
    larger hit than your typical C program would.
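
    [Editor's sketch: before comparing pystone numbers it helps to confirm which kind of build is actually running. This assumes a modern Python with the standard sysconfig module (it did not exist in 2003) and a Unix-style build where the Py_ENABLE_SHARED variable is recorded; on other platforms the variable may be absent, which the sketch reports as "unknown". The function name describe_build is made up for illustration.]

```python
import sysconfig


def describe_build():
    """Report how this interpreter was linked, as far as sysconfig knows."""
    # Py_ENABLE_SHARED is 1 for a build configured with --enable-shared and
    # 0 for the default static build; it is None where it is not recorded.
    enabled = sysconfig.get_config_var("Py_ENABLE_SHARED")
    if enabled is None:
        return "unknown"
    return "shared" if enabled else "static"


print(describe_build())
```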

    Skip
     
    Skip Montanaro, Jul 8, 2003
    #2

  3. [I: statically-linked python is 25% faster than dynamically-linked on my
    machine]

    [Skip: explains that Python possibly makes a lot of function calls to
    and from shared libraries, which would explain the difference]

    Thanks, Skip; now if only someone more Windows-knowledgeable than me
    could also comment on whether we could see a similar speed increase by
    building a static Windows EXE...

    I do have a computer with Visual Studio 6 available at work, but I don't
    know enough about VS[1] to answer this myself; I gave it a try, but I got
    lost in the Project Settings and how to merge the pythoncore (the dll I
    believe) and the python (the exe) projects.

    [1] Although lately I did have a go at compiling a custom module for
    Haar image transforms following the directions in the docs; it was
    trivial in Linux --kudos to all of you guys who worked on distutils--
    and fairly easy on Windows, so I have both an .so and a .pyd with the
    same functionality for my program...
    --
    TZOTZIOY, I speak England very best,
    Microsoft Security Alert: the Matrix began as open source.
     
    Christos TZOTZIOY Georgiou, Jul 8, 2003
    #3
  4. On Tue, 8 Jul 2003, Christos TZOTZIOY Georgiou wrote:

    > Thanks, Skip; now if only someone more Windows-knowledgeable than me
    > could also comment on whether we could see a similar speed increase by
    > building a static Windows EXE...


    AFAIK, Windows & OS/2 don't really use the concept of PIC code for DLLs,
    as the DLLs get mapped into system address space rather than user address
    space, and all processes that access DLL code access the code at the same
    address.

    --
    Andrew I MacIntyre "These thoughts are mine alone..."
    E-mail: (pref) | Snail: PO Box 370
    (alt) | Belconnen ACT 2616
    Web: http://www.andymac.org/ | Australia
     
    Andrew MacIntyre, Jul 9, 2003
    #4
  5. Andrew MacIntyre writes:

    > AFAIK, Windows & OS/2 don't really use the concept of PIC code for DLLs,
    > as the DLLs get mapped into system address space rather than user address
    > space, and all processes that access DLL code access the code at the same
    > address.


    Wrong, at least for Win32. In Win32, there is no "system address
    space"; each process has its own separate address space.

    Each DLL has a preferred load address, and it is linked to that
    address. If the dynamic loader can, at run-time, load the DLL at that
    address, there won't be any relocations. If that address is
    already in use, the dynamic loader chooses a different address and
    relocates the DLL, and the relocated pages are not shared across
    address spaces anymore.

    For Python standard DLLs and PYDs, PC/dllbase_nt.txt lists the
    addresses that have been assigned for each DLL to avoid conflicts. For
    custom extension DLLs, most likely, the linker default is used, which
    will conflict with all other DLLs which don't use the /BASE linker
    flag, either.
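
    [Editor's sketch: a toy model of the loader behaviour described above. The DLL names, sizes, fallback region and 64 KB alignment step are assumptions made up for illustration; they are not taken from PC/dllbase_nt.txt or from the real Windows loader. The one outside fact relied on is that the MSVC linker's default base for 32-bit DLLs is 0x10000000.]

```python
from typing import Dict, List, NamedTuple, Tuple


class Dll(NamedTuple):
    name: str
    preferred_base: int  # address the DLL was linked for
    size: int            # bytes of address space it occupies


def overlaps(base: int, size: int, used: List[Tuple[int, int]]) -> bool:
    """True if [base, base + size) intersects any already-used range."""
    return any(base < u_base + u_size and u_base < base + size
               for u_base, u_size in used)


def load_all(dlls: List[Dll],
             fallback_start: int = 0x60000000,
             align: int = 0x10000) -> Dict[str, Tuple[int, bool]]:
    """Map each DLL name to (actual base, was it relocated?)."""
    used: List[Tuple[int, int]] = []
    placements: Dict[str, Tuple[int, bool]] = {}
    next_free = fallback_start
    for dll in dlls:
        base = dll.preferred_base
        relocated = overlaps(base, dll.size, used)
        if relocated:
            # Preferred address is taken: scan the fallback region for a hole.
            while overlaps(next_free, dll.size, used):
                next_free += align
            base = next_free
        used.append((base, dll.size))
        placements[dll.name] = (base, relocated)
    return placements


# Hypothetical DLLs: one with an assigned base, two left at the linker default.
dlls = [
    Dll("core.dll", 0x1E000000, 0x20000),
    Dll("ext_a.pyd", 0x10000000, 0x20000),
    Dll("ext_b.pyd", 0x10000000, 0x20000),
]
placements = load_all(dlls)
for name, (base, relocated) in placements.items():
    print(f"{name}: base=0x{base:08X} relocated={relocated}")
```

    In this model the second extension left at the default base is the one that gets moved, which is the conflict a distinct /BASE value per DLL avoids.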

    Regards,
    Martin
     
    Martin v. Löwis, Jul 9, 2003
    #5
  6. Christos "TZOTZIOY" Georgiou wrote:
    > Last night I was compiling the latest python snapshot at my home Linux
    > system (a K6-III @420 --the extra 20 MHz is overclocking :); then I tried
    > building a shared version of the interpreter. I did some speed
    > comparisons, and pystone reported ~6090 pystones for the shared and
    > ~7680 pystones for the (default) static build.
    >


    Yes, today I recommend not using the -fPIC option for certain
    libraries when compiling a .so library. If you use it you get one more
    indirection, and this can be very bad on systems with long CPU
    pipelines (PIV systems). If you don't use -fPIC then the shared
    library will be patched at load time and is only shared on disk but
    not in memory.

    I hope that the UNIX community gives up this 20-year-old ELF format
    and starts to use a new one with better performance - look at KDE to
    see the pain.
     
    Lothar Scholz, Jul 9, 2003
    #6
  7. At some point, Lothar Scholz wrote:

    > Christos "TZOTZIOY" Georgiou wrote:
    >> Last night I was compiling the latest python snapshot at my home Linux
    >> system (a K6-III @420 --the extra 20 MHz is overclocking :); then I tried
    >> building a shared version of the interpreter. I did some speed
    >> comparisons, and pystone reported ~6090 pystones for the shared and
    >> ~7680 pystones for the (default) static build.
    >>

    >
    > Yes, today i recommend to not use the -fPIC option for certain
    > libraries when compiling a .so library. If you use it you get one more
    > indirection and this can be very bad on systems with long CPU
    > pipelines (PIV systems). If you don't use -fPIC then the shared
    > library will be patched and is only shared on disk but not in memory.


    On PowerPC Linux (or any PPC Unix using ELF), AFAIK, compiling shared
    libraries *without* -fPIC will run you into trouble. This is especially
    true if you then link to a library that was compiled with -fPIC.
    You'll get errors like 'R_PPC_REL24 relocation ...'. I certainly see
    problems with bad code that doesn't use -fPIC that runs on x86, but
    not on PPC.

    x86 has a different way of doing things from PPC that doesn't bite you
    on the ass when you compile without -fPIC. You especially run into
    trouble when you mix non-fPIC code with -fPIC code.

    [while we're on the subject, I'll point out that -fpic and -fPIC are
    not equivalent. -fpic can run out of space in the global offset table,
    while -fPIC avoids that. x86 doesn't have that limit, so I see people
    using -fpic where they should be using -fPIC.]

    --
    |>|\/|<
    /--------------------------------------------------------------------------\
    |David M. Cooke
    |cookedm(at)physics(dot)mcmaster(dot)ca
     
    David M. Cooke, Jul 9, 2003
    #7
  8. Christos TZOTZIOY Georgiou wrote:
    > I might say something stupid since I am not an "expert" on Windows
    > architectures, but would it be feasible (and useful) to build a static
    > python.exe on windows? Or is it that there would be no difference
    > performance-wise in the final interpreter?


    On Windows, DLLs (and .pyd files) can only call functions in other DLLs,
    not in the EXE they are called from. So building a statically linked
    Python would make it impossible to dynamically load .pyd files.

    OT: One could of course change the Python extension API so that every
    extension function gets a pointer to the interpreter instance and all
    API functions are members of the interpreter object. This adds an
    additional indirection, so it wouldn't help performance. On the plus
    side, this would make it at least feasible for extensions to be binary
    compatible with newer Pythons, the same way they often are on Unix.
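
    [Editor's sketch: the shape of the OT idea above, modelled in Python rather than C. Everything here is hypothetical: InterpreterAPI, new_module, add_function and init_haar are made-up names and are not part of any real Python API. The point is only that the extension reaches the interpreter through the object it is handed, never through symbols it links against.]

```python
class InterpreterAPI:
    """Hypothetical table of API functions owned by one interpreter."""

    def __init__(self, version):
        self.version = version
        self._modules = {}

    def new_module(self, name):
        # Register and return a bare module namespace.
        module = {"__name__": name}
        self._modules[name] = module
        return module

    def add_function(self, module, name, func):
        module[name] = func


def init_haar(api):
    """Hypothetical extension entry point; it only ever talks to `api`."""
    module = api.new_module("haar")
    api.add_function(module, "double", lambda x: 2 * x)
    return module


# The same extension code runs unchanged against two interpreter "versions",
# because it never refers to a specific interpreter binary by name.
results = {}
for version in ("2.3", "2.4"):
    api = InterpreterAPI(version)
    module = init_haar(api)
    results[version] = module["double"](21)
    print(version, results[version])
```

    Every call goes through one extra attribute lookup on api, which is the additional indirection mentioned above.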

    Daniel
     
    Daniel Dittmar, Jul 9, 2003
    #8
