pythonXX.dll size: please split CJK codecs out

Discussion in 'Python' started by Giovanni Bajo, Aug 20, 2005.

  1. Hello,

    python24.dll is much bigger than python23.dll. This was discussed already on
    the newsgroup, see the thread starting here:
    http://mail.python.org/pipermail/python-list/2004-July/229096.html

    I don't think I fully understand the reason why additional .pyd modules were
    built into the .dll. OTOH, this does not help anyone, since:

    - Normal users don't care about the size of the pythonXX.dll, or the number of
    dependencies, nor if a given module is shipped as .py or .pyd. They just import
    modules of the standard library, ignoring where each module resides. So,
    putting more modules (or fewer modules) within pythonXX.dll makes absolutely no
    difference for them.
    - Users who freeze applications instead are *worse* served by this, because
    they end up with larger programs. For them, it is better to have the highest
    granularity wrt external modules, so that the resulting frozen application is
    as small as possible.

    A post in the previous thread (specifically
    http://mail.python.org/pipermail/python-list/2004-July/229157.html) suggests
    that py2exe users might get a small benefit from the fact that in some cases
    they would be able to ship the program with only 3 files (app.exe,
    python24.dll, and library.zip). But:

    1) I reckon this is a *very* rare case. You need to write an application that
    does not use Tk, socket, zlib, expat, nor any external library like numarray or
    PIL.
    2) Even if you fit the above case, you still end up with 3 files, which means
    you still have to package your app somehow, etc. Also, the resulting package
    will be *bigger* for no reason, as python24.dll might include modules which the
    user doesn't need.

    I don't think that merging things into python24.dll is a good way to serve
    users of freezing programs, not even py2exe users. Personally, I use McMillan's
    PyInstaller[1] which always builds a single executable, no matter what. So I do
    not like the idea that things are getting worse because of py2exe: py2exe
    should be fixed instead, if its users request to have fewer files to ship (in
    my case, for instance, this missing feature is a showstopper for adopting
    py2exe).

    Can we at least undo this unfortunate move in time for 2.5? I would be grateful
    if *at least* the CJK codecs (which are about 1 MB) are split out of
    python25.dll. IMHO, I would prefer having *more* granularity, rather than
    *less*.

    +1 on splitting out the CJK codecs.

    Thanks,
    Giovanni Bajo


    [1] See also my page on PyInstaller: http://www.develer.com/oss/PyInstaller
     
    Giovanni Bajo, Aug 20, 2005
    #1

  2. Giovanni Bajo wrote:
    > I don't think I fully understand the reason why additional .pyd modules were
    > built into the .dll. OTOH, this does not help anyone, since:


    The reason is simple: a single DLL is easier to maintain. You only need
    to add the new files to the VC project, edit config.c, and be done. No
    new project to create for N different configurations, no messing with
    the MSI builder.

    In addition, having everything in a single DLL speeds up Python startup
    a little, since less file searching is necessary.

    > Can we at least undo this unfortunate move in time for 2.5? I would be grateful
    > if *at least* the CJK codecs (which are about 1 MB) are split out of
    > python25.dll. IMHO, I would prefer having *more* granularity, rather than
    > *less*.


    If somebody would formulate a policy (i.e. conditions under which
    modules go into python2x.dll, vs. going into separate files), I'm
    willing to implement it. This policy should best be formulated in
    a PEP.

    The policy should be flexible wrt. future changes. I.e. it should
    *not* say "do everything as in Python 2.3", because this means I
    would have to rip off the modules added after 2.3 entirely (i.e.
    not ship them at all). Instead, the policy should give clear guidance
    even for modules that are not yet developed.

    It should be a PEP, so that people can comment. For example,
    I think I would be -1 on a policy "make python2x.dll as minimal
    as possible, containing only modules that are absolutely
    needed for startup".

    Regards,
    Martin
     
    Martin v. Löwis, Aug 21, 2005
    #2

  3. Martin v. Löwis wrote:

    >> I don't think I fully understand the reason why additional .pyd
    >> modules were built into the .dll. OTOH, this does not help anyone,
    >> since:

    >
    > The reason is simple: a single DLL is easier to maintain. You only
    > need
    > to add the new files to the VC project, edit config.c, and be done. No
    > new project to create for N different configurations, no messing with
    > the MSI builder.


    FWIW, this just highlights how inefficient your build system is. Everything you
    currently do by hand could be automated, including MSI generation. Also, you
    describe the Windows procedure, which I suppose does not take into account
    what needs to be done for other OS. But I'm sure that revamping the Python
    building system is not a piece of cake.

    I'll take the point though: it's easier to maintain for developers, and most
    Python users don't care.

    > In addition, having everything in a single DLL speeds up Python
    > startup a little, since less file searching is necessary.


    I highly doubt this can be noticed in an actual benchmark, but I could be
    wrong. I can produce numbers though, if this can help people decide.
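
    Such numbers could be gathered with a small timing harness. What follows is
    only a minimal sketch, not something posted in this thread: it assumes a
    modern Python 3 (time.perf_counter and subprocess.run did not exist in the
    2.4 era), and startup_seconds is a hypothetical helper name. It reports the
    best of several launches of an interpreter that does nothing but exit.

```python
import subprocess
import sys
import time


def startup_seconds(python=sys.executable, runs=5):
    """Return the best wall-clock time (seconds) to start `python` and exit.

    `python` is the path of the interpreter to measure; by default the one
    running this script. Taking the minimum over several runs reduces noise
    from disk caching and other processes.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" makes the interpreter start up, do nothing, and exit.
        subprocess.run([python, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("best startup time: %.4f s" % startup_seconds())
```

    Pointing the python argument at two builds (one monolithic DLL, one split
    into many .pyd files) would give the comparison discussed here, under the
    assumption that both are otherwise configured identically.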

    >> Can we at least undo this unfortunate move in time for 2.5? I would
    >> be grateful if *at least* the CJK codecs (which are about 1 MB)
    >> are split out of python25.dll. IMHO, I would prefer having *more*
    >> granularity, rather than *less*.

    >
    > If somebody would formulate a policy (i.e. conditions under which
    > modules go into python2x.dll, vs. going into separate files), I'm
    > willing to implement it. This policy should best be formulated in
    > a PEP.
    >
    > The policy should be flexible wrt. future changes. I.e. it should
    > *not* say "do everything as in Python 2.3", because this means I
    > would have to rip off the modules added after 2.3 entirely (i.e.
    > not ship them at all). Instead, the policy should give clear guidance
    > even for modules that are not yet developed.
    >
    > It should be a PEP, so that people can comment. For example,
    > I think I would be -1 on a policy "make python2x.dll as minimal
    > as possible, containing only modules that are absolutely
    > needed for startup".


    I'm willing to write up such a PEP, but it's hard to devise a universal
    policy. Basically, the only element we can play with is the size of the
    resulting binary for the module. Would you like a policy like "split out every
    module whose binary on Windows is > X kbytes?".

    My personal preference would go to something like "make python2x.dll include only
    the modules which are really core, like sys and os". This would also provide
    guidance to future modules, as they would simply go in external modules (I
    don't think really core stuff is being added right now).

    At this point, my main goal is getting CJK out of the DLL, so everything that
    lets me achieve this goal is good for me.

    Thanks,
    --
    Giovanni Bajo
     
    Giovanni Bajo, Aug 21, 2005
    #3
  4. Giovanni Bajo wrote:
    >
    > FWIW, this just highlights how inefficient your build system is. Everything you
    > currently do by hand could be automated, including MSI generation.


    I'm sure Martin would be happy to consider a patch to make the build
    system more efficient. :)

    > I'm willing to write up such a PEP, but it's hard to devise a universal
    > policy.


    This is the reason that a PEP is needed before there are changes.
    --
    Michael Hoffman
     
    Michael Hoffman, Aug 21, 2005
    #4
  5. Revamping Python build system (Was: pythonXX.dll size: please split CJK codecs out)

    Michael Hoffman wrote:

    >> FWIW, this just highlights how inefficient your build system is.
    >> Everything you currently do by hand could be automated, including
    >> MSI generation.

    >
    > I'm sure Martin would be happy to consider a patch to make the build
    > system more efficient. :)



    Out of curiosity, was this ever discussed among Python developers? Would
    something like scons qualify for this? OTOH, scons opens nasty
    self-bootstrapping issues (being written itself in Python).

    Before considering a patch (or even a PEP) for this, the basic requirements
    should be made clear. I know portability among several UNIX flavours is one,
    for instance. What are the others?
    --
    Giovanni Bajo
     
    Giovanni Bajo, Aug 21, 2005
    #5
  6. Giovanni Bajo wrote:
    > FWIW, this just highlights how inefficient your build system is. Everything you
    > currently do by hand could be automated, including MSI generation. Also, you
    > describe the Windows procedure, which I suppose does not take into account
    > what needs to be done for other OS. But I'm sure that revamping the Python
    > building system is not a piece of cake.


    You are wrong. It is not true that everything I do by hand could be
    automated. At least after automation, I still would have to do things
    by hand, namely invoke the automation.

    You probably haven't looked at the MSI generation at all: it *is*
    automatic. However, every time something changes in the structure,
    the code generating the MSI must be adjusted to the new structure.

    > I'll take the point though: it's easier to maintain for developers, and most
    > Python users don't care.


    See, this I find surprising. If there really is such a big need for
    python24.dll being split in many more modules - why doesn't anybody
    just do this, and offer it as a separate installation for use
    with py2exe?

    The fact that this hasn't happened indicates that users don't need
    it badly enough. I personally rarely need to create a standalone
    Python application, but when I did, I just used freeze, and static
    linking. That way, I got a single binary, with no magic packaging,
    and a minimal one, too.

    >>In addition, having everything in a single DLL speeds up Python
    >>startup a little, since less file searching is necessary.

    >
    > I highly doubt this can be noticed in an actual benchmark, but I could be
    > wrong. I can produce numbers though, if this can help people decide.


    No, this is a minor issue. If you do write a PEP, and you find it
    relatively easy to compare the maximum modularization to the minimal
    one, it would be useful to underline your point, of course.

    > I'm willing to write up such a PEP, but it's hard to devise a universal
    > policy.


    Indeed. For Python 2.4, I made up a policy for myself: everything that
    does not depend on a separate (non-system) library goes into
    pythonxy.dll. That way, everybody will be able to compile Python
    from sources without downloading anything else, yet it causes minimum
    maintenance overhead. That's how the current python24.dll came about.

    > Basically, the only element we can play with is the size of the
    > resulting binary for the module. Would you like a policy like "split out every
    > module whose binary on Windows is > X kbytes?".


    It's less important what I like - I think I would ask for a poll on
    the proposed PEP, and I would be -1 on anything that means more work
    for contributors. But that would be only one voice, and, if a majority
    of the Windows Python users preferred your policy, it would be
    implemented (of course, somebody contributing the resulting project
    files or some automation for them would also help).

    > My personal preference would go to something "make python2x.dll include only
    > the modules which are really core, like sys and os". This would also provide
    > guidance to future modules, as they would simply go in external modules (I
    > don't think really core stuff is being added right now).


    Ok, then write that into the PEP. You would have to provide a definition
    for "core", e.g. "everything that is needed for startup".

    As a guideline, the Unix build process currently includes only the
    following modules by default:

    - marshal, imp, __main__, __builtin__, sys, exceptions: Modules
    living in Python/*.c
    - gc, signal: invoked directly from the interpreter
    - thread: not sure
    - posix, errno, _sre, _codecs, so that setup.py can run
    - zipimport, to avoid bootstrapping problems for importing python24.zip
    - _symtable, because setup.py cannot get the dependencies right
    - xxsubtype, for an undocumented reason I forgot
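
    For comparison on any given build, the set of modules compiled into the
    interpreter itself can be listed from Python. A minimal sketch, assuming a
    modern Python 3 (so the names printed will differ from the 2.4-era list
    above); is_builtin and split_builtin are hypothetical helper names, while
    sys.builtin_module_names is the standard attribute:

```python
import sys


def is_builtin(name):
    """Return True if module `name` is compiled into the interpreter binary."""
    return name in sys.builtin_module_names


def split_builtin(names):
    """Partition module names into (built into the binary, everything else)."""
    inside = [n for n in names if is_builtin(n)]
    outside = [n for n in names if not is_builtin(n)]
    return inside, outside


if __name__ == "__main__":
    # Print every module that lives inside the interpreter (or pythonXY.dll).
    for name in sorted(sys.builtin_module_names):
        print(name)
```

    On Windows the result reflects what was linked into pythonXY.dll; modules
    missing from it are either pure Python or shipped as separate .pyd files.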

    Regards,
    Martin
     
    Martin v. Löwis, Aug 21, 2005
    #6
  7. Re: Revamping Python build system (Was: pythonXX.dll size: please split CJK codecs out)

    Giovanni Bajo wrote:
    >>I'm sure Martin would be happy to consider a patch to make the build
    >>system more efficient. :)

    >
    > Out of curiosity, was this ever discussed among Python developers? Would
    > something like scons qualify for this? OTOH, scons opens nasty
    > self-bootstrapping issues (being written itself in Python).


    No. The Windows build system must be integrated with Visual Studio.
    (Perhaps this is rather, "dunno: is it integrated with VS.NET 2003?")

    When developing on Windows, you really want all the support you can
    get from VS, e.g. when debugging, performing name completion, etc.
    To me, this makes it likely that only VS project files will work.

    > Before considering a patch (or even a PEP) for this, the basic requirements
    > should be made clear. I know portability among several UNIX flavours is one,
    > for instance. What are the others?


    Clearly, the starting requirement would be that you look at the build
    process *at all*. The Windows build process and the Unix build process
    are completely different. Portability is desirable only for the Unix
    build process; however, you might find that it already meets your needs
    quite well.

    Regards,
    Martin
     
    Martin v. Löwis, Aug 21, 2005
    #7
  8. Re: Revamping Python build system (Was: pythonXX.dll size: please split CJK codecs out)

    Martin v. Löwis wrote:

    >> Out of curiosity, was this ever discussed among Python developers?
    >> Would something like scons qualify for this? OTOH, scons opens nasty
    >> self-bootstrapping issues (being written itself in Python).

    >
    > No. The Windows build system must be integrated with Visual Studio.
    > (Perhaps this is rather, "dunno: is it integrated with VS.NET 2003?")
    > When developing on Windows, you really want all the support you can
    > get from VS, e.g. when debugging, performing name completion, etc.
    > To me, this makes it likely that only VS project files will work.


    You seem to ignore the fact that scons can easily generate VS.NET projects. And
    it does that by parsing the same file it could use to build the project
    directly (by invoking your Visual Studio); and that very same file would be the
    same under both Windows and UNIX.

    And even if we disabled this feature and built the project directly from the
    command line, you could still edit your files with the Visual Studio
    environment and debug them in there (since you are still compiling them with
    Visual C, it's just scons invoking the compiler). You could even set up the
    environment so that when you press CTRL+SHIFT+B (or F7, if you have the old
    keybinding), it invokes scons and builds the project.

    So, if the requirement is "integration with Visual Studio", that is not an
    obstacle to switching to a different build process.

    >> Before considering a patch (or even a PEP) for this, the basic
    >> requirements should be made clear. I know portability among several
    >> UNIX flavours is one, for instance. What are the others?

    >
    > Clearly, the starting requirement would be that you look at the build
    > process *at all*.


    I compiled Python several times under Windows (both 2.2.x and 2.3.x) using
    Visual Studio 6, and one time under Linux. But I never investigated it in
    detail.

    > The Windows build process and the Unix build process
    > are completely different.


    But there is no technical reason why it has to be so. I work on several
    portable projects, and they use the same build process under both Windows and
    Unix, while retaining full Visual Studio integration (I myself am a Visual
    Studio user).

    > Portability is desirable only for the Unix
    > build process; however, you might find that it already meets your
    > needs quite well.


    Well, you came up with a maintenance problem: you told me that building more
    external modules needs more effort. In a well-configured and fully-automated
    build system, when you add a file you have to write its name only one time in a
    project description file; if you want to build a dynamic library, you have to
    add a single line. This would take care of both Windows and UNIX, and of
    compilation, packaging and installation alike.
    --
    Giovanni Bajo
     
    Giovanni Bajo, Aug 21, 2005
    #8
  9. Martin v. Löwis wrote:

    >>Can we at least undo this unfortunate move in time for 2.5? I would be grateful
    >>if *at least* the CJK codecs (which are about 1 MB) are split out of
    >>python25.dll. IMHO, I would prefer having *more* granularity, rather than
    >>*less*.

    >
    > If somebody would formulate a policy (i.e. conditions under which
    > modules go into python2x.dll, vs. going into separate files), I'm
    > willing to implement it. This policy should best be formulated in
    > a PEP.


    +1 Yes, I think this needs to be addressed.

    > The policy should be flexible wrt. future changes. I.e. it should
    > *not* say "do everything as in Python 2.3", because this means I
    > would have to rip off the modules added after 2.3 entirely (i.e.
    > not ship them at all). Instead, the policy should give clear guidance
    > even for modules that are not yet developed.


    Agree.

    > It should be a PEP, so that people can comment. For example,
    > I think I would be -1 on a policy "make python2x.dll as minimal
    > as possible, containing only modules that are absolutely
    > needed for startup".


    Also agree. Both the minimal and maximal dll size possible are ideals
    that are not the best choices.

    I would put the starting minimum boundary as:

    1. "The minimum required to start the python interpreter with no
    additional required files."

    Currently python 2.4 (on windows) does not yet meet that guideline, so
    it seems some modules still need to be added while other modules, (I
    haven't checked which), are probably not needed to meet that guideline.

    This could be extended to:

    2. "The minimum required to run an agreed upon set of simple Python
    programs."

    I expect there may be a lot of differing opinions on just what those
    minimum Python programs should be. But that is where the PEP process
    comes in.


    Regards,
    Ron


    > Regards,
    > Martin
     
    Ron Adam, Aug 21, 2005
    #9
  10. Re: Revamping Python build system (Was: pythonXX.dll size: please split CJK codecs out)

    Giovanni Bajo wrote:
    > You seem to ignore the fact that scons can easily generate VS.NET projects.


    I'm not ignoring it - I'm not aware of it. And also, I don't quite
    believe it until I see it.

    > But there is no technical reason why it has to be so. I work on several
    > portable projects, and they use the same build process under both Windows and
    > Unix, while retaining full Visual Studio integration (I myself am a Visual
    > Studio user).


    Well, as long as "F6" works...

    > Well, you came up with a maintenance problem: you told me that building more
    > external modules needs more effort. In a well-configured and fully-automated
    > build system, when you add a file you have to write its name only one time in a
    > project description file; if you want to build a dynamic library, you have to
    > add a single line. This would take care of both Windows and UNIX, and of
    > compilation, packaging and installation alike.


    I very much doubt this is possible. For some modules, you also need to
    create autoconf fragments on Unix, for example, and you might need
    to specify different libraries on different systems.

    Regards,
    Martin
     
    Martin v. Löwis, Aug 21, 2005
    #10
  11. Ron Adam wrote:
    > I would put the starting minimum boundary as:
    >
    > 1. "The minimum required to start the python interpreter with no
    > additional required files."
    >
    > Currently python 2.4 (on windows) does not yet meet that guideline, so
    > it seems some modules still need to be added while other modules, (I
    > haven't checked which), are probably not needed to meet that guideline.


    I'm not sure, either, but I *think* python24 won't load any .pyd file
    on interactive startup.

    > This could be extended to:
    >
    > 2. "The minimum required to run an agreed upon set of simple Python
    > programs."
    >
    > I expect there may be a lot of differing opinions on just what those
    > minimum Python programs should be. But that is where the PEP process
    > comes in.


    As I mentioned earlier, there also should be a negative list: modules
    that depend on external libraries should not be incorporated into
    python24.dll. Most notably, this rules out zlib.pyd, _bsddb.pyd,
    and _ssl.pyd, all of which people may consider to be useful in these
    simple programs.

    Regards,
    Martin
     
    Martin v. Löwis, Aug 21, 2005
    #11
  12. Martin v. Löwis wrote:
    > Ron Adam wrote:
    >
    >>I would put the starting minimum boundary as:
    >>
    >> 1. "The minimum required to start the python interpreter with no
    >>additional required files."
    >>
    >>Currently python 2.4 (on windows) does not yet meet that guideline, so
    >>it seems some modules still need to be added while other modules, (I
    >>haven't checked which), are probably not needed to meet that guideline.

    >
    >
    > I'm not sure, either, but I *think* python24 won't load any .pyd file
    > on interactive startup.
    >
    >
    >>This could be extended to:
    >>
    >> 2. "The minimum required to run an agreed upon set of simple Python
    >>programs."
    >>
    >>I expect there may be a lot of differing opinions on just what those
    >>minimum Python programs should be. But that is where the PEP process
    >>comes in.

    >
    >
    > As I mentioned earlier, there also should be a negative list: modules
    > that depend on external libraries should not be incorporated into
    > python24.dll.


    This fits under the above, rule #1, of not needing additional files.


    > Most notably, this rules out zlib.pyd, _bsddb.pyd,
    > and _ssl.pyd, all of which people may consider to be useful in these
    > simple programs.


    I would not consider those as being part of "simple" programs. But
    that's only an opinion and we need something more objective than opinion.

    Now that I think of it... Rule 2 above should be:

    2. "The minimum (modules) required to run an agreed upon set of
    'common simple' programs."

    Frequency of use is also an important consideration.

    Maybe there's a way to classify a program's complexity based on a set of
    attributes.

    So... program simplicity could consider:

    1. Complete program is a single .py file.
    2. Not larger than 'n' lines. (some reasonable limit)
    3. Limited number of import statements.
    (less than 'n' modules imported)
    4. Uses only stdio and/or basic file operations for input
    and output. (runs in interactive console or command line.)

    Then ranking the frequency of imported modules from this set of programs
    could give a good hint as to what might be included and those less
    frequently used that may be excluded.
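
    The ranking step could be sketched roughly as follows. This is only an
    illustration under stated assumptions, not a tool anyone in this thread
    wrote: it assumes a modern Python 3 with the standard ast module, takes the
    sample programs as source strings, and uses the hypothetical helper names
    imported_modules and rank_imports.

```python
import ast
from collections import Counter


def imported_modules(source):
    """Return the set of top-level module names imported by one program."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b, c" contributes "a" and "c".
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"); they are not stdlib.
            if node.module and node.level == 0:
                names.add(node.module.split(".")[0])
    return names


def rank_imports(sources):
    """Return (module, number of programs importing it), most common first."""
    counts = Counter()
    for source in sources:
        # Each program counts a module once, however often it imports it.
        counts.update(imported_modules(source))
    return counts.most_common()


if __name__ == "__main__":
    samples = ["import os, sys\n", "import sys\nfrom os import path\n"]
    for module, count in rank_imports(samples):
        print(module, count)
```

    Counting each module once per program, rather than once per import
    statement, matches the idea of ranking by how many "simple" programs need
    a module at all.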

    Setting a pythonxx.dll minimum file size goal could further help. For
    example, if the result of excluding modules is smaller than the minimum goal,
    then a few extra, more frequently used modules could be included as a bonus.

    This is obviously a "practical beats purity" exercise. ;-)

    Cheers,
    Ron




    > Regards,
    > Martin
     
    Ron Adam, Aug 21, 2005
    #12
  13. "Martin v. Löwis" writes:

    > Ron Adam wrote:
    >> I would put the starting minimum boundary as:
    >>
    >> 1. "The minimum required to start the python interpreter with no
    >> additional required files."
    >>
    >> Currently python 2.4 (on windows) does not yet meet that guideline, so
    >> it seems some modules still need to be added while other modules, (I
    >> haven't checked which), are probably not needed to meet that guideline.

    >
    > I'm not sure, either, but I *think* python24 won't load any .pyd file
    > on interactive startup.


    That seems to be true. But it will need zlib.pyd as soon as you try to
    import from compressed zips. So, zlib can be thought of as part of the
    modules required for bootstrap.

    Thomas
     
    Thomas Heller, Aug 22, 2005
    #13
  14. Thomas Heller wrote:
    > That seems to be true. But it will need zlib.pyd as soon as you try to
    > import from compressed zips. So, zlib can be thought of as part of the
    > modules required for bootstrap.


    Right. OTOH, linking zlib to pythonXY means that you cannot build Python
    at all anymore unless you also have zlib available.

    Regards,
    Martin
     
    Martin v. Löwis, Aug 22, 2005
    #14
  15. Giovanni Bajo wrote:

    >Hello,
    >
    >python24.dll is much bigger than python23.dll. This was discussed already on
    >the newsgroup, see the thread starting here:
    >http://mail.python.org/pipermail/python-list/2004-July/229096.html
    >
    >I don't think I fully understand the reason why additional .pyd modules were
    >built into the .dll. OTOH, this does not help anyone, since:
    >
    >- Normal users don't care about the size of the pythonXX.dll, or the number of
    >dependencies, nor if a given module is shipped as .py or .pyd. They just import
    >modules of the standard library, ignoring where each module resides. So,
    >putting more modules (or fewer modules) within pythonXX.dll makes absolutely no
    >difference for them.
    >- Users who freeze applications instead are *worse* served by this, because
    >they end up with larger programs. For them, it is better to have the highest
    >granularity wrt external modules, so that the resulting frozen application is
    >as small as possible.
    >
    >
    >

    <snip>
    1.8 MB - life's too short. What gain would you get from removing 1 MB
    from that? So it can get on a floppy? ;-). That would be more effort
    than is needed, IMHO; even my handy/mobile phone/cell phone can easily
    cope with 1.8 MB!

    Neil

    --

    Neil Benn
    Senior Automation Engineer
    Cenix BioScience
    BioInnovations Zentrum
    Tatzberg 47
    D-01307
    Dresden
    Germany

    Tel : +49 (0)351 4173 154
    e-mail :
    Cenix Website : http://www.cenix-bioscience.com
     
    Neil Benn, Aug 23, 2005
    #15