comparing two lists, ndiff performance

Discussion in 'Python' started by Zbigniew Braniecki, Jan 30, 2008.

  1. Hi all.
    I'm working on a tool for localizers.

    I have two Lists with Entities/Strings/Comments (each L10n file is built
    of those three elements).

    So I have sth like:

    l10nObject = []
    l10nObject.append(Comment('foo'))
    l10nObject.append("string")
    l10nObject.append(Entity('name', 'value'))

    etc. I can have many strings, many entities and many comments inside.

    At some point I want to compare two l10nObjects and see what entities
    were added, what strings were changed, what comments were removed.

    So What I did is that I basing on the structure of l10nObject created a
    list of elements with names of the types like:

    structure1 =
    ['Comment','Entity','Entity','str','str','Entity','Entity','Comment']

    and same for structure2.

    In result I want to have info about added/removed elements and then I
    can try to match which ones are in reality changed (so compare the value
    of two entities or two comments or two strings) etc.

    Long story short. I'm comparing two "structures" using ndiff.
    It takes LOONG time.

    In case of my script the ndiff takes 80% of the whole script time, and
    in result the new method with ndiff takes over 10 sec for 1000
    iterations, while the old one takes around4 sec for 1000 iterations.

    You can take a look at the code at:
    http://svn.braniecki.net/wsvn/Mozpyl10n/lib/mozilla/l10n/diff.py

    The def diffToObject at the end is the new method, while def
    compareL10nObjects is the old one.

    The new one is of course much better and cleaner (the old one is
    bloated), but I'm wondering if there is a faster way to compare two
    lists and find out what was added, what was removed, what was changed.
    I can simply iterate through two lists because I need to keep an order
    (so it's important that the removed line is after the 3 line which was
    not changed etc.)

    ndiff plays well here, but it seems to be extremely slow (1000
    iterations of diffToObject takes 10 sec, 7sec of this is in ndiff).

    Do you have any idea on how to compare those lists? I have similar
    problem with comparing two directory lists where I also need to keep the
    order, and I'm using the same ndiff method now.

    Is there a way to speed it up? Any easier way? Faster method?

    Greetings
    Zbigniew Braniecki
    Zbigniew Braniecki, Jan 30, 2008
    #1
    1. Advertising

  2. Zbigniew Braniecki

    Guest

    Zbigniew Braniecki:
    > Is there a way to speed it up? Any easier way? Faster method?


    This problem is a bit messy. Maybe it's better to sidestep the
    problem, and not use a list, and create an object that wraps the list,
    so it always keeps an updated record of what changes are done... but
    you have to notify it if you change the objects it contains.

    Bye,
    bearophile
    , Jan 30, 2008
    #2
    1. Advertising

  3. wrote:
    > Zbigniew Braniecki:
    >> Is there a way to speed it up? Any easier way? Faster method?

    >
    > This problem is a bit messy. Maybe it's better to sidestep the
    > problem, and not use a list, and create an object that wraps the list,
    > so it always keeps an updated record of what changes are done... but
    > you have to notify it if you change the objects it contains.


    That would be sweet... But I rarely will have it on the plate.

    In most cases I will load the two l10nObjects from the files and then
    I'll have to compare them in the way described above.

    So it's something like compare-locales or compare-l10n-directories
    script in the easiest form.

    and it'll be launched in the pessimistic case on around 40 locales each
    of them made of ~800 l10nObjects.

    I'll probably leave two methods. The faster for automated scripts which
    just have to catch changes and report that the file needs an update, and
    a detailed one for presenting it to the user.

    I was just thinking that there maybe exists a smart, fast, powerful
    method that would eliminate use of ndiff to compare two lists.

    Thanks for help!

    Greetings
    Zbigniew Braniecki
    Zbigniew Braniecki, Jan 30, 2008
    #3
  4. On 29 ene, 22:47, Zbigniew Braniecki <>
    wrote:

    > The new one is of course much better and cleaner (the old one is
    > bloated), but I'm wondering if there is a faster way to compare two
    > lists and find out what was added, what was removed, what was changed.
    > I can simply iterate through two lists because I need to keep an order
    > (so it's important that the removed line is after the 3 line which was
    > not changed etc.)
    >
    > ndiff plays well here, but it seems to be extremely slow (1000
    > iterations of diffToObject takes 10 sec, 7sec of this is in ndiff).


    ndiff does a quadratic process: first determines matching lines using
    a SequenceMatcher, then looks for near-matching lines and for each
    pair, compares them using another SequenceMatcher.
    You don't appear to be interested in what changed inside a line, just
    that it changed, so a simple SequenceMatcher would be enough.
    Output from SequenceMatcher is quite different than ndiff, but you'd
    have to reimplement the _compareLists method only.

    --
    Gabriel Genellina
    Gabriel Genellina, Jan 30, 2008
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Bryan

    ndiff

    Bryan, Jul 24, 2003, in forum: Python
    Replies:
    2
    Views:
    483
    Raymond Hettinger
    Jul 25, 2003
  2. Humpdydum

    difflib.ndiff broken?

    Humpdydum, Jul 15, 2004, in forum: Python
    Replies:
    3
    Views:
    384
    Tim Peters
    Jul 16, 2004
  3. Ruby Quiz

    [QUIZ] NDiff (#46)

    Ruby Quiz, Sep 9, 2005, in forum: Ruby
    Replies:
    13
    Views:
    194
    Bil Kleb
    Sep 10, 2005
  4. Jim Freeze

    [QUIZ] NDiff (#46)

    Jim Freeze, Sep 12, 2005, in forum: Ruby
    Replies:
    3
    Views:
    77
    Brian Schröder
    Sep 13, 2005
  5. Ruby Quiz

    [SUMMARY] NDiff (#46)

    Ruby Quiz, Sep 15, 2005, in forum: Ruby
    Replies:
    2
    Views:
    87
    James Edward Gray II
    Sep 15, 2005
Loading...

Share This Page