tests

Discussion in 'Python' started by nikolay marinov, Aug 9, 2007.

  1. Hi, everyone. Does anybody have an idea how I can test two xls files for
    equality with Python?
     
    nikolay marinov, Aug 9, 2007
    #1

  2. nikolay marinov

    Guest

    On Aug 9, 8:21 am, nikolay marinov <>
    wrote:
    > Hi, everyone. Does anybody have an idea how I can test two xls files for
    > equality with Python?


    You should be able to read chunks of each file in binary mode and do a
    compare to check for equality. Some kind of loop should do the trick.
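
A minimal sketch of such a loop, assuming both files fit the usual "regular file on disk" case. The function name `files_identical` and the 64 KiB chunk size are illustrative choices, not anything from the thread:

```python
import os


def files_identical(path_a, path_b, chunk_size=64 * 1024):
    """Return True if the two files contain exactly the same bytes."""
    # Files of different sizes cannot be identical, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended at the same point
                return True
```

Reading in chunks keeps memory use flat for large workbooks and lets the loop stop at the first mismatch.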

    Mike
     
    , Aug 9, 2007
    #2

  3. nikolay marinov

    brad Guest

    wrote:

    > You should be able to read chunks of each file in binary mode and do a
    > compare to check for equality. Some kind of loop should do the trick.


    Why not a simple md5 or sha hash with the hashlib library?
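
A rough sketch of the hashing approach using the standard `hashlib` module. The helper names `file_digest` and `same_digest` are made up for illustration, and SHA-256 is an assumed default; `"md5"` or `"sha1"` can be passed instead where the platform provides them:

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=64 * 1024):
    """Return the hex digest of a file's contents, read in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def same_digest(path_a, path_b, algorithm="sha256"):
    """Return True if both files hash to the same digest."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

Hashing always reads both files to the end, so for a single pair a direct byte comparison can finish sooner. Digests pay off when one file has to be checked against many others, or when the digest is stored for later.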
     
    brad, Aug 9, 2007
    #3
  4. nikolay marinov

    Guest

    On Aug 9, 4:04 pm, brad <> wrote:
    > wrote:
    > > You should be able to read chunks of each file in binary mode and do a
    > > compare to check for equality. Some kind of loop should do the trick.

    >
    > Why not a simple md5 or sha with the hash library?


    Or even:

    http://docs.python.org/lib/module-filecmp.html
     
    , Aug 9, 2007
    #4
  5. <> wrote in message
    news:...
    > On Aug 9, 4:04 pm, brad <> wrote:
    >> wrote:
    >> > You should be able to read chunks of each file in binary mode and do a
    >> > compare to check for equality. Some kind of loop should do the trick.

    >>
    >> Why not a simple md5 or sha with the hash library?

    >
    > Or even:
    >
    > http://docs.python.org/lib/module-filecmp.html
    >


    My understanding from reading that is that it only looks at the file names
    themselves and not their contents: whether filename1 equals filename2 and,
    in the case of the function below it, whether one directory has files that
    are also in the other.
    Correct me if I'm wrong.
    Dom

    P.S. md5 or sha hash is what I'd go for, short of doing:

    with open("file1.xls", "rb") as first, open("file2.xls", "rb") as second:
        # Compare the bytes read from each file, not the file objects themselves.
        if first.read() == second.read():
            print("True")

    although this won't tell you where they're different, just that they are...
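
If the position of the first difference matters, a chunked read can report it. This is only a sketch; `first_difference` is a hypothetical helper, and a byte offset into an .xls file says little about which cell changed:

```python
def first_difference(path_a, path_b, chunk_size=64 * 1024):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                # Walk the mismatching chunks to find the exact position.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # One chunk is a prefix of the other: the files differ in length.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:  # both files exhausted without a mismatch
                return None
            offset += len(chunk_a)
```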
     
    special_dragonfly, Aug 9, 2007
    #5
  6. nikolay marinov

    Jason Guest

    On Aug 9, 8:46 am, "special_dragonfly" <>
    wrote:
    > <> wrote in message
    > >http://docs.python.org/lib/module-filecmp.html

    >
    > My understanding of reading that is that it only looks at the file names
    > themselves and not their contents. So whether filename1=filename2 and in the
    > case of the function below it, whether one directory has files which are in
    > the other.
    > Correct me if I'm wrong.
    > Dom
    >
    > P.S. md5 or sha hash is what I'd go for, short of doing:
    >
    > with open("file1.xls", "rb") as first, open("file2.xls", "rb") as second:
    >     # Compare the bytes read from each file, not the file objects themselves.
    >     if first.read() == second.read():
    >         print("True")
    >
    > although this won't tell you where they're different, just that they are...


    You're incorrect. If the shallow flag is not given or is true, the
    results of os.stat are used to compare the two files, so if they have
    the same file type, size, and modification time, they're considered
    the same.

    If the shallow flag is given and is false, their contents are
    compared. In either case, the results are cached for efficiency's
    sake.

    --Jason


    The documentation for filecmp.cmp is:
    cmp( f1, f2[, shallow])
    Compare the files named f1 and f2, returning True if they seem
    equal, False otherwise.

    Unless shallow is given and is false, files with identical
    os.stat() signatures are taken to be equal.

    Files that were compared using this function will not be
    compared again unless their os.stat() signature changes.

    Note that no external programs are called from this function,
    giving it portability and efficiency.
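
As a small illustration of the documentation quoted above, passing `shallow=False` asks `filecmp.cmp` to compare the bytes instead of accepting matching `os.stat()` signatures. The wrapper name `same_contents` is only for this example:

```python
import filecmp


def same_contents(path_a, path_b):
    """Compare file contents rather than relying on os.stat() signatures."""
    # shallow=False makes filecmp read the files instead of trusting
    # a matching stat signature.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

With the default `shallow=True`, two different files that happen to share a size and modification time would be reported as equal, so the explicit flag is the safer choice when the contents are what matters.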
     
    Jason, Aug 9, 2007
    #6
  7. nikolay marinov

    Steve Holden Guest

    Jason wrote:
    > On Aug 9, 8:46 am, "special_dragonfly" <>
    > wrote:
    >> <> wrote in message
    >>> http://docs.python.org/lib/module-filecmp.html

    >> My understanding of reading that is that it only looks at the file names
    >> themselves and not their contents. So whether filename1=filename2 and in the
    >> case of the function below it, whether one directory has files which are in
    >> the other.
    >> Correct me if I'm wrong.
    >> Dom
    >>
    >> P.S. md5 or sha hash is what I'd go for, short of doing:
    >>
    >> with open("file1.xls", "rb") as first, open("file2.xls", "rb") as second:
    >>     # Compare the bytes read from each file, not the file objects themselves.
    >>     if first.read() == second.read():
    >>         print("True")
    >>
    >> although this won't tell you where they're different, just that they are...

    >
    > You're incorrect. If the shallow flag is not given or is true, the
    > results of os.stat are used to compare the two files, so if they have
    > the same size, change times, etc, they're considered the same.
    >
    > If the shallow flag is given and is false, their contents are
    > compared. In either case, the results are cached for efficiency's
    > sake.
    >
    > --Jason
    >
    >
    > The documentation for filecmp.cmp is:
    > cmp( f1, f2[, shallow])
    > Compare the files named f1 and f2, returning True if they seem
    > equal, False otherwise.
    >
    > Unless shallow is given and is false, files with identical
    > os.stat() signatures are taken to be equal.
    >
    > Files that were compared using this function will not be
    > compared again unless their os.stat() signature changes.
    >
    > Note that no external programs are called from this function,
    > giving it portability and efficiency.
    >


    This discussion seems to assume that Excel spreadsheets are stored in
    some canonical form so that two spreads with the same functionality are
    always identical on disk to the last bit. I very much doubt this is true
    (consider as an example the file properties that can be set).

    So really you need to define "equality". So far the tests discussed have
    concentrated on identifying identical files.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden
     
    Steve Holden, Aug 9, 2007
    #7
  8. nikolay marinov

    Jay Loden Guest

    Steve Holden wrote:
    > This discussion seems to assume that Excel spreadsheets are stored in
    > some canonical form so that two spreads with the same functionality are
    > always identical on disk to the last bit. I very much doubt this is true
    > (consider as an example the file properties that can be set).
    >
    > So really you need to define "equality". So far the tests discussed have
    > concentrated on identifying identical files.
    >
    > regards
    > Steve


    I was wondering myself whether the OP was actually interested in
    binary-identical files or just duplicated content. If just duplicated
    content, perhaps this could be used as a starting point:

    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440661

    so that the actual data can be compared.
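
One possible way to compare the data itself is sketched below. This is not the linked recipe; it assumes the third-party `xlrd` package for reading .xls files, and `load_xls` and `workbook_differences` are hypothetical names. "Equality" here means the same cell values in the same positions, ignoring formatting and file properties:

```python
def load_xls(path):
    """Read an .xls workbook into {sheet name: list of row value lists}."""
    import xlrd  # third-party package, assumed to be installed

    book = xlrd.open_workbook(path)
    return {
        sheet.name: [sheet.row_values(r) for r in range(sheet.nrows)]
        for sheet in book.sheets()
    }


def workbook_differences(book_a, book_b):
    """Yield (sheet, row, col, value_a, value_b) for each differing cell."""
    for name in sorted(set(book_a) | set(book_b)):
        rows_a = book_a.get(name, [])
        rows_b = book_b.get(name, [])
        for r in range(max(len(rows_a), len(rows_b))):
            row_a = rows_a[r] if r < len(rows_a) else []
            row_b = rows_b[r] if r < len(rows_b) else []
            for c in range(max(len(row_a), len(row_b))):
                # Treat cells beyond the used range as empty strings.
                val_a = row_a[c] if c < len(row_a) else ""
                val_b = row_b[c] if c < len(row_b) else ""
                if val_a != val_b:
                    yield (name, r, c, val_a, val_b)
```

Two workbooks would then count as equal when `list(workbook_differences(load_xls("file1.xls"), load_xls("file2.xls")))` is empty. Whether formulas, formats, or sheet order should also count is exactly the definition question raised earlier in the thread.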

    -Jay
     
    Jay Loden, Aug 9, 2007
    #8
