Re: CRC-checksum failed in gzip

Discussion in 'Python' started by andrea crotti, Aug 1, 2012.

  1. Full traceback:

    Exception in thread Thread-8:
    Traceback (most recent call last):
      File "/user/sim/python/lib/python2.7/threading.py", line 530, in __bootstrap_inner
        self.run()
      File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 67, in run
        self.processJobData(jobData, logger)
      File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 204, in processJobData
        self.run_simulator(area, jobData[1] ,log)
      File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 142, in run_simulator
        report_file, percentage, body_text = SimResults.copy_test_batch(log, area)
      File "/user/sim/tests/llif/AutoTester/src/SimResults.py", line 274, in copy_test_batch
        out2_lines = out2.read()
      File "/user/sim/python/lib/python2.7/gzip.py", line 245, in read
        self._read(readsize)
      File "/user/sim/python/lib/python2.7/gzip.py", line 316, in _read
        self._read_eof()
      File "/user/sim/python/lib/python2.7/gzip.py", line 338, in _read_eof
        hex(self.crc)))
    IOError: CRC check failed 0x4f675fba != 0xa9e45aL


    - The file is written with the Linux gzip program.
    - No, I can't reproduce the error with the exact same file that failed,
      and that's what is really puzzling: there seems to be no clear pattern,
      it just fails randomly. The file is also only opened for reading by
      this program, so in theory there is no way it can get corrupted.

      I also checked with lsof whether any other process has it open, but
      nothing shows up. (A minimal single-threaded re-read check is sketched
      below.)

    - I can't really try on the local disk, it might unfortunately take ages
      (we are rewriting this system from scratch anyway).
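
    Something like this minimal, single-threaded re-read check (a hypothetical
    script, not part of the AutoTester code) would rule out on-disk corruption:
    if it never fails for the same file, the bytes on disk are fine and the
    corruption happens at read time.

    import gzip
    import sys

    def check(path, repeats=100):
        # gzip verifies the stored CRC when it reaches the end of the stream,
        # so a clean read() here means the file itself is not corrupted.
        for _ in range(repeats):
            f = gzip.open(path, 'rb')
            try:
                f.read()
            finally:
                f.close()
        print('OK: %s read %d times without a CRC error' % (path, repeats))

    if __name__ == '__main__':
        check(sys.argv[1])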
     
    andrea crotti, Aug 1, 2012
    #1

  2. On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote:

    > Full traceback:
    >
    > Exception in thread Thread-8:


    "DANGER DANGER DANGER WILL ROBINSON!!!"

    Why didn't you say that there were threads involved? That puts a
    completely different perspective on the problem.

    I *was* going to write back and say that you probably had either file
    system corruption, or network errors. But now that I can see that you
    have threads, I will revise that and say that you probably have a bug in
    your thread handling code.

    I must say, Andrea, your initial post asking for help was EXTREMELY
    misleading. You over-simplified the problem to the point that it no
    longer has any connection to the reality of the code you are running.
    Please don't send us on wild goose chases after bugs in code that you
    aren't actually running.


    > there seems to be no clear pattern, it just fails randomly.


    When you start using threads, you have to expect these sorts of
    intermittent bugs unless you are very careful.

    My guess is that you have a bug where two threads read from the same file
    at the same time. Since each read shares state (the position of the file
    pointer), you're going to get corruption. Because it depends on timing
    details of which threads do what at exactly which microsecond, the effect
    might as well be random.

    Example: suppose the file contains three blocks A B and C, and a
    checksum. Thread 8 starts reading the file, and gets block A and B. Then
    thread 2 starts reading it as well, and gets half of block C. Thread 8
    gets the rest of block C, calculates the checksum, and it doesn't match.
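
    To make that concrete, here is a minimal sketch (hypothetical file name,
    not the code from the traceback) of two threads sharing one ordinary file
    object, and therefore one file position; each thread ends up with a
    different, incomplete slice of the data:

    import threading

    shared = open('data.bin', 'rb')   # ONE file object shared by both threads
    chunks = {}                       # thread name -> the bytes that thread saw

    def reader(name):
        data = []
        while True:
            block = shared.read(4096)  # advances the *shared* file position
            if not block:
                break
            data.append(block)
        chunks[name] = b''.join(data)

    threads = [threading.Thread(target=reader, args=('t%d' % i,))
               for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Each thread only sees the blocks it happened to win, so neither sees the
    # whole stream; a checksum computed over what one thread read cannot match
    # the checksum stored at the end of the file.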

    I recommend that you run a file system check on the remote disk. If it
    passes, you can eliminate file system corruption. Also, run some network
    diagnostics, to eliminate corruption introduced in the network layer. But
    I expect that you won't find anything there, and the problem is a simple
    thread bug. Simple, but really, really hard to find.

    Good luck.


    --
    Steven
     
    Steven D'Aprano, Aug 1, 2012
    #2

  3. 2012/8/1 Steven D'Aprano <>:
    > On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote:
    >
    >> Full traceback:
    >>
    >> Exception in thread Thread-8:
    > [...]
    > My guess is that you have a bug where two threads read from the same file
    > at the same time. Since each read shares state (the position of the file
    > pointer), you're going to get corruption. Because it depends on timing
    > details of which threads do what at exactly which microsecond, the effect
    > might as well be random.
    > [...]


    Thanks a lot, that makes a lot of sense. I didn't mention this detail
    before because I didn't write this code and had completely forgotten that
    threads were involved; I'm just trying to help fix this bug.

    Your explanation makes a lot of sense, but it's still surprising that
    merely reading files, without ever writing them, can cause trouble when
    threads are involved :/
     
    andrea crotti, Aug 1, 2012
    #3
  4. Laszlo Nagy


    > Thanks a lot, that makes a lot of sense.. I haven't given this detail
    > before because I didn't write this code, and I forgot that there were
    > threads involved completely, I'm just trying to help to fix this bug.
    >
    > Your explanation makes a lot of sense, but it's still surprising that
    > even just reading files without ever writing them can cause troubles
    > using threads :/

    Make sure that file objects are not shared between threads, if that is
    possible. That will probably solve the problem (if the problem is indeed
    related to threads).
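
    For illustration, a minimal sketch of the "not shared" arrangement (the
    worker function and file name are hypothetical, not from the original
    program): each thread opens its own GzipFile, so no file position,
    decompressor or CRC state is shared between threads.

    import gzip
    import threading

    def worker(path):
        # This GzipFile is private to the thread that opened it.
        f = gzip.open(path, 'rb')
        try:
            data = f.read()
            # ... process `data` locally ...
        finally:
            f.close()

    threads = [threading.Thread(target=worker, args=('out2.txt.gz',))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()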
     
    Laszlo Nagy, Aug 1, 2012
    #4
  5. 2012/8/1 Laszlo Nagy <>:
    >
    >> Thanks a lot, that makes a lot of sense.. I haven't given this detail
    >> before because I didn't write this code, and I forgot that there were
    >> threads involved completely, I'm just trying to help to fix this bug.
    >>
    >> Your explanation makes a lot of sense, but it's still surprising that
    >> even just reading files without ever writing them can cause troubles
    >> using threads :/

    >
    > Make sure that file objects are not shared between threads. If that is
    > possible. It will probably solve the problem (if that is related to
    > threads).



    Well, I just have to create a lock, I guess, right?

    with lock:
        # open file
        # read content
     
    andrea crotti, Aug 1, 2012
    #5
  6. Laszlo Nagy


    >> Make sure that file objects are not shared between threads. If that is
    >> possible. It will probably solve the problem (if that is related to
    >> threads).

    >
    > Well I just have to create a lock I guess right?

    That is also a solution. You need to call file.read() while holding the
    lock.

    > with lock:
    >     # open file
    >     # read content
    >

    But not that way! Your example will keep the lock acquired for the
    lifetime of the file, so it cannot be shared between threads.

    More likely:

    ## Open file
    lock = threading.Lock()
    fin = gzip.open(file_path...)
    # Now you can share the file object between threads.

    # and do this inside any thread:
    ## data needed. block until the file object becomes usable.
    with lock:
        data = fin.read(....)  # other threads are blocked while I'm reading
    ## use your data here, meanwhile other threads can read
     
    Laszlo Nagy, Aug 1, 2012
    #6
  7. On 01.08.2012 19:57, Laszlo Nagy wrote:
    > ## Open file
    > lock = threading.Lock()
    > fin = gzip.open(file_path...)
    > # Now you can share the file object between threads.
    >
    > # and do this inside any thread:
    > ## data needed. block until the file object becomes usable.
    > with lock:
    >     data = fin.read(....)  # other threads are blocked while I'm reading
    > ## use your data here, meanwhile other threads can read


    Technically that is correct, but IMHO it's complete nonsense to share the
    file object between threads in the first place. If you need the data in
    two threads, just read the file once and then share the read-only,
    immutable content. If the file is small, or too large to be held in
    memory at once, just open and read it on demand. That also saves you from
    having to rewind the file every time you read it.
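
    A minimal sketch of that read-once idea (not code from the thread; the
    file name is hypothetical): decompress once in the main thread and hand
    the immutable result to the workers, which then need no locking at all.

    import gzip
    import threading

    def worker(content):
        # `content` is an immutable bytes/str object; concurrent reads of it
        # are safe without a lock.
        print(len(content))

    if __name__ == '__main__':
        f = gzip.open('out2.txt.gz', 'rb')
        try:
            content = f.read()    # decompress and CRC-check exactly once
        finally:
            f.close()

        threads = [threading.Thread(target=worker, args=(content,))
                   for _ in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()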

    Am I missing something?

    Uli
     
    Ulrich Eckhardt, Aug 2, 2012
    #7
  8. 2012/8/1 Steven D'Aprano <>:
    > [...]
    > I recommend that you run a file system check on the remote disk. If it
    > passes, you can eliminate file system corruption. Also, run some network
    > diagnostics, to eliminate corruption introduced in the network layer. But
    > I expect that you won't find anything there, and the problem is a simple
    > thread bug. Simple, but really, really hard to find.
    >
    > Good luck.


    One last thing I would like to do before I add this fix is to actually
    be able to reproduce this behaviour, and I thought I could just do the
    following:

    import gzip
    import threading


    class OpenAndRead(threading.Thread):
        def run(self):
            fz = gzip.open('out2.txt.gz')
            fz.read()
            fz.close()


    if __name__ == '__main__':
        for i in range(100):
            OpenAndRead().start()


    But no matter how many threads I start, I can't reproduce the CRC error;
    any idea how I can make it happen?

    The code in run() should be shared by all the threads, since there are no
    locks, right?
     
    andrea crotti, Aug 2, 2012
    #8
  9. Laszlo Nagy


    > Technically, that is correct, but IMHO its complete nonsense to share
    > the file object between threads in the first place. If you need the
    > data in two threads, just read the file once and then share the
    > read-only, immutable content. If the file is small or too large to be
    > held in memory at once, just open and read it on demand. This also
    > saves you from having to rewind the file every time you read it.
    >
    > Am I missing something?

    We suspect that his program reads the same file object from different
    threads. At least that would explain his problem. I agree with you:
    usually it is not a good idea to share a file object between threads, and
    that is what I told him the first time. But it is not in our hands; he
    already has a program that needs to be fixed. It might be easier for him
    to protect the read() calls with a lock, because that can be done
    mechanically, without thinking too much.
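
    A sketch of what that mechanical fix could look like, a hypothetical
    LockedFile wrapper (the class name does not come from the original
    program) that serializes every read on a shared file object:

    import threading

    class LockedFile(object):
        """Serialize all access to a single shared file object."""

        def __init__(self, fileobj):
            self._f = fileobj
            self._lock = threading.Lock()

        def read(self, size=-1):
            with self._lock:
                return self._f.read(size)

        def close(self):
            with self._lock:
                self._f.close()

    # Usage sketch (the variable names are made up):
    #   fz = LockedFile(gzip.open('out2.txt.gz'))
    #   ...hand fz to the threads instead of the raw gzip file object...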
     
    Laszlo Nagy, Aug 2, 2012
    #9
  10. Laszlo Nagy


    > One last thing I would like to do before I add this fix is to actually
    > be able to reproduce this behaviour, and I thought I could just do the
    > following:
    > [...]
    > But no matter how many threads I start, I can't reproduce the CRC
    > error; any idea how I can make it happen?

    Your example did not share the file object between threads. Here is an
    example that does:

    class OpenAndRead(threading.Thread):
        def run(self):
            global fz
            fz.read(100)

    if __name__ == '__main__':
        fz = gzip.open('out2.txt.gz')
        for i in range(10):
            OpenAndRead().start()

    Try this with a huge file. And here is the one that should never throw a
    CRC error, because the file object is protected by a lock:

    class OpenAndRead(threading.Thread):
        def run(self):
            global fz
            global fl
            with fl:
                fz.read(100)

    if __name__ == '__main__':
        fz = gzip.open('out2.txt.gz')
        fl = threading.Lock()
        for i in range(2):
            OpenAndRead().start()

    >
    > The code in run should be shared by all the threads since there are no
    > locks, right?

    The code is shared but the file object is not. In your example, a new
    file object is created every time a thread is started.
     
    Laszlo Nagy, Aug 2, 2012
    #10
  11. 2012/8/2 Laszlo Nagy <>:
    > Your example did not share the file object between threads. Here is an
    > example that does:
    > [...]
    >> The code in run() should be shared by all the threads, since there are
    >> no locks, right?
    >
    > The code is shared but the file object is not. In your example, a new
    > file object is created every time a thread is started.


    Ok, sure, that makes sense, but then this explanation may not be right
    after all, because I'm quite sure that the file object is *not* shared
    between threads; everything happens inside a single thread..

    I managed to get some errors doing this with a big file:

    class OpenAndRead(threading.Thread):
        def run(self):
            global fz
            fz.read(100)

    if __name__ == '__main__':
        fz = gzip.open('bigfile.avi.gz')
        for i in range(20):
            OpenAndRead().start()

    and it doesn't fail without the *global*, but that is definitely not what
    the real code does, because every thread gets a new file object; it's not
    shared..

    Anyway, we'll either read the file once for all the threads or add the
    lock, and hopefully that will solve the problem, even if I'm not yet
    convinced that this was the cause.
     
    andrea crotti, Aug 2, 2012
    #11
  12. 2012/8/2 andrea crotti <>:
    > [...]
    > Anyway, we'll either read the file once for all the threads or add the
    > lock, and hopefully that will solve the problem, even if I'm not yet
    > convinced that this was the cause.


    Just for completeness, as suggested, this also does not fail:

    class OpenAndRead(threading.Thread):
        def __init__(self, lock):
            threading.Thread.__init__(self)
            self.lock = lock

        def run(self):
            global fz
            with self.lock:
                fz.read(100)

    if __name__ == '__main__':
        lock = threading.Lock()
        fz = gzip.open('bigfile.avi.gz')
        for i in range(20):
            OpenAndRead(lock).start()
     
    andrea crotti, Aug 2, 2012
    #12
