string parsing screwing up on large files?


Daniel Kramer

Hello, I'm fairly new to Python, but I've written a script that takes
in a special text file (a RenderMan .rib, to be specific) and filters
some of the commands. The .rib file is a simple text file, but in
some cases it's very large: 20 MB or more at times.

The script steps through each line looking for keywords and changes the
line if necessary, but most lines just pass in and out of the script
unmodified. The problem is that sometimes the lines aren't written out
correctly, and it's an intermittent problem. If I re-run the script
on the same input, it usually works fine. After filtering about
100 files I might get 4 or 5 that come out bad; simply re-running
those fixes them.

Anyone know what I might look for? It's possible that the machine is
under a lot of I/O load and/or CPU load when it happens, but I'm not sure
about that. I normally send this processing to a render farm, so it's
hard to predict exactly what sort of load is going on at that time. It
feels like a buffer isn't getting flushed before the text is written
out, or something like that.

Any suggestions where I might look?

thanks

daniel
 

Rene Pijlman

Daniel Kramer:
Any suggestions where I might look?

In the source code, probably. I've looked long and hard at your posting,
but I didn't find any bug there.
 

Bengt Richter

Daniel Kramer:
Any suggestions where I might look?

What is telling you that some lines aren't correct? Renderman syntax errors?
Maybe if you saved the bad file(s) and re-ran the changes until you got a good
one, and then ran diff -u goodfile badfile to see how things were actually
changing, it would become clear. Or if not, you could post some diffs and
the code that should be accomplishing the changes, and we could go from there.
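
If a command-line diff isn't handy on the render farm, the same good-versus-bad comparison can be sketched with Python's standard difflib module. The function name rib_diff and the file names are hypothetical, and reading as latin-1 is an assumption made only so that stray bytes in a damaged file don't raise decode errors.

```python
import difflib


def rib_diff(good_path, bad_path):
    """Return a unified diff (good -> bad) of two text files as one string."""
    # latin-1 maps every byte to a character, so a corrupted file still loads.
    # newline="" keeps the original line endings so they show up in the diff.
    with open(good_path, encoding="latin-1", newline="") as f:
        good = f.readlines()
    with open(bad_path, encoding="latin-1", newline="") as f:
        bad = f.readlines()
    return "".join(
        difflib.unified_diff(good, bad, fromfile=good_path, tofile=bad_path)
    )


if __name__ == "__main__":
    # Hypothetical file names: a known-good rerun and a saved bad output.
    print(rib_diff("frame0001.good.rib", "frame0001.bad.rib"))
```

An empty result means the two files are identical; otherwise the lines marked - and + show where the bad output diverges, which is usually enough to tell truncation from garbling.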

Is the code threaded? Are you perhaps clobbering something across threads
occasionally? Accidental name collisions? Unsynchronized accesses?

You might also want to mention what platform and Python version, etc., you are running.
Maybe there is a file system bug that an upgrade would fix? It doesn't happen often,
but it might be worth googling for your platform.
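
For collecting those details, a small sketch using only the standard sys and platform modules might look like this. The function name environment_report is made up for illustration; running it on one of the farm machines that produced a bad file would be the most useful.

```python
import platform
import sys


def environment_report():
    """Collect the interpreter and OS details worth quoting in a bug report."""
    return "\n".join([
        "Python:   " + sys.version.replace("\n", " "),
        "Platform: " + platform.platform(),
        "Machine:  " + platform.machine(),
    ])


if __name__ == "__main__":
    print(environment_report())
```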

Regards,
Bengt Richter
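
On the suspicion that a buffer isn't being flushed: the script itself wasn't posted, so this is only a sketch of one way to rule that out, not a diagnosis. It assumes a line-by-line filter, writes to a temporary file, flushes and closes it explicitly, and only then renames it into place, so a reader never sees a half-written output. The names filter_rib, rewrite_line, swap_shader, and the ".tmp" suffix are all hypothetical.

```python
import os


def filter_rib(src_path, dst_path, rewrite_line):
    """Copy src_path to dst_path, passing every line through rewrite_line."""
    tmp_path = dst_path + ".tmp"  # hypothetical naming for the scratch file
    # latin-1 with newline="" round-trips every byte and line ending unchanged.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(tmp_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            dst.write(rewrite_line(line))
        dst.flush()             # push Python's buffer to the OS
        os.fsync(dst.fileno())  # ask the OS to write its buffers to disk
    # Both files are closed here; only now expose the finished output.
    os.replace(tmp_path, dst_path)


def swap_shader(line):
    # Hypothetical rule: replace every Surface command, pass the rest through.
    if line.startswith("Surface "):
        return 'Surface "matte"\n'
    return line


if __name__ == "__main__":
    filter_rib("in.rib", "out.rib", swap_shader)
```

If the bad files still appear with this structure, an unflushed buffer in the script becomes a much less likely culprit, and the diffs and platform details become the next thing to look at.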
 
