buffering choking sys.stdin.readlines() ?

cshirky

Newbie question:

I'm trying to turn a large XML file (~7G compressed) into a YAML file,
and my program seems to be buffering the input.

IOtest.py is just

import sys
for line in sys.stdin.readlines():
    print line

but when I run

$ gzcat bigXMLfile.gz | IOtest.py

it hangs, then dies.

The goal of the program is to build a YAML file with print statements,
rather than building a gigantic nested dictionary, but I am obviously
doing something wrong in passing input through without buffering. Any
advice gratefully fielded.

-clay
 
Diez B. Roggisch

cshirky said:
Newbie question:

I'm trying to turn a large XML file (~7G compressed) into a YAML file,
and my program seems to be buffering the input.

IOtest.py is just

import sys
for line in sys.stdin.readlines():
    print line

but when I run

$ gzcat bigXMLfile.gz | IOtest.py

it hangs, then dies.

The goal of the program is to build a YAML file with print statements,
rather than building a gigantic nested dictionary, but I am obviously
doing something wrong in passing input through without buffering. Any
advice gratefully fielded.

readlines() reads the whole file into memory before the loop even
starts. Try xreadlines(), the lazy version that hands back one line at a
time, instead. And I'm not 100% sure, but I *think* iterating over the
file object directly,

for line in sys.stdin:
    ...

does exactly that.
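
For illustration, here is a minimal sketch of that streaming loop. The helper name copy_lines is hypothetical, not something from the original script, and the sketch uses write() instead of the print statement so that it should run on both Python 2 and Python 3. Because each line already ends with its own newline, write() also avoids the doubled blank lines that "print line" would produce.

```python
def copy_lines(src, dst):
    """Copy src to dst one line at a time and return the number of lines.

    copy_lines is a hypothetical helper name used only for this sketch.
    """
    count = 0
    # Iterating over a file object yields lines lazily, so only the
    # current line is held in memory, unlike readlines(), which builds
    # a list of every line first.
    for line in src:
        # Each line keeps its trailing newline, so write it unchanged.
        dst.write(line)
        count += 1
    return count
```

In a script like IOtest.py this would presumably be called as copy_lines(sys.stdin, sys.stdout) after "import sys", with the input still piped in from gzcat.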

Diez
 
cshirky

readlines() reads all of the file into the memory. Try using xreadlines,
the generator-version, instead. And I'm not 100% sure, but I *think* doing

for line in sys.stdin

Both xreadlines() and plain iteration over sys.stdin work -- many thanks.

-clay
 