String Format Conversion

mcg

Investigating python day 1:

Data in file:
x y
1 2
3 4
5 6


Want to read file into an array of pairs.

in c: scanf("%d %d",&x,&y)---store x y in array, loop.

How do I do this in python??
In the actual application, the pairs are floating point, e.g. -1.003
 
Stephen Thorne

f = file('input', 'r')
labels = f.readline() # consume the first line of the file.

Easy Option:
for line in f.readlines():
    x, y = line.split()
    x = float(x)
    y = float(y)

Or, more concisely:
for line in f.readlines():
    x, y = map(float, line.split())
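
For readers on a current interpreter, here is a minimal sketch of the same approach in Python 3 that also stores the pairs, as the original question asked. The names read_pairs and sample_path and the temporary sample file are made up for illustration, and skipping blank lines is an assumption about the data.

```python
import os
import tempfile


def read_pairs(path):
    """Return a list of (x, y) float tuples, skipping the label line."""
    pairs = []
    with open(path) as f:
        f.readline()  # consume the "x y" label line
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines (assumption about the data)
            x, y = map(float, line.split())
            pairs.append((x, y))
    return pairs


# Build a small sample file like the one in the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("x y\n1 2\n3 4\n-1.003 6\n")
    sample_path = tmp.name

pairs = read_pairs(sample_path)
os.remove(sample_path)
print(pairs)  # [(1.0, 2.0), (3.0, 4.0), (-1.003, 6.0)]
```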

Regards,
Stephen Thorne
 
Steven Bethard

Somewhat more memory efficient:

lines_iter = iter(file('input'))
labels = lines_iter.next()
for line in lines_iter:
    x, y = [float(f) for f in line.split()]

By using the iterator instead of readlines, I read only one line from
the file into memory at once, instead of all of them. This may or may
not matter depending on the size of your files, but using iterators is
generally more scalable, though of course it's not always possible.

I also opted to use a list comprehension instead of map, but this is
totally a matter of personal preference -- the performance differences
are probably negligible.
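
The line-at-a-time idea can be sketched as a generator in Python 3, where the next() method became the built-in next(). The name iter_pairs is hypothetical, and io.StringIO stands in for a real file here.

```python
import io


def iter_pairs(lines):
    """Yield one tuple of floats per data line, holding one line at a time."""
    lines_iter = iter(lines)
    next(lines_iter, None)  # skip the label line, if there is one
    for line in lines_iter:
        yield tuple(float(field) for field in line.split())


sample = io.StringIO("x y\n1 2\n3 4\n5 6\n")  # stand-in for open('input')
for x, y in iter_pairs(sample):
    print(x, y)
```

Any iterable of lines works as the argument, so an open file object can be passed directly.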

Steve
 
Dennis Benzinger

Either do what the other posters wrote, or if you really like scanf
try the following Python module:

Scanf --- a pure Python scanf-like module
http://hkn.eecs.berkeley.edu/~dyoo/python/scanf/
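
If a scanf-style pattern match is wanted without an extra module, the standard re module can approximate "%f %f". This is only a sketch and does not show the API of the module linked above; FLOAT, PAIR_RE and scan_pair are names invented for this example.

```python
import re

# Roughly what scanf's %f accepts: optional sign, digits, optional
# fraction, optional exponent.
FLOAT = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"
PAIR_RE = re.compile(r"\s*(%s)\s+(%s)\s*$" % (FLOAT, FLOAT))


def scan_pair(line):
    """Return (x, y) as floats, or None if the line is not two numbers."""
    match = PAIR_RE.match(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(scan_pair("-1.003 2\n"))  # (-1.003, 2.0)
print(scan_pair("x y\n"))       # None
```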

Bye,
Dennis
 
Stephen Thorne

I just did a teensy test. All three options used exactly the same
amount of total memory.

I did all I did in the name of clarity, considering the OP was on his
first day with python. How I would actually write it would be:

inputfile = file('input','r')
inputfile.readline()
data = [map(float, line.split()) for line in inputfile]

Notice how you don't have to call iter() on it, you can treat it as an
iterable to begin with.
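
One caveat for anyone running this on Python 3: map() returns a lazy iterator there rather than a list, so each row needs an explicit tuple() or list(). A minimal sketch, with io.StringIO standing in for the real input file:

```python
import io

inputfile = io.StringIO("x y\n1 2\n3 4\n5 6\n")  # stand-in for open('input')
inputfile.readline()  # skip the label line
# tuple() forces each lazy map object into a concrete pair.
data = [tuple(map(float, line.split())) for line in inputfile]
print(data)  # [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
```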

Stephen.
 
Steven Bethard

Beware of mixing iterator methods and readline:

http://docs.python.org/lib/bltin-file-objects.html

next( )
...In order to make a for loop the most efficient way of looping
over the lines of a file (a very common operation), the next() method
uses a hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline()) does
not work right.

I haven't tested your code in particular, but this warning was enough to
make me generally avoid mixing iter methods and other methods.

Steve
 
Alex Martelli

Steven Bethard said:
Beware of mixing iterator methods and readline:

_mixing_, yes. But -- starting the iteration after some other kind of
reading (readline, or read(N), etc) -- is OK...

Steven Bethard said:
[snip quoted documentation for next()]
I haven't tested your code in particular, but this warning was enough to
make me generally avoid mixing iter methods and other methods.

Yeah, I know... it's hard to explain exactly what IS a problem and what
isn't -- not to mention that this IS to some extent a matter of the file
object's implementation and the docs can't/don't want to constrain the
implementer's future freedom, should it turn out to matter. Sigh.

In the Nutshell (2nd ed), which is not normative and thus gives me a tad
more freedom, I have tried to be a tiny bit more specific, taking
advantage, also, of the fact that I'm now addressing the 2.3 and 2.4
implementations, only. Quoting from my current draft (pardon the XML
markup...):

"""
interrupting such a loop prematurely (e.g., with <c>break</c>), or
calling <r>f</r><c>.next()</c> instead of <r>f</r><c>.readline()</c>,
leaves the file's current position at an arbitrary value. If you want
to switch from using <r>f</r> as an iterator to calling other reading
methods on <r>f</r>, be sure to set the file's current position to a
known value by appropriately calling <r>f</r><c>.seek</c>.
"""

I hope this concisely indicates that the problem (in today's current
implementations) is only with switching FROM iteration TO other
approaches to reading, and (if the file is seekable) there's nothing so
problematic here that a good old 'seek' won't cure...
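
The seek-to-a-known-position pattern can be sketched as follows. This is written for Python 3, whose file objects are implemented differently from the 2.3 and 2.4 ones discussed here, so it illustrates the pattern only and does not reproduce the old read-ahead behavior. The sample file and variable names are made up for the example.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("x y\n1 2\n3 4\n5 6\n")
    path = tmp.name

with open(path) as f:
    for line in f:
        break               # interrupt the iteration prematurely
    f.seek(0)               # put the file position at a known value
    header = f.readline()   # now other reading methods are safe to use
    first_row = f.readline()

os.remove(path)
print(repr(header), repr(first_row))  # 'x y\n' '1 2\n'
```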


Alex
 
Steven Bethard

Thanks for the clarification!

Steve
 
Jeff Shannon

Stephen said:
I just did a teensy test. All three options used exactly the same
amount of total memory.

I would presume that, for a small file, the entire contents of the
file will be sucked into the read buffer implemented by the underlying
C file library. An iterator will only really save memory consumption
when the file size is greater than that buffer's size.

Actually, now that I think of it, there's probably another copy of the
data at Python level. For readlines(), that copy is the list object
itself. For iter and iter.next(), it's in the iterator's read-ahead
buffer. So perhaps memory savings will occur when *that* buffer size
is exceeded. It's also quite possible that both buffers are the same
size...

Anyhow, I'm sure that the fact that they use the same size for your
test is a reflection of buffering. The next question is, which
provides the most *conceptual* simplicity? (The answer to that one, I
think, depends on how your brain happens to see things...)
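
One way to check this on a file larger than any plausible buffer is the tracemalloc module in Python 3. The sketch below is an assumption-laden experiment, not a benchmark: the 50000-line file, the helper names, and the choice to sum the values instead of storing them are all invented for the example, and tracemalloc only sees allocations made through Python's allocator.

```python
import os
import tempfile
import tracemalloc


def sum_with_readlines(path):
    total = 0.0
    with open(path) as f:
        f.readline()                # skip the label line
        for line in f.readlines():  # builds the full list of lines first
            x, y = map(float, line.split())
            total += x + y
    return total


def sum_with_iteration(path):
    total = 0.0
    with open(path) as f:
        f.readline()                # skip the label line
        for line in f:              # one line at a time
            x, y = map(float, line.split())
            total += x + y
    return total


def peak_bytes(func, path):
    """Peak traced memory, in bytes, while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("x y\n")
    for i in range(50000):
        tmp.write("%d %d\n" % (i, i + 1))
    big_path = tmp.name

peak_readlines = peak_bytes(sum_with_readlines, big_path)
peak_iteration = peak_bytes(sum_with_iteration, big_path)
os.remove(big_path)
print("readlines peak:", peak_readlines, "bytes")
print("iteration peak:", peak_iteration, "bytes")
```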

Jeff Shannon
Technician/Programmer
Credit International
 
enigma

Do you really need to use the iter function here? As far as I can
tell, a file object is already an iterator. The file object
documentation says that, "[a] file object is its own iterator, for
example iter(f) returns f (unless f is closed)." It doesn't look like
it makes a difference one way or the other, I'm just curious.
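
The quoted statement is easy to check. A small sketch in Python 3, using os.devnull only so that there is some file to open; the variable names are made up:

```python
import os

with open(os.devnull) as f:
    file_is_own_iterator = iter(f) is f      # the file is its own iterator

numbers = [1, 2, 3]
list_is_own_iterator = iter(numbers) is numbers  # a list hands out a new one

print(file_is_own_iterator)  # True
print(list_is_own_iterator)  # False
```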
 
Steven Bethard

Nope, you're right -- that's just my obsessive-compulsive disorder
kicking in. ;) A lot of objects aren't their own iterators, so I tend
to ask for an iterator with iter() when I know I want one. But for
files, this definitely isn't necessary:

py> file('temp.txt', 'w').write("""\
.... x y
.... 1 2
.... 3 4
.... 5 6
.... """)
py> f = file('temp.txt')
py> f.next()
'x y\n'
py> for line in f:
....     print [float(f) for f in line.split()]
....
[1.0, 2.0]
[3.0, 4.0]
[5.0, 6.0]

And to illustrate Alex Martelli's point that using readline, etc. before
using the file as an iterator is fine:

py> f = file('temp.txt')
py> f.readline()
'x y\n'
py> for line in f:
....     print [float(f) for f in line.split()]
....
[1.0, 2.0]
[3.0, 4.0]
[5.0, 6.0]

But using readline, etc. after using the file as an iterator is *not*
fine, generally:

py> f = file('temp.txt')
py> f.next()
'x y\n'
py> f.readline()
''

In this case, if my understanding's right, the entire file contents have
been read into the iterator buffer, so readline thinks the entire file's
been read in and gives you '' to indicate this.

Steve
 
