String Format Conversion

mcg · Jan 27, 2005

Investigating python day 1:

Data in file:
x y
1 2
3 4
5 6

Want to read file into an array of pairs.

in c: scanf("%d %d",&x,&y)---store x y in array, loop.

How do I do this in python??
In the actual application, the pairs are floating point, e.g. -1.003

Stephen Thorne · Jan 27, 2005

f = file('input', 'r')
labels = f.readline() # consume the first line of the file.

Easy Option:
for line in f.readlines():
    x, y = line.split()
    x = float(x)
    y = float(y)

Or, more concisely:
for line in f.readlines():
    x, y = map(float, line.split())

Regards,
Stephen Thorne
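
The loops above convert each line but do not yet store the result. A minimal sketch of collecting the pairs into a list, written for Python 3, where open() replaces the file() builtin. The function name read_pairs, the sample data, and the blank-line check are assumptions added for illustration, not part of the original reply:

```python
import os
import tempfile


def read_pairs(path):
    """Return a list of (x, y) float pairs, skipping the header line."""
    pairs = []
    with open(path) as f:          # open() replaces the Python 2 file() builtin
        f.readline()               # consume the "x y" label line
        for line in f:
            if not line.strip():   # tolerate blank lines (an added assumption)
                continue
            x, y = line.split()
            pairs.append((float(x), float(y)))
    return pairs


# Build a small sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("x y\n1 2\n3 4\n-1.003 6\n")
    sample_path = tmp.name

pairs = read_pairs(sample_path)
os.remove(sample_path)
print(pairs)
```

Tuples are used for the pairs because each row has a fixed shape; a list of two-element lists would work just as well.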

Steven Bethard · Jan 27, 2005

Somewhat more memory efficient:

lines_iter = iter(file('input'))
labels = lines_iter.next()
for line in lines_iter:
    x, y = [float(f) for f in line.split()]

By using the iterator instead of readlines, I read only one line from
the file into memory at once, instead of all of them. This may or may
not matter depending on the size of your files, but using iterators is
generally more scalable, though of course it's not always possible.

I also opted to use a list comprehension instead of map, but this is
totally a matter of personal preference -- the performance differences
are probably negligible.

Steve
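
The same one-line-at-a-time idea can be wrapped in a generator, so the caller decides whether to keep the pairs or just stream over them. This is a sketch in Python 3 syntax, where next(lines) replaces lines_iter.next(). The name iter_pairs and the sample file are hypothetical:

```python
import os
import tempfile


def iter_pairs(path):
    """Yield (x, y) float pairs lazily, one input line at a time."""
    with open(path) as lines:
        next(lines)                      # Python 3 spelling of lines_iter.next()
        for line in lines:
            x, y = [float(field) for field in line.split()]
            yield x, y


# Sample data so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("x y\n1 2\n3 4\n5 6\n")
    sample_path = tmp.name

# Only one pair exists in memory at any moment while summing.
total_x = sum(x for x, _ in iter_pairs(sample_path))
os.remove(sample_path)
print(total_x)
```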

Dennis Benzinger · Jan 27, 2005

Either do what the other posters wrote, or if you really like scanf
try the following Python module:

Scanf --- a pure Python scanf-like module
http://hkn.eecs.berkeley.edu/~dyoo/python/scanf/

Bye,
Dennis
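
For anyone who wants scanf-style matching without an extra module, the standard re module can do a similar job. The sketch below does not use the linked module's API, which is not shown in this thread. It uses the float pattern that the re documentation suggests for simulating scanf's %f. The names FLOAT, PAIR, and scan_pair are made up for this example:

```python
import re

# Roughly what scanf's "%f %f" accepts: two optionally signed decimal numbers.
FLOAT = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"
PAIR = re.compile(r"\s*(" + FLOAT + r")\s+(" + FLOAT + r")\s*$")


def scan_pair(line):
    """Return (x, y) as floats, or None if the line is not two numbers."""
    match = PAIR.match(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(scan_pair("-1.003 2.5\n"))
print(scan_pair("x y\n"))   # the header line does not match
```

Returning None for a non-matching line makes it easy to skip the "x y" header without treating it as a special case.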

Stephen Thorne · Jan 27, 2005

Steven Bethard said:
By using the iterator instead of readlines, I read only one line from
the file into memory at once, instead of all of them. This may or may
not matter depending on the size of your files, but using iterators is
generally more scalable, though of course it's not always possible.

I just did a teensy test. All three options used exactly the same
amount of total memory.

I did all I did in the name of clarity, considering the OP was on his
first day with python. How I would actually write it would be:

inputfile = file('input','r')
inputfile.readline()
data = [map(float, line.split()) for line in inputfile]

Notice how you don't have to call iter() on it, you can treat it as an
iterable to begin with.

Stephen.
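
One caveat for readers on Python 3: map() there returns a lazy iterator, so the comprehension above would produce a list of map objects instead of numbers. A sketch of the same idea with that adjusted, using open() in place of file() and a made-up sample file:

```python
import os
import tempfile

# Sample data so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("x y\n1 2\n3 4\n5 6\n")
    sample_path = tmp.name

with open(sample_path) as inputfile:
    inputfile.readline()   # skip the labels, then iterate over the rest
    # In Python 3 map() is lazy, so wrap it to get concrete pairs.
    data = [tuple(map(float, line.split())) for line in inputfile]

os.remove(sample_path)
print(data)
```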

Steven Bethard · Jan 27, 2005

Stephen said:
I did all I did in the name of clarity, considering the OP was on his
first day with python. How I would actually write it would be:

inputfile = file('input','r')
inputfile.readline()
data = [map(float, line.split()) for line in inputfile]

Notice how you don't have to call iter() on it, you can treat it as an
iterable to begin with.

Beware of mixing iterator methods and readline:

http://docs.python.org/lib/bltin-file-objects.html

next( )
...In order to make a for loop the most efficient way of looping
over the lines of a file (a very common operation), the next() method
uses a hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline()) does
not work right.

I haven't tested your code in particular, but this warning was enough to
make me generally avoid mixing iter methods and other methods.

Steve

Alex Martelli · Jan 27, 2005

Steven Bethard said:
Beware of mixing iterator methods and readline:

_mixing_, yes. But -- starting the iteration after some other kind of
reading (readline, or read(N), etc) -- is OK...

http://docs.python.org/lib/bltin-file-objects.html

next( )
...In order to make a for loop the most efficient way of looping
over the lines of a file (a very common operation), the next() method
uses a hidden read-ahead buffer. As a consequence of using a read-ahead
buffer, combining next() with other file methods (like readline()) does
not work right.

I haven't tested your code in particular, but this warning was enough to
make me generally avoid mixing iter methods and other methods.

Yeah, I know... it's hard to explain exactly what IS a problem and what
isn't -- not to mention that this IS to some extent a matter of the file
object's implementation and the docs can't/don't want to constrain the
implementer's future freedom, should it turn out to matter. Sigh.

In the Nutshell (2nd ed), which is not normative and thus gives me a tad
more freedom, I have tried to be a tiny bit more specific, taking
advantage, also, of the fact that I'm now addressing the 2.3 and 2.4
implementations, only. Quoting from my current draft (pardon the XML
markup...):

"""
interrupting such a loop prematurely (e.g., with <c>break</c>), or
calling <r>f</r><c>.next()</c> instead of <r>f</r><c>.readline()</c>,
leaves the file's current position at an arbitrary value. If you want
to switch from using <r>f</r> as an iterator to calling other reading
methods on <r>f</r>, be sure to set the file's current position to a
known value by appropriately calling <r>f</r><c>.seek</c>.
"""

I hope this concisely indicates that the problem (in today's current
implementations) is only with switching FROM iteration TO other
approaches to reading, and (if the file is seekable) there's nothing so
problematic here that a good old 'seek' won't cure...

Alex
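
The seek advice can be sketched as follows. This is written in Python 3 syntax with a made-up sample file. The read-ahead behaviour described above is specific to the 2.x file object, so the sketch only shows the pattern of resetting to a known position, not the failure itself:

```python
import os
import tempfile

# Sample data so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("x y\n1 2\n3 4\n5 6\n")
    sample_path = tmp.name

with open(sample_path) as f:
    for line in f:
        if line.startswith("3"):
            break               # leave the loop early, as in the quoted passage
    f.seek(0)                   # reset to a known position before readline()
    header = f.readline()
    first_row = f.readline()

os.remove(sample_path)
print(repr(header), repr(first_row))
```

Seeking to the start is the simplest known position. It only helps when the file is seekable, as noted above.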

Steven Bethard · Jan 27, 2005

Alex said:
[snip]

I hope this concisely indicates that the problem (in today's current
implementations) is only with switching FROM iteration TO other
approaches to reading, and (if the file is seekable) there's nothing so
problematic here that a good old 'seek' won't cure...

Thanks for the clarification!

Steve

Jeff Shannon · Jan 27, 2005

Stephen said:
I just did a teensy test. All three options used exactly the same
amount of total memory.

I would presume that, for a small file, the entire contents of the
file will be sucked into the read buffer implemented by the underlying
C file library. An iterator will only really save memory consumption
when the file size is greater than that buffer's size.

Actually, now that I think of it, there's probably another copy of the
data at Python level. For readlines(), that copy is the list object
itself. For iter and iter.next(), it's in the iterator's read-ahead
buffer. So perhaps memory savings will occur when *that* buffer size
is exceeded. It's also quite possible that both buffers are the same
size...

Anyhow, I'm sure that the fact that they use the same size for your
test is a reflection of buffering. The next question is, which
provides the most *conceptual* simplicity? (The answer to that one, I
think, depends on how your brain happens to see things...)

Jeff Shannon
Technician/Programmer
Credit International
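
One way to test the buffering guess is to measure peak allocations on a file much larger than any read buffer. The sketch below assumes Python 3 and its tracemalloc module, which did not exist when this thread was written. The helper names and the 50000-line sample file are arbitrary choices, and the exact numbers will vary by platform and version:

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, path):
    """Return the peak traced allocation (in bytes) while func(path) runs."""
    tracemalloc.start()
    func(path)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def sum_with_readlines(path):
    with open(path) as f:
        f.readline()
        # readlines() builds a list holding every remaining line at once.
        return sum(float(line.split()[0]) for line in f.readlines())


def sum_with_iteration(path):
    with open(path) as f:
        f.readline()
        # Iterating keeps only the current line plus the read buffer.
        return sum(float(line.split()[0]) for line in f)


# A file large enough that it cannot fit in a single read buffer.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("x y\n")
    for i in range(50000):
        tmp.write("%d %d\n" % (i, i + 1))
    sample_path = tmp.name

readlines_peak = peak_bytes(sum_with_readlines, sample_path)
iteration_peak = peak_bytes(sum_with_iteration, sample_path)
os.remove(sample_path)
print(readlines_peak, iteration_peak)
```

On a three-line file the two figures would be expected to be close, which fits the "teensy test" result. The difference should only show up once the file outgrows the buffers.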

enigma · Jan 27, 2005

Do you really need to use the iter function here? As far as I can
tell, a file object is already an iterator. The file object
documentation says that, "[a] file object is its own iterator, for
example iter(f) returns f (unless f is closed)." It doesn't look like
it makes a difference one way or the other, I'm just curious.
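
The quoted sentence is easy to check directly. A small sketch in Python 3 syntax, with a made-up sample file and a list shown for contrast:

```python
import os
import tempfile

# Sample data so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("x y\n1 2\n")
    sample_path = tmp.name

with open(sample_path) as f:
    same_object = iter(f) is f      # a file object is its own iterator
    header = next(f)                # so next() works without calling iter()

# A list, by contrast, hands out a separate iterator object.
rows = [1, 2, 3]
list_is_own_iterator = iter(rows) is rows

os.remove(sample_path)
print(same_object, list_is_own_iterator)
```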

Steven Bethard · Jan 27, 2005

Nope, you're right -- that's just my obsessive-compulsive disorder
kicking in.

A lot of objects aren't their own iterators, so I tend
to ask for an iterator with iter() when I know I want one. But for
files, this definitely isn't necessary:

py> file('temp.txt', 'w').write("""\
.... x y
.... 1 2
.... 3 4
.... 5 6
.... """)
py> f = file('temp.txt')
py> f.next()
'x y\n'
py> for line in f:
....     print [float(f) for f in line.split()]
....
[1.0, 2.0]
[3.0, 4.0]
[5.0, 6.0]

And to illustrate Alex Martelli's point that using readline, etc. before
using the file as an iterator is fine:

py> f = file('temp.txt')
py> f.readline()
'x y\n'
py> for line in f:
....     print [float(f) for f in line.split()]
....
[1.0, 2.0]
[3.0, 4.0]
[5.0, 6.0]

But using readline, etc. after using the file as an iterator is *not*
fine, generally:

py> f = file('temp.txt')
py> f.next()
'x y\n'
py> f.readline()
''

In this case, if my understanding's right, the entire file contents have
been read into the iterator buffer, so readline thinks the entire file's
been read in and gives you '' to indicate this.

Steve
