read lines

Horacius ReX

Hi, I have a text file like this;

1 -33.453579
2 -148.487125
3 -195.067172
4 -115.958374
5 -100.597841
6 -121.566441
7 -121.025381
8 -132.103507
9 -108.939327
10 -97.046703
11 -52.866534
12 -48.432623
13 -112.790419
14 -98.516975
15 -98.724436

So I want to write a program in python that reads each line and
detects which numbers of the second column are the maximum and the
minimum.

I tried with;

import os, sys,re,string

# first parameter is the name of the data file
name1 = sys.argv[1]
infile1 = open(name1,"r")

# 1. get minimum and maximum

minimum=0
maximum=0


print " minimum = ",minimum
print " maximum = ",maximum


while 1:
    line = infile1.readline()
    ll = re.split("\s+",string.strip(line))
    print ll[0],ll[1]
    a=ll[0]
    b=ll[1]
    print a,b
    if(b<minimum):
        minimum=b
        print " minimum= ",minimum
    if(b>maximum):
        maximum=b
        print " maximum= ",maximum

print minimum, maximum


But it does not work and I get errors like;

Traceback (most recent call last):
  File "translate_to_intervals.py", line 20, in <module>
    print ll[0],ll[1]
IndexError: list index out of range


Could anybody help me ?

Thanks
 
Chris

You're not guaranteed to get 2, or even 1, elements after
splitting. If the line is empty or contains only a space, you need to
handle it. Also, is there really a need for a regex for a simple string split?

import sys

infile = open(sys.argv[1], 'r')
min, max = 0, 0

for each_line in infile.readlines():
    if each_line.strip():
        tmp = each_line.strip().split()
        try:
            b = tmp[1]
        except IndexError:
            continue
        if b < min: min = b
        if b > max: max = b
 
Zepo Len

Your regex is not working correctly I guess, I don't even know why you are
using a regex, something like this would work just fine:

import sys
nums = [float(line.split(' -')[1]) for line in open(sys.argv[1])]
print 'min=', min(nums), 'max=', max(nums)
 
Neil Cerutti

> Hi, I have a text file like this;
>
> 1 -33.453579
> 2 -148.487125
> 3 -195.067172
> 4 -115.958374
> 5 -100.597841
> 6 -121.566441
> 7 -121.025381
> 8 -132.103507
> 9 -108.939327
> 10 -97.046703
> 11 -52.866534
> 12 -48.432623
> 13 -112.790419
> 14 -98.516975
> 15 -98.724436
>
> So I want to write a program in python that reads each line and
> detects which numbers of the second column are the maximum and
> the minimum.

Check out 3.6.1 String Methods in the Python Library Reference.
It contains what you need.

Also, read about max and min from 2.1 Built-in Functions.

> I tried with;
>
> import os, sys,re,string

The string module is best avoided, except for a few character
classes, e.g., Paladins and Clerics. ;-) Use str methods instead.

It's more readable to import one module per line.

> # first parameter is the name of the data file
> name1 = sys.argv[1]
> infile1 = open(name1,"r")
>
> # 1. get minimum and maximum
>
> minimum=0
> maximum=0
>
> print " minimum = ",minimum
> print " maximum = ",maximum
>
> while 1:
>     line = infile1.readline()

This isn't the best way to read files in Python. Check out 7.2
Reading and Writing Files in the Python Tutorial.

>     ll = re.split("\s+",string.strip(line))
>     print ll[0],ll[1]
>     a=ll[0]
>     b=ll[1]

Don't mix tabs and spaces. Python's Style Guide generally
recommends four spaces per indent.

>     print a,b
>     if(b<minimum):

readline returns str objects. You'll need to convert them to
numbers manually before comparing.

>         minimum=b
>         print " minimum= ",minimum
>     if(b>maximum):
>         maximum=b
>         print " maximum= ",maximum
>
> print minimum, maximum
>
> But it does not work and I get errors like;
>
> Traceback (most recent call last):
>   File "translate_to_intervals.py", line 20, in <module>
>     print ll[0],ll[1]
> IndexError: list index out of range

This is caused by line becoming an empty string when readline
encounters end of the file.

> Could anybody help me ?

The following will not work in Python 2.4 or earlier.

from __future__ import with_statement
import sys
from operator import itemgetter
from contextlib import closing

with closing(file(sys.argv[1])) as fp:
    # one (index, value) tuple per line of the file
    table = [(int(i), float(n)) for i, n in (line.split() for line in fp)]
print table
print "maximum =", max(table, key=itemgetter(1))
print "minimum =", min(table, key=itemgetter(1))
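
For readers on Python 3, where `print` is a function and `with` needs no `__future__` import, the same idea might look roughly like the sketch below. `read_table` and `SAMPLE_LINES` are names made up for this illustration and do not come from the thread; with a real data file you would pass the open file object instead of the list.

```python
from operator import itemgetter

# Sample rows in the same "index value" layout as the data file in the question.
SAMPLE_LINES = [
    "1 -33.453579\n",
    "2 -148.487125\n",
    "3 -195.067172\n",
    "\n",  # blank line, skipped by read_table
]

def read_table(lines):
    """Return a list of (index, value) tuples, skipping blank lines."""
    table = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        index, value = fields
        table.append((int(index), float(value)))
    return table

table = read_table(SAMPLE_LINES)
# key=itemgetter(1) compares rows by their second element, the float value.
print("maximum =", max(table, key=itemgetter(1)))
print("minimum =", min(table, key=itemgetter(1)))
```

Keeping the index in each tuple means the result also tells you which line held the extreme value.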
 
Piet van Oostrum

Horacius ReX said:
HR> while 1:
HR> line = infile1.readline()

You have an infinite loop. Fortunately your program stops because of the
error. When you encounter end of file, line becomes the empty string and
the split gives you only 1 item instead of 2.

So add the following:
if not line: break

Also your choice for 0 as initial values of minimum and maximum isn't good.
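
Both points can be combined in a short sketch. It is written for Python 3 rather than the Python 2 used in the thread, and `min_max_readline` is a hypothetical helper name. Starting from `None` instead of 0 matters for the sample data because every value is negative, so a maximum initialised to 0 would never be replaced.

```python
import io

def min_max_readline(infile):
    """Scan a file-like object with readline(), stopping at end of file."""
    minimum = None  # None means "no value seen yet"
    maximum = None
    while True:
        line = infile.readline()
        if not line:           # readline() returns '' only at end of file
            break
        fields = line.split()
        if len(fields) < 2:    # blank or short line: nothing to compare
            continue
        value = float(fields[1])
        if minimum is None or value < minimum:
            minimum = value
        if maximum is None or value > maximum:
            maximum = value
    return minimum, maximum

# io.StringIO stands in for the real data file here.
sample = io.StringIO("1 -33.453579\n2 -148.487125\n\n3 -195.067172\n")
print(min_max_readline(sample))
```

A blank line in the middle of the file is read as `"\n"`, which is not empty, so it does not end the loop early.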
 
Zepo Len

> Your regex is not working correctly I guess, I don't even know why you
> are using a regex, something like this would work just fine:
>
> import sys
> nums = [float(line.split(' -')[1]) for line in open(sys.argv[1])]
> print 'min=', min(nums), 'max=', max(nums)

Sorry, that should be line.split() - didn't realise those were negative
numbers.
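
To make the difference concrete, here is a small Python 3 sketch (the thread's own code is Python 2). `second_column` is an illustrative helper name, and the inline list stands in for the lines of the data file.

```python
line = "1 -33.453579\n"

# Splitting on " -" removes the minus sign together with the separator.
print(float(line.split(" -")[1]))   # 33.453579, sign lost

# Splitting on whitespace keeps the sign attached to the number.
print(float(line.split()[1]))       # -33.453579

def second_column(lines):
    """Collect column two as floats, skipping blank lines."""
    return [float(ln.split()[1]) for ln in lines if ln.strip()]

nums = second_column(["1 -33.453579\n", "2 -148.487125\n", "\n", "3 -195.067172\n"])
print("min=", min(nums), "max=", max(nums))
```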
 
Bruno Desthuilliers

Chris wrote:
(snip)

> You're not guaranteed to get 2, or even 1, elements after
> splitting. If the line is empty or contains only a space, you need to
> handle it. Also, is there really a need for a regex for a simple string split?
>
> import sys
>
> infile = open(sys.argv[1], 'r')
> min, max = 0, 0

# shadowing the builtin min and max functions may not be such
# a good idea !-)
# Also, you may want to use a sentinel value here instead:
mini, maxi = None, None

> for each_line in infile.readlines():

# You don't need to read the whole file in memory
# the file object knows how to iterate over lines.
# Also, you may want to track line numbers so you can
# warn about an incorrect line, cf below

for linenum, line in enumerate(infile):

>     if each_line.strip():

    # you're uselessly calling line.strip two times...
    line = line.strip()
    if line:

>         tmp = each_line.strip().split()

        tmp = line.split()

>         try:
>             b = tmp[1]

        # Notice that here, b is a string, not a number...
        try:
            b = int(tmp[1])
        except (IndexError, ValueError), e:
            # you may want to warn about incorrect/unexpected format here
            # (writing to sys.stderr, since stdout is for normal outputs)
            print >> sys.stderr, \
                "incorrect line format line %s ('%s') : %s" \
                % (linenum, line, e)
            continue

>         if b < min: min = b
>         if b > max: max = b

        # If the first test succeeds, doing the second is useless.
        # also, take into account the sentinel value. The identity test
        # against None should not be too costly. If it was, it's simple to
        # optimize it out of the for loop.
        if mini is None:
            # the first valid value initialises both bounds
            mini = maxi = b
        elif b < mini:
            mini = b
        elif b > maxi:
            maxi = b

# closing the file might be a good idea too, at least for any
# serious app
infile.close()



Now there are also these two builtin functions min and max, and the
itertools tee() function...

import sys
from itertools import tee

def extract_numbers(iterable):
    for linenum, line in enumerate(iterable):
        try:
            yield int(line.strip().split()[1])
        except (IndexError, ValueError), e:
            print >> sys.stderr, e
            continue

# please add proper error handling around here
infile = open(sys.argv[1])
lines1, lines2 = tee(infile)
print min(extract_numbers(lines1)), max(extract_numbers(lines2))
infile.close()


HTH
 
Bruno Desthuilliers

Bruno Desthuilliers wrote:
(snip)
> # Notice that here, b is a string, not a number...
> try:
>     b = int(tmp[1])

oops, I meant:
    b = float(tmp[1])


Same here:
> def extract_numbers(iterable):
>     for linenum, line in enumerate(iterable):
>         try:
>             yield int(line.strip().split()[1])

            yield float(line.strip().split()[1])
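
As a quick illustration of why the conversion matters, the Python 3 snippet below (not from the thread, which uses Python 2) shows that `int()` rejects these values with a `ValueError`, and that comparing the raw strings can disagree with comparing the numbers. The two sample values are taken from lines 14 and 15 of the data file.

```python
text = "-33.453579"

try:
    int(text)                       # not an integer literal
except ValueError as exc:
    print("int() failed:", exc)

value = float(text)                 # float() accepts it
print(value)

# Strings compare character by character, which is not numeric order.
as_text_max = max("-98.516975", "-98.724436")
as_number_max = max(float("-98.516975"), float("-98.724436"))
print(as_text_max)      # -98.724436
print(as_number_max)    # -98.516975
```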
 
Peter Otten

Bruno said:
> # You don't need to read the whole file in memory
> lines1, lines2 = tee(infile)
> print min(extract_numbers(lines1)), max(extract_numbers(lines2))

tee() internally maintains a list of items that were seen by
one but not all of the iterators returned. Therefore after calling min()
and before calling max() you have a list of one float per line in memory
which is quite close conceptually to reading the whole file in memory.

If you want to use memory efficiently, stick with the for-loop.
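
One possible shape for such a loop, sketched in Python 3 with hypothetical helper names (`min_max` and `iter_values` are not from the thread): a single pass that tracks both bounds, so only the current value and the two running results are held at any time.

```python
def iter_values(lines):
    """Yield the second column of each line as a float, skipping short lines."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield float(fields[1])

def min_max(values):
    """Return (smallest, largest) from one pass over any iterable of numbers."""
    iterator = iter(values)
    try:
        smallest = largest = next(iterator)
    except StopIteration:
        raise ValueError("min_max() needs at least one value") from None
    for value in iterator:
        if value < smallest:
            smallest = value
        elif value > largest:
            largest = value
    return smallest, largest

# The list stands in for an open file; a file object can be passed the same way.
sample = ["1 -33.453579\n", "2 -148.487125\n", "\n", "3 -195.067172\n"]
print(min_max(iter_values(sample)))
```

Because both bounds start from the first value, the `elif` is safe here: a value smaller than the current minimum cannot also be larger than the current maximum.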

Peter
 
Bruno Desthuilliers

Peter Otten wrote:
> tee() internally maintains a list of items that were seen by
> one but not all of the iterators returned. Therefore after calling min()
> and before calling max() you have a list of one float per line in memory
> which is quite close conceptually to reading the whole file in memory.
>
> If you want to use memory efficiently, stick with the for-loop.

Indeed - I should have specified that the second version was not
necessarily better with respect to either performance or resource usage.
Thanks for having made this point clear.
 
