read lines

Discussion in 'Python' started by Horacius ReX, Dec 4, 2007.

  1. Horacius ReX

    Horacius ReX Guest

    Hi, I have a text file like this;

    1 -33.453579
    2 -148.487125
    3 -195.067172
    4 -115.958374
    5 -100.597841
    6 -121.566441
    7 -121.025381
    8 -132.103507
    9 -108.939327
    10 -97.046703
    11 -52.866534
    12 -48.432623
    13 -112.790419
    14 -98.516975
    15 -98.724436

    So I want to write a program in python that reads each line and
    detects which numbers of the second column are the maximum and the
    minimum.

    I tried with;

    import os, sys,re,string

    # first parameter is the name of the data file
    name1 = sys.argv[1]
    infile1 = open(name1,"r")

    # 1. get minimum and maximum

    minimum=0
    maximum=0


    print " minimum = ",minimum
    print " maximum = ",maximum


    while 1:
        line = infile1.readline()
        ll = re.split("\s+",string.strip(line))
        print ll[0],ll[1]
        a=ll[0]
        b=ll[1]
        print a,b
        if(b<minimum):
            minimum=b
            print " minimum= ",minimum
        if(b>maximum):
            maximum=b
            print " maximum= ",maximum

    print minimum, maximum


    But it does not work and I get errors like;

    Traceback (most recent call last):
    File "translate_to_intervals.py", line 20, in <module>
    print ll[0],ll[1]
    IndexError: list index out of range


    Could anybody help me ?

    Thanks
     
    Horacius ReX, Dec 4, 2007
    #1

  2. Horacius ReX

    Chris Guest

    On Dec 4, 2:14 pm, Horacius ReX <> wrote:
    > Hi, I have a text file like this;
    >
    > 1 -33.453579
    > 2 -148.487125
    > 3 -195.067172
    > 4 -115.958374
    > 5 -100.597841
    > 6 -121.566441
    > 7 -121.025381
    > 8 -132.103507
    > 9 -108.939327
    > 10 -97.046703
    > 11 -52.866534
    > 12 -48.432623
    > 13 -112.790419
    > 14 -98.516975
    > 15 -98.724436
    >
    > So I want to write a program in python that reads each line and
    > detects which numbers of the second column are the maximum and the
    > minimum.
    >
    > I tried with;
    >
    > import os, sys,re,string
    >
    > # first parameter is the name of the data file
    > name1 = sys.argv[1]
    > infile1 = open(name1,"r")
    >
    > # 1. get minimum and maximum
    >
    > minimum=0
    > maximum=0
    >
    > print " minimum = ",minimum
    > print " maximum = ",maximum
    >
    > while 1:
    > line = infile1.readline()
    > ll = re.split("\s+",string.strip(line))
    > print ll[0],ll[1]
    > a=ll[0]
    > b=ll[1]
    > print a,b
    > if(b<minimum):
    > minimum=b
    > print " minimum= ",minimum
    > if(b>maximum):
    > maximum=b
    > print " maximum= ",maximum
    >
    > print minimum, maximum
    >
    > But it does not work and I get errors like;
    >
    > Traceback (most recent call last):
    > File "translate_to_intervals.py", line 20, in <module>
    > print ll[0],ll[1]
    > IndexError: list index out of range
    >
    > Could anybody help me ?
    >
    > Thanks


    You're not guaranteed to get 2, or even 1, elements after
    splitting. If the line is empty or contains only a space, you need to
    handle it. Also, is there really a need for a regex for a simple string
    split?

    import sys

    infile = open(sys.argv[1], 'r')
    min, max = 0, 0

    for each_line in infile.readlines():
        if each_line.strip():
            tmp = each_line.strip().split()
            try:
                b = tmp[1]
            except IndexError:
                continue
            if b < min: min = b
            if b > max: max = b
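
To make the point about splitting concrete, here is a minimal Python 3 sketch (the thread itself uses Python 2) of what the two split styles return for the empty string that readline() gives at end of file. The variable names are only illustrative.

```python
import re

# At end of file, readline() returns an empty string.
eof_line = ""

# re.split on an empty string yields a single empty field,
# so asking for index 1 raises IndexError.
regex_fields = re.split(r"\s+", eof_line.strip())
print(regex_fields)

# str.split() with no arguments yields no fields at all for blank input.
plain_fields = eof_line.split()
print(plain_fields)

# A normal data line yields two fields with plain str.split().
data_fields = "10 -97.046703\n".split()
print(data_fields)
```

Either way the list has fewer than two items for a blank line, so the length has to be checked (or the IndexError caught) before reading the second column.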
     
    Chris, Dec 4, 2007
    #2

  3. Horacius ReX

    Zepo Len Guest

    > Hi, I have a text file like this;
    >
    > 1 -33.453579
    > 2 -148.487125
    > ....
    >
    > So I want to write a program in python that reads each line and
    > detects which numbers of the second column are the maximum and the
    > minimum.
    >
    > I tried with;
    >
    > import os, sys,re,string
    >
    > # first parameter is the name of the data file
    > name1 = sys.argv[1]
    > infile1 = open(name1,"r")
    >
    > # 1. get minimum and maximum
    >
    > minimum=0
    > maximum=0
    >
    >
    > print " minimum = ",minimum
    > print " maximum = ",maximum
    >
    >
    > while 1:
    > line = infile1.readline()
    > ll = re.split("\s+",string.strip(line))
    > print ll[0],ll[1]
    > a=ll[0]
    > b=ll[1]
    > print a,b
    > if(b<minimum):
    > minimum=b
    > print " minimum= ",minimum
    > if(b>maximum):
    > maximum=b
    > print " maximum= ",maximum
    >
    > print minimum, maximum
    >
    >
    > But it does not work and I get errors like;
    >
    > Traceback (most recent call last):
    > File "translate_to_intervals.py", line 20, in <module>
    > print ll[0],ll[1]
    > IndexError: list index out of range


    Your regex is not working correctly, I guess. I don't even know why you
    are using a regex; something like this would work just fine:

    import sys
    nums = [float(line.split(' -')[1]) for line in open(sys.argv[1])]
    print 'min=', min(nums), 'max=', max(nums)
     
    Zepo Len, Dec 4, 2007
    #3
  4. Horacius ReX

    Neil Cerutti Guest

    On 2007-12-04, Horacius ReX <> wrote:
    > Hi, I have a text file like this;
    >
    > 1 -33.453579
    > 2 -148.487125
    > 3 -195.067172
    > 4 -115.958374
    > 5 -100.597841
    > 6 -121.566441
    > 7 -121.025381
    > 8 -132.103507
    > 9 -108.939327
    > 10 -97.046703
    > 11 -52.866534
    > 12 -48.432623
    > 13 -112.790419
    > 14 -98.516975
    > 15 -98.724436
    >
    > So I want to write a program in python that reads each line and
    > detects which numbers of the second column are the maximum and
    > the minimum.


    Check out 3.6.1 String Methods in the Python Library Reference.
    It contains what you need.

    Also, read about max and min from 2.1 Built-in Functions.

    > I tried with;
    >
    > import os, sys,re,string


    The string module is best avoided, except for a few character
    classes, e.g., Paladins and Clerics. ;-) Use str methods instead.

    It's more readable to import one module per line.

    > # first parameter is the name of the data file
    > name1 = sys.argv[1]
    > infile1 = open(name1,"r")
    >
    > # 1. get minimum and maximum
    >
    > minimum=0
    > maximum=0
    >
    >
    > print " minimum = ",minimum
    > print " maximum = ",maximum
    >
    >
    > while 1:
    > line = infile1.readline()


    This isn't the best way to read files in Python. Check out 7.2
    Reading and Writing Files in the Python Tutorial.

    > ll = re.split("\s+",string.strip(line))
    > print ll[0],ll[1]
    > a=ll[0]
    > b=ll[1]


    Don't mix tabs and spaces. Python's Style Guide generally
    recommends four spaces per indent.

    > print a,b
    > if(b<minimum):


    readline returns str objects. You'll need to convert them to
    numbers manually before comparing.
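
A small Python 3 sketch of why the conversion matters, using the second column of the sample file. Strings compare character by character, so the "minimum" and "maximum" of the raw text are not the numeric ones. (In Python 3, comparing a str with the integer 0 raises TypeError; Python 2 allowed it silently, with a meaningless result.)

```python
# Second-column values from the sample file, exactly as read from text.
column = [
    "-33.453579", "-148.487125", "-195.067172", "-115.958374",
    "-100.597841", "-121.566441", "-121.025381", "-132.103507",
    "-108.939327", "-97.046703", "-52.866534", "-48.432623",
    "-112.790419", "-98.516975", "-98.724436",
]

# Comparing the raw strings is lexicographic, not numeric.
text_min, text_max = min(column), max(column)

# Converting to float first gives the numeric answer.
values = [float(s) for s in column]
num_min, num_max = min(values), max(values)

print("as text:   ", text_min, text_max)
print("as numbers:", num_min, num_max)
```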

    > minimum=b
    > print " minimum= ",minimum
    > if(b>maximum):
    > maximum=b
    > print " maximum= ",maximum
    >
    > print minimum, maximum
    >
    >
    > But it does not work and I get errors like;
    >
    > Traceback (most recent call last):
    > File "translate_to_intervals.py", line 20, in <module>
    > print ll[0],ll[1]
    > IndexError: list index out of range


    This is caused by line becoming an empty string when readline
    encounters end of the file.

    > Could anybody help me ?


    The following will not work in Python 2.4 or earlier.

    from __future__ import with_statement
    import sys
    from operator import itemgetter
    from contextlib import closing  # closing lives in contextlib

    with closing(file(sys.argv[1])) as fp:
        table = [(int(i), float(n)) for i, n in (line.split() for line in fp)]
    print table
    print "maximum =", max(table, key=itemgetter(1))
    print "minimum =", min(table, key=itemgetter(1))
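
For readers on Python 3, a rough equivalent might look like the sketch below. The helper name load_table and the inline sample lines are assumptions made for illustration; skipping blank lines is also an addition, not something the code above does.

```python
from operator import itemgetter


def load_table(lines):
    """Parse 'index value' lines into (int, float) tuples, skipping blank lines."""
    return [
        (int(i), float(n))
        for i, n in (line.split() for line in lines if line.strip())
    ]


# Stand-in for the data file: any iterable of lines works, including an open file.
sample = ["1 -33.453579\n", "2 -148.487125\n", "3 -195.067172\n"]
table = load_table(sample)

# key=itemgetter(1) compares rows by their second column.
highest = max(table, key=itemgetter(1))
lowest = min(table, key=itemgetter(1))
print("maximum =", highest)
print("minimum =", lowest)
```

With a real file, `with open(sys.argv[1]) as fp: table = load_table(fp)` would replace the sample list; file objects are context managers in Python 3, so closing() is not needed.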

    --
    Neil Cerutti
     
    Neil Cerutti, Dec 4, 2007
    #4
  5. >>>>> Horacius ReX <> (HR) wrote:

    >HR> while 1:
    >HR> line = infile1.readline()


    You have an infinite loop. Fortunately your program stops because of the
    error. When you encounter end of file, line becomes the empty string and
    the split gives you only 1 item instead of 2.

    So add the following:
    if not line: break

    Also your choice for 0 as initial values of minimum and maximum isn't good.
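
A minimal Python 3 sketch of both points, assuming the two-column format from the original post. The function name and the in-memory file are illustrative only. Starting from 0 would leave the maximum stuck at 0 because every value in the file is negative; starting from plus and minus infinity avoids that.

```python
import io


def min_max_readline(infile):
    """Scan 'index value' lines with readline(), stopping cleanly at end of file."""
    minimum = float("inf")    # any real value is smaller than this
    maximum = float("-inf")   # any real value is larger than this
    while True:
        line = infile.readline()
        if not line:          # empty string means end of file
            break
        fields = line.split()
        if len(fields) < 2:   # skip blank or short lines
            continue
        b = float(fields[1])
        if b < minimum:
            minimum = b
        if b > maximum:
            maximum = b
    return minimum, maximum


# In-memory stand-in for the data file.
demo = io.StringIO("1 -33.453579\n2 -148.487125\n3 -195.067172\n")
result = min_max_readline(demo)
print(result)
```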

    --
    Piet van Oostrum <>
    URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
    Private email:
     
    Piet van Oostrum, Dec 4, 2007
    #5
  6. Horacius ReX

    Zepo Len Guest

    > Your regex is not working correctly I guess, I don't even know why you
    > are using a regex, something like this would work just fine:
    >
    > import sys
    > nums = [float(line.split(' -')[1]) for line in open(sys.argv[1])]
    > print 'min=', min(nums), 'max=', max(nums)


    Sorry, that should be line.split() - didn't realise those were negative
    numbers.
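
A short Python 3 sketch of the difference, using the first line of the sample data: splitting on ' -' swallows the minus sign, while a plain split() keeps it.

```python
line = "1 -33.453579\n"

# Splitting on " -" removes the minus sign along with the separator.
sign_lost = float(line.split(" -")[1])

# Splitting on whitespace keeps the sign attached to the number.
sign_kept = float(line.split()[1])

print(sign_lost, sign_kept)
```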
     
    Zepo Len, Dec 4, 2007
    #6
  7. Chris wrote:
    > On Dec 4, 2:14 pm, Horacius ReX <> wrote:
    >> Hi, I have a text file like this;
    >>
    >> 1 -33.453579
    >> 2 -148.487125
    >> 3 -195.067172
    >> 4 -115.958374
    >> 5 -100.597841
    >> 6 -121.566441
    >> 7 -121.025381
    >> 8 -132.103507
    >> 9 -108.939327
    >> 10 -97.046703
    >> 11 -52.866534
    >> 12 -48.432623
    >> 13 -112.790419
    >> 14 -98.516975
    >> 15 -98.724436
    >>
    >> So I want to write a program in python that reads each line and
    >> detects which numbers of the second column are the maximum and the
    >> minimum.
    >>

    (snip)
    >
    > You're not guaranteed to have that 2 or even 1 element after
    > splitting. If the line is empty or has 1 space you need to handle
    > it. Also is there really a need for regex for a simple string split ?
    >
    > import sys
    >
    > infile = open(sys.argv[1], 'r')
    > min, max = 0, 0


    # shadowing the builtin min and max functions may not be such
    # a good idea !-)
    # Also, you may want to use a sentinel value here instead:
    mini, maxi = None, None

    > for each_line in infile.readlines():


    # You don't need to read the whole file in memory
    # the file object knows how to iterate over lines.
    # Also, you may want to track line numbers so you can
    # warn about an incorrect line, cf below

    for linenum, line in enumerate(infile):

    > if each_line.strip():


    # you're uselessly calling line.strip two times...
        line = line.strip()
        if line:

    > tmp = each_line.strip().split()


            tmp = line.split()

    > try:
    > b = tmp[1]

    # Notice that here, b is a string, not a number...
            try:
                b = int(tmp[1])
            # a non-numeric field raises ValueError, not TypeError
            except (IndexError, ValueError), e:


    # you may want to warn about incorrect/unexpected format here
    # (writing to sys.stderr, since stdout is for normal outputs)
                print >> sys.stderr, \
                    "incorrect line format line %s ('%s') : %s" \
                    % (linenum, line, e)
    > continue



    > if b < min: min = b
    > if b > max: max = b


    # If the first test succeeds, doing the second is useless.
    # also, take into account the sentinel value. The identity test
    # against None should not be too costly. If it was, it's simple to
    # optimize it out of the for loop.

            # the first value seeds both, so maxi is never left unset
            if mini is None:
                mini = maxi = b
            elif b < mini:
                mini = b
            elif b > maxi:
                maxi = b


    # closing the file might be a good idea too, at least for any
    # serious app
    infile.close()
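
Because the pieces above are interleaved with quoted text, here is one way they might fit together as a single Python 3 sketch. The function name scan_min_max and the sample lines are assumptions for illustration; values are converted with float since the column holds decimals. The first valid value seeds both results, so a steadily decreasing column still ends up with a maximum.

```python
import sys


def scan_min_max(lines):
    """Single pass over 'index value' lines; returns (min, max) or (None, None)."""
    mini = maxi = None
    for linenum, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue
        try:
            b = float(line.split()[1])
        except (IndexError, ValueError) as e:
            # Warnings go to stderr; stdout is kept for normal output.
            print(
                "incorrect line format line %s (%r): %s" % (linenum, line, e),
                file=sys.stderr,
            )
            continue
        if mini is None:
            mini = maxi = b
        elif b < mini:
            mini = b
        elif b > maxi:
            maxi = b
    return mini, maxi


# Sample input with a blank line and a malformed line mixed in.
sample = ["1 -33.453579\n", "\n", "oops\n", "2 -148.487125\n", "3 -195.067172\n"]
result = scan_min_max(sample)
print(result)
```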


    Now there are also these two builtin functions min and max, and the
    itertools tee() function...

    import sys
    from itertools import tee

    def extract_numbers(iterable):
        for linenum, line in enumerate(iterable):
            try:
                yield int(line.strip().split()[1])
            # a non-numeric field raises ValueError, not TypeError
            except (IndexError, ValueError), e:
                print >> sys.stderr, e
                continue

    # please add proper error handling around here
    infile = open(sys.argv[1])
    lines1, lines2 = tee(infile)
    print min(extract_numbers(lines1)), max(extract_numbers(lines2))
    infile.close()


    HTH
     
    Bruno Desthuilliers, Dec 4, 2007
    #7
  8. Bruno Desthuilliers wrote:
    (snip)
    > # Notice that here, b is a string, not a number...
    > try:
    > b = int(tmp[1])


    oops, I meant:
    b = float(tmp[1])


    Same here:

    > def extract_number(iterable):
    > for linenum, line in enumerate(iterable):
    > try:
    > yield int(line.strip().split()[1])

    yield float(line.strip().split()[1])
     
    Bruno Desthuilliers, Dec 4, 2007
    #8
  9. Horacius ReX

    Peter Otten Guest

    Bruno Desthuilliers wrote:

    > # You don't need to read the whole file in memory


    > lines1, lines2 = tee(infile)
    > print min(extract_numbers(lines1)), max(extract_numbers(lines2))


    tee() internally maintains a list of items that were seen by
    one but not all of the iterators returned. Therefore after calling min()
    and before calling max() you have a list of one float per line in memory
    which is quite close conceptually to reading the whole file in memory.

    If you want to use memory efficiently, stick with the for-loop.
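
A small Python 3 sketch that makes the buffering visible. The generator and the consumed list are illustrative stand-ins for the file: by the time min() finishes with the first iterator, the whole source has already been read, and tee() is holding every item for the second one.

```python
from itertools import tee

consumed = []  # records every item pulled from the underlying source


def numbers():
    """Stand-in for parsing the file one line at a time."""
    for value in (-33.453579, -148.487125, -195.067172):
        consumed.append(value)
        yield value


first, second = tee(numbers())

lowest = min(first)
# The source is now exhausted; tee() keeps all items buffered for `second`.
buffered_before_max = len(consumed)

highest = max(second)
print(lowest, highest, buffered_before_max)
```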

    Peter
     
    Peter Otten, Dec 4, 2007
    #9
  10. Peter Otten wrote:
    > Bruno Desthuilliers wrote:
    >
    >> # You don't need to read the whole file in memory

    >
    >> lines1, lines2 = tee(infile)
    >> print min(extract_numbers(lines1)), max(extract_numbers(lines2))

    >
    > tee() internally maintains a list of items that were seen by
    > one but not all of the iterators returned. Therefore after calling min()
    > and before calling max() you have a list of one float per line in memory
    > which is quite close conceptually to reading the whole file in memory.
    >
    > If you want to use memory efficiently, stick with the for-loop.


    Indeed, I should have specified that the second version is not
    necessarily better with respect to either performance or resource
    usage. Thanks for making this point clear.
     
    Bruno Desthuilliers, Dec 4, 2007
    #10
