writing results to array

Discussion in 'Python' started by Bevan Jenkins, Dec 3, 2007.

  1. Hello,

    I have recently discovered the Python language and am having a lot of
    fun getting my head around the basics of it.
    However, I have run into a stumbling block that I have not been able
    to overcome, so I thought I would ask for help.
    <Overview>
    I am trying to import a text file that has the following format:
    02/01/2000 @ 00:00:00 0.983896 Q10 T2
    03/01/2000 @ 00:00:00 0.557377 Q10 T2
    04/01/2000 @ 00:00:00 0.508871 Q10 T2
    05/01/2000 @ 00:00:00 0.583196 Q10 T2
    06/01/2000 @ 00:00:00 0.518281 Q10 T2
    when there is missing data:
    12/09/2000 @ 00:00:00 Q151 T2
    13/09/2000 @ 00:00:00 Q151 T2

    I have cobbled together some code which imports the data. The next
    step is to create an array in which each column contains a year's worth
    of values. Thus, if I have 6 years of data (2001-2006 inclusive),
    there will be six columns with 365 rows (not all years have a full
    data set; some may only have, say, 340 days of data).
    <The question>
    In the code below,
    print answer[j,1] is giving me the right answer but I can't write it
    to an array.
    Any suggestions welcomed.


    This is what I have:
    flow=[]
    flowdate=[]
    yeardate=[]
    uniqueyear=[]
    #flow_order=
    flow_rank=[]
    icount=[]
    p=[]

    filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
    linesep ="\n"

    # read in whole file
    tempdata = open( filename).read()
    # break into lines
    tempdata = string.split( tempdata, linesep )
    # for each record, get the field values
    for i in range( len( tempdata)):
        # split into the lines
        fields = string.split( tempdata[i])
        if len(fields)>5:
            flowdate.append(fields[0])
            list =string.split(fields[0],"/")
            yeardate.append(list[2])
            flow.append(float(fields[3]))
            answer=column_stack((flowdate,flow))

    for rows in yeardate:
        if rows not in uniqueyear:
            uniqueyear.append(rows)

    #print answer[:,0] #date
    flow_order=empty((0,0),dtype=float)
    #for yr in enumerate(uniqueyear):
    for iyr,yr in enumerate(uniqueyear):
        for j, val, in enumerate (answer[:,0]):
            flowyr=string.split(val,"/")
            if int(flowyr[2])==int(yr):
                print answer[j,1]
                #flow_order =
     
    Bevan Jenkins, Dec 3, 2007
    #1

  2. Matimus Guest

    I'm not sure what you mean by `write it to an array'. `answer' is an
    array. Perhaps you could show an example that has the bad behavior you
    are observing, or at least an example of what you expect to get.

    Also, just a couple of pointers:

    this:

    > tempdata = open( filename).read()
    > # break into lines
    > tempdata = string.split( tempdata, linesep )
    > # for each record, get the field values
    > for i in range( len( tempdata)):
    >     # split into the lines
    >     fields = string.split( tempdata[i])


    is better written (and usually written) in Python like this:

    for line in open(filename):
        fields = line.split()

    Don't use the string module, use the methods of the strings
    themselves.
    Don't use built-in type names as variable names, as seen on this line:
    > list =string.split(fields[0],"/") # list is a built-in type
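
    A minimal sketch of both pointers, applied to one sample record from
    the original post. The names `sample`, `date_parts` and `years_seen`
    are illustrative only and do not come from the thread; the snippet is
    written for Python 3.

```python
# One record in the format shown in the original post.
sample = "02/01/2000 @ 00:00:00 0.983896 Q10 T2"

# String methods replace the string-module functions.
fields = sample.split()              # instead of string.split(sample)
date_parts = fields[0].split("/")    # instead of: list = string.split(fields[0], "/")

year = date_parts[2]
flow = float(fields[3])

# Because the name `list` was not reused, the built-in is still callable.
years_seen = list({year})
```

    Had `list` been rebound to the split result, the last line would fail,
    because a list object cannot be called.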


    You only need to use enumerate if you actually want the index. If you
    don't need the index, just iterate over the sequence. eg. use this:

    > for yr in uniqueyear:
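
    As a small illustration of the difference (a sketch only; `labels` and
    `column_of` are made-up names, and the year values are sample data):

```python
uniqueyear = ["2000", "2001", "2002"]

# No index needed: iterate over the values directly.
labels = []
for yr in uniqueyear:
    labels.append("year " + yr)

# Index needed (for example, to choose a column number): use enumerate.
column_of = {}
for iyr, yr in enumerate(uniqueyear):
    column_of[yr] = iyr
```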


    You don't need to re-create the column-stack each time you get a value
    from the file. It is very inefficient.

    eg. change this:

    > for i in range( len( tempdata)):
    >     # split into the lines
    >     fields = string.split( tempdata[i])
    >     if len(fields)>5:
    >         flowdate.append(fields[0])
    >         list =string.split(fields[0],"/")
    >         yeardate.append(list[2])
    >         flow.append(float(fields[3]))
    >         answer=column_stack((flowdate,flow))


    to this (column_stack is called once, after the loop):

    > for i in range( len( tempdata)):
    >     # split into the lines
    >     fields = string.split( tempdata[i])
    >     if len(fields)>5:
    >         flowdate.append(fields[0])
    >         list =string.split(fields[0],"/")
    >         yeardate.append(list[2])
    >         flow.append(float(fields[3]))
    > answer=column_stack((flowdate,flow))


    or, with the other suggested changes:

    > for line in open(filename):
    >     # split the line into fields
    >     fields = line.split()
    >     if len(fields) > 5:
    >         flowdate.append(fields[0])
    >         year = fields[0].split("/")[2]
    >         yeardate.append(year)
    >         flow.append(float(fields[3]))
    > answer=column_stack((flowdate,flow))


    If I was doing this though, I would use a dictionary (dict) where the
    keys are the year and the values are lists of flows for that year.

    Something like this:
    Code:
    filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
    year2flows = {}
    
    fin = open(filename)
    for line in fin:
        # split the line into fields
        fields = line.split()
        if len(fields)>5:
            date = fields[0]
            year = fields[0].split("/")[-1]
            flow = float(fields[3])
            year2flows.setdefault(year, []).append((date, flow))
    fin.close()
    
    # This does what you were doing.
    for yr in sorted(year2flows.keys()):
        for date, flow in year2flows[yr]:
            print flow
    # If you just wanted one year though you could do something like this
    # (the keys are strings, because they come from splitting the date):
    for date, flow in year2flows["2004"]:
        print flow
    
    
    The above code is untested, so I make no guarantees. If you are using
    python 2.5, you might look into using defaultdict (in the collections
    module). It will simplify the code a bit.

    from this:
    year2flows = {}
    # bunch of stuff...
    year2flows.setdefault(year, []).append((date, flow))
    to this:
    from collections import defaultdict
    year2flows = defaultdict(list)
    # bunch of stuff...
    year2flows[year].append((date, flow))
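
    Put together, the defaultdict version might look like the sketch
    below. It is written for Python 3 and reads from an in-memory list
    (`lines`, a stand-in for the file) so that it runs on its own;
    `flows_by_year` is an illustrative name, not something from the
    thread.

```python
from collections import defaultdict

# Sample records in the format from the original post. The last line has
# no flow value, so it splits into 5 fields instead of 6.
lines = [
    "02/01/2000 @ 00:00:00 0.983896 Q10 T2",
    "03/01/2000 @ 00:00:00 0.557377 Q10 T2",
    "02/01/2001 @ 00:00:00 0.508871 Q10 T2",
    "12/09/2001 @ 00:00:00 Q151 T2",
]

year2flows = defaultdict(list)
for line in lines:
    fields = line.split()
    if len(fields) > 5:                  # skip records with missing data
        date = fields[0]
        year = date.split("/")[-1]
        # No setdefault needed: a missing year starts as an empty list.
        year2flows[year].append((date, float(fields[3])))

# Keep only the flow values, one list per year, in year order.
flows_by_year = {yr: [flow for _, flow in year2flows[yr]]
                 for yr in sorted(year2flows)}
```

    One thing to watch for: merely indexing a defaultdict with a year that
    is not present creates an empty entry for it, so test with `in` when
    you only want to check whether a year exists.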

    Matt
     
    Matimus, Dec 3, 2007
    #2

  3. Chris Guest

    Maybe you're looking for something more along the lines of:

    fInput = open('tst.txt')
    dictObj = {}
    # Structure: {year_key: {day_key: float_value}}
    for each_line in fInput.readlines():
        if each_line.strip():
            line = each_line.strip().split()
            if len(line) == 6:
                year = line[0].split('/')[-1]
                if year in dictObj:
                    dictObj[year][line[0]] = float(line[3])
                else:
                    dictObj[year] = {line[0]: float(line[3])}
    fInput.close()
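
    To show how such a nested dictionary could be read back, here is a
    short Python 3 sketch. It rebuilds the same {year: {date: flow}} shape
    from an in-memory list so it is self-contained; `lines`, `by_year`,
    `one_day` and `days_2000` are illustrative names. The dates are
    assumed to be day/month/year, which the sample data suggests (for
    example 13/09/2000).

```python
from datetime import datetime

lines = [
    "03/01/2000 @ 00:00:00 0.557377 Q10 T2",
    "02/01/2000 @ 00:00:00 0.983896 Q10 T2",
    "02/01/2001 @ 00:00:00 0.508871 Q10 T2",
]

# Build {year: {date: flow}}; setdefault creates the inner dict on first use.
by_year = {}
for each_line in lines:
    fields = each_line.split()
    if len(fields) == 6:
        year = fields[0].split("/")[-1]
        by_year.setdefault(year, {})[fields[0]] = float(fields[3])

# Look up a single day directly.
one_day = by_year["2000"]["02/01/2000"]

# The file may not be in chronological order, so sort the keys by parsed date.
days_2000 = sorted(by_year["2000"],
                   key=lambda d: datetime.strptime(d, "%d/%m/%Y"))
```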
     
    Chris, Dec 4, 2007
    #3
  4. On Mon, 3 Dec 2007 12:45:29 -0800 (PST), Bevan Jenkins
    <> declaimed the following in comp.lang.python:


    > <The question>
    > In the code below
    > print answer[j,1] is giving me the right answer but i can't write it
    > to an array.


    Unless you are using some module/class that you didn't show us in
    the code, Python doesn't really have arrays (there is an array module
    in the standard library, but I don't recall ever seeing it used, and
    then there are the various numeric processing modules: numarray,
    Numeric, and numpy [which supersedes the other two]).

    > answer=column_stack((flowdate,flow))
    >

    You don't supply the code/definition for column_stack(), other than
    that you are passing in a single argument -- which is a tuple containing
    a list of dates and a list of whatever "flow" represents. Lacking this,
    I can not guess what "answer" is supposed to represent.

    > #print answer[:,0] #date
    > flow_order=empty((0,0),dtype=float)


    Where did empty() come from, and what is it supposed to be doing?

    A cut at the parsing half of the problem:

    -=-=-=-=-=-=-


    #FILENAME = r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
    FILENAME = "test.data"
    #convention is that "constants" be all UPPERCASE name

    data = {} #empty dictionary

    fin = open(FILENAME, "r")

    for ln in fin: #automatically reads by lines
        flds = ln.split() #use string methods, not module functions
        if len(flds) == 6:
            (day, mon, year) = flds[0].split("/")
            if year in data: #does dictionary already have the year?
                #append to previous list
                data[year].append((flds[0], float(flds[3])))
            else:
                #new list created
                data[year] = [(flds[0], float(flds[3]))]

    fin.close()

    #at this point, we should have a dictionary keyed by year, each year
    #contains a list of (date, value) tuples.

    # no code was given for column_stack() which looks to be taking
    # ONE argument: a tuple containing two lists -> a list of dates and a
    # list of values (just the opposite of what the above code produces),
    # to wit:
    # ( [ "01/01/2001", "01/02/2001", ...], [ 0.9, 0.5, ...] )
    # vs
    # [ ( "01/01/2001", 0.9), ( "01/02/2001", 0.5), ... ( ..., ...) ]
    #
    # with no code for it, I can not guess at what "answer" is supposed to
    # contain
    #
    # furthermore, for normal Python lists (NOT arrays -- arrays are
    # special module creatures and don't work quite like lists) one does
    # not index multidimensional (nested) lists using mdl[x, y] notation,
    # but by mdl[x][y]


    import pprint
    pprint.pprint(data)
    -=-=-=-=-=-=-=-

    When fed the following data (note that I mixed some orders to
    illustrate the code)

    -=-=-=-=-=-=-=-
    02/01/2000 @ 00:00:00 0.983896 Q10 T2
    03/01/2000 @ 00:00:00 0.557377 Q10 T2
    04/01/2000 @ 00:00:00 0.508871 Q10 T2
    05/01/2000 @ 00:00:00 0.583196 Q10 T2
    06/01/2000 @ 00:00:00 0.518281 Q10 T2
    12/09/2000 @ 00:00:00 Q151 T2
    13/09/2000 @ 00:00:00 Q151 T2
    02/01/2001 @ 00:00:00 0.983896 Q10 T2
    03/01/2001 @ 00:00:00 0.557377 Q10 T2
    04/01/2002 @ 00:00:00 0.608871 Q10 T2
    05/01/2001 @ 00:00:00 0.583196 Q10 T2
    06/01/2001 @ 00:00:00 0.518281 Q10 T2
    12/09/2001 @ 00:00:00 Q151 T2
    13/09/2002 @ 00:00:00 Q151 T2
    02/01/2002 @ 00:00:00 0.983896 Q10 T2
    03/01/2002 @ 00:00:00 0.557377 Q10 T2
    04/01/2001 @ 00:00:00 0.408871 Q10 T2
    05/01/2002 @ 00:00:00 0.583196 Q10 T2
    06/01/2002 @ 00:00:00 0.518281 Q10 T2
    12/09/2002 @ 00:00:00 Q151 T2
    13/09/2001 @ 00:00:00 Q151 T2

    -=-=-=-=-=-=-=-

    produces:

    >pythonw -u "Script11.py"

    {'2000': [('02/01/2000', 0.98389599999999999),
              ('03/01/2000', 0.55737700000000001),
              ('04/01/2000', 0.50887099999999996),
              ('05/01/2000', 0.58319600000000005),
              ('06/01/2000', 0.51828099999999999)],
     '2001': [('02/01/2001', 0.98389599999999999),
              ('03/01/2001', 0.55737700000000001),
              ('05/01/2001', 0.58319600000000005),
              ('06/01/2001', 0.51828099999999999),
              ('04/01/2001', 0.40887099999999998)],
     '2002': [('04/01/2002', 0.60887100000000005),
              ('02/01/2002', 0.98389599999999999),
              ('03/01/2002', 0.55737700000000001),
              ('05/01/2002', 0.58319600000000005),
              ('06/01/2002', 0.51828099999999999)]}
    >Exit code: 0
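
    For the other half of the problem (one column per year, as the
    original post asks), a year-keyed dictionary of this shape could be
    turned into a table roughly as follows. This is only a sketch in
    plain Python 3 with shortened sample values; `years`, `n_rows` and
    `table` are made-up names, and padding short years with None is an
    assumption about how missing days should be represented.

```python
# Year-keyed data in the shape printed above (values shortened).
data = {
    "2000": [("02/01/2000", 0.98), ("03/01/2000", 0.56), ("04/01/2000", 0.51)],
    "2001": [("02/01/2001", 0.98), ("03/01/2001", 0.56)],
}

years = sorted(data)                           # one column per year
n_rows = max(len(data[yr]) for yr in years)    # longest year sets the row count

# table[row][col]: nested lists are indexed with [row][col], not [row, col].
table = []
for row in range(n_rows):
    table.append([
        data[yr][row][1] if row < len(data[yr]) else None   # None pads short years
        for yr in years
    ])
```

    Note that row n here means "the nth value recorded for that year", in
    file order, not a particular calendar day. With NumPy (which is where
    column_stack and empty in the original post appear to come from), the
    same layout could be held in a float array with NaN as the padding
    value.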

    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, Dec 4, 2007
    #4
  5. Thank you all very much.

    Firstly for providing an answer that does exactly what I require. But
    also for the hints on the naming conventions and the explanations of
    how I was going wrong.

    Thanks again,
    b
     
    Bevan Jenkins, Dec 4, 2007
    #5