writing results to array

Bevan Jenkins

Hello,

I have recently discovered the Python language and am having a lot of
fun getting my head around the basics of it.
However, I have run into a stumbling block that I have not been able
to overcome, so I thought I would ask for help.
<Overview>
I am trying to import a text file that has the following format:
02/01/2000 @ 00:00:00 0.983896 Q10 T2
03/01/2000 @ 00:00:00 0.557377 Q10 T2
04/01/2000 @ 00:00:00 0.508871 Q10 T2
05/01/2000 @ 00:00:00 0.583196 Q10 T2
06/01/2000 @ 00:00:00 0.518281 Q10 T2
When there is missing data:
12/09/2000 @ 00:00:00 Q151 T2
13/09/2000 @ 00:00:00 Q151 T2

I have cobbled together some code which imports the data. The next
step is to create an array in which each column contains a year's worth
of values. Thus, if I have 6 years of data (2001-2006 inclusive),
there will be six columns with 365 rows (not all years have a full
data set; some may only have, say, 340 days of data).
<The question>
In the code below,
print answer[j,1] is giving me the right answer, but I can't write it
to an array.
Any suggestions welcome.


This is what I have:
flow=[]
flowdate=[]
yeardate=[]
uniqueyear=[]
#flow_order=
flow_rank=[]
icount=[]
p=[]

filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
linesep ="\n"

# read in whole file
tempdata = open( filename).read()
# break into lines
tempdata = string.split( tempdata, linesep )
# for each record, get the field values
for i in range( len( tempdata)):
    # split into the lines
    fields = string.split( tempdata[i])
    if len(fields)>5:
        flowdate.append(fields[0])
        list =string.split(fields[0],"/")
        yeardate.append(list[2])
        flow.append(float(fields[3]))
        answer=column_stack((flowdate,flow))

for rows in yeardate:
    if rows not in uniqueyear:
        uniqueyear.append(rows)

#print answer[:,0] #date
flow_order=empty((0,0),dtype=float)
#for yr in enumerate(uniqueyear):
for iyr,yr in enumerate(uniqueyear):
    for j, val, in enumerate (answer[:,0]):
        flowyr=string.split(val,"/")
        if int(flowyr[2])==int(yr):
            print answer[j,1]
            #flow_order =
 
Matimus

I'm not sure what you mean by `write it to an array'. `answer' is an
array. Perhaps you could show an example that has the bad behavior you
are observing, or at least an example of what you expect to get.

Also, just a couple of pointers:

this:
tempdata = open( filename).read()
# break into lines
tempdata = string.split( tempdata, linesep )
# for each record, get the field values
for i in range( len( tempdata)):
    # split into the lines
    fields = string.split( tempdata[i])


is better written (and usually written) in Python like this:

for line in open(filename):
    fields = line.split()

Don't use the string module, use the methods of the strings
themselves.
Don't use built-in type names as variable names, as seen on this line:
list =string.split(fields[0],"/") # list is a built-in type
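
A minimal sketch of why shadowing a built-in name causes trouble, written for Python 3 (so print is a function, unlike the Python 2 code elsewhere in this thread). The function names and the date_parts variable are hypothetical, chosen only for illustration.

```python
def shadowed():
    # Rebinding the name "list" hides the built-in type inside this function.
    list = "02/01/2000".split("/")
    try:
        list("abc")  # this now calls a list object, which is not callable
    except TypeError:
        return "TypeError"
    return "ok"


def not_shadowed():
    # A descriptive name (hypothetical) leaves the built-in usable.
    date_parts = "02/01/2000".split("/")
    return date_parts[2], list("abc")


print(shadowed())
print(not_shadowed())
```

The same reasoning applies to names such as str, dict and file: once rebound, the original meaning is unavailable in that scope.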

You only need to use enumerate if you actually want the index. If you
don't need the index, just iterate over the sequence, e.g. use this:
for yr in uniqueyear:
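
To make the distinction concrete, here is a small sketch written for Python 3. The sample years and the names seen and column_of are illustrative assumptions, not taken from the original script.

```python
unique_years = ["2000", "2001", "2002"]  # hypothetical sample values

# Index not needed: iterate over the sequence directly.
seen = []
for year in unique_years:
    seen.append(year)

# Index needed (here, to assign each year a column number): use enumerate.
column_of = {}
for column, year in enumerate(unique_years):
    column_of[year] = column

print(seen)
print(column_of)
```

In the original script the index would only matter if each year had to be written into a particular column of an output array.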

You don't need to re-create the column-stack each time you get a value
from the file. It is very inefficient.

e.g. this:
for i in range( len( tempdata)):
    # split into the lines
    fields = string.split( tempdata[i])
    if len(fields)>5:
        flowdate.append(fields[0])
        list =string.split(fields[0],"/")
        yeardate.append(list[2])
        flow.append(float(fields[3]))
        answer=column_stack((flowdate,flow))


to this:
for i in range( len( tempdata)):
    # split into the lines
    fields = string.split( tempdata[i])
    if len(fields)>5:
        flowdate.append(fields[0])
        list =string.split(fields[0],"/")
        yeardate.append(list[2])
        flow.append(float(fields[3]))
# build the array once, after the loop has finished
answer=column_stack((flowdate,flow))


or, with the other suggested changes:
for line in open(filename):
    # split the line into fields
    fields = line.split()
    if len(fields) > 5:
        flowdate.append(fields[0])
        year = fields[0].split("/")[2]
        yeardate.append(year)
        flow.append(float(fields[3]))
# build the array once, after the loop has finished
answer=column_stack((flowdate,flow))

If I was doing this though, I would use a dictionary (dict) where the
keys are the year and the values are lists of flows for that year.

Something like this:
Code:
filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
year2flows = {}

fin = open(filename)
for line in fin:
    # split into the lines
    fields = line.split()
    if len(fields)>5:
        date = fields[0]
        year = fields[0].split("/")[-1]
        flow = float(fields[3])
        year2flows.setdefault(year, []).append((date, flow))
fin.close()

# This does what you were doing.
for yr in sorted(year2flows.keys()):
    for date, flow in year2flows[yr]:
        print flow
# If you just wanted one year though you could do something like this
# (the keys are strings, because they come from split()):
for date, flow in year2flows["2004"]:
    print flow

The above code is untested, so I make no guarantees. If you are using
python 2.5, you might look into using defaultdict (in the collections
module). It will simplify the code a bit.

from this:
year2flows = {}
# bunch of stuff...
year2flows.setdefault(year, []).append((date, flow))
to this:
from collections import defaultdict
year2flows = defaultdict(list)
# bunch of stuff...
year2flows[year].append((date, flow))
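
For reference, a self-contained sketch of the defaultdict variant, written for Python 3 and fed from an in-memory list instead of the file. SAMPLE_LINES and group_by_year are hypothetical names; the records reuse the format quoted in the question, including one line with a missing flow value.

```python
from collections import defaultdict

# Sample records in the thread's format; the third has no flow value.
SAMPLE_LINES = [
    "02/01/2000 @ 00:00:00 0.983896 Q10 T2",
    "03/01/2000 @ 00:00:00 0.557377 Q10 T2",
    "12/09/2000 @ 00:00:00 Q151 T2",
    "02/01/2001 @ 00:00:00 0.983896 Q10 T2",
]


def group_by_year(lines):
    """Return {year: [(date, flow), ...]}, skipping records without a flow."""
    year2flows = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) > 5:  # complete records split into six fields
            date = fields[0]
            year = date.split("/")[-1]
            year2flows[year].append((date, float(fields[3])))
    return year2flows


year2flows = group_by_year(SAMPLE_LINES)
for year in sorted(year2flows):
    print(year, [flow for _date, flow in year2flows[year]])
```

One thing to watch with defaultdict: looking up a key that does not exist (for example the integer 2000 instead of the string "2000") silently creates an empty list instead of raising KeyError.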

Matt
 
Chris

Maybe you're looking for something more in the line of:

fInput = open('tst.txt')
dictObj = {}
"""{ Year_Key: { DayKey: FloatValue}}"""
for each_line in fInput.readlines():
    if each_line.strip():
        line = each_line.strip().split()
        if len(line) == 6:
            if dictObj.has_key(line[0].split('/')[-1]):
                tmpDict = dictObj[line[0].split('/')[-1]]
                tmpDict[line[0]] = line[3]
            else:
                dictObj[line[0].split('/')[-1]] = {line[0]:line[3]}
fInput.close()
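
A short sketch of how such a nested dictionary might be built and queried, written for Python 3. The names records, flow_on_day and missing_day are hypothetical, the sample values come from the data in the question, and the flows are stored as floats here (an assumption matching the FloatValue note above).

```python
records = {}  # {year: {date: flow}}

samples = [
    ("02/01/2000", 0.983896),
    ("03/01/2000", 0.557377),
    ("02/01/2001", 0.983896),
]
for date, flow in samples:
    year = date.split("/")[-1]
    # setdefault returns the inner dict for the year, creating it if needed.
    records.setdefault(year, {})[date] = flow

flow_on_day = records["2000"]["03/01/2000"]
# .get returns None for a day that has no value instead of raising KeyError.
missing_day = records["2000"].get("12/09/2000")

print(flow_on_day, missing_day)
```

Keying the inner dictionary by date makes single-day lookups direct, at the cost of not remembering the order in which records appeared in the file on older Python versions.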
 
Dennis Lee Bieber

<The question>
In the code below
print answer[j,1] is giving me the right answer but i can't write it
to an array.

Unless you are using some module/class that you didn't show us in
the code, Python doesn't really have arrays (there is an array module in
the standard library, but I don't recall ever seeing it used, and then
there are the various numeric processing modules: numarray, Numeric, and
numpy [which supersedes the other two]).
answer=column_stack((flowdate,flow))
You don't supply the code/definition for column_stack(), other than
that you are passing in a single argument -- which is a tuple containing
a list of dates and a list of whatever "flow" represents. Lacking this,
I can not guess what "answer" is supposed to represent.
#print answer[:,0] #date
flow_order=empty((0,0),dtype=float)

Where did empty() come from, and what is it supposed to be doing?

A cut at the parsing half of the problem:

-=-=-=-=-=-=-


#FILENAME = r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
FILENAME = "test.data"
#convention is that "constants" be all UPPERCASE name

data = {} #empty dictionary

fin = open(FILENAME, "r")

for ln in fin: #automatically reads by lines
    flds = ln.split() #use string methods, not module functions
    if len(flds) == 6:
        (day, mon, year) = flds[0].split("/")
        if year in data: #does dictionary already have the year?
            data[year].append((flds[0], float(flds[3]))) #append to previous list
        else:
            data[year] = [(flds[0], float(flds[3]))] #new list created

fin.close()

#at this point, we should have a dictionary keyed by year, each year
#contains a list of (date, value) tuples.

# no code was given for column_stack() which looks to be taking
# ONE argument: a tuple containing two lists -> a list of dates and a
# list of values [just the opposite of what the above code produces],
# to wit:
# ( [ "01/01/2001", "01/02/2001", ...], [ 0.9, 0.5, ...] )
# vs
# [ ( "01/01/2001", 0.9), ( "01/02/2001", 0.5), ... ( ..., ...) ]
#
# with no code for it, I can not guess at what "answer" is supposed to
# contain
#
# furthermore, for normal Python lists (NOT arrays -- arrays are special
# module creatures and don't work quite like lists) one does not write
# multidimensional (nested lists) using mdl[x, y] notation, but by
# mdl[x][y]


import pprint
pprint.pprint(data)
-=-=-=-=-=-=-=-

When fed the following data (note that I mixed some orders to
illustrate the code)

-=-=-=-=-=-=-=-
02/01/2000 @ 00:00:00 0.983896 Q10 T2
03/01/2000 @ 00:00:00 0.557377 Q10 T2
04/01/2000 @ 00:00:00 0.508871 Q10 T2
05/01/2000 @ 00:00:00 0.583196 Q10 T2
06/01/2000 @ 00:00:00 0.518281 Q10 T2
12/09/2000 @ 00:00:00 Q151 T2
13/09/2000 @ 00:00:00 Q151 T2
02/01/2001 @ 00:00:00 0.983896 Q10 T2
03/01/2001 @ 00:00:00 0.557377 Q10 T2
04/01/2002 @ 00:00:00 0.608871 Q10 T2
05/01/2001 @ 00:00:00 0.583196 Q10 T2
06/01/2001 @ 00:00:00 0.518281 Q10 T2
12/09/2001 @ 00:00:00 Q151 T2
13/09/2002 @ 00:00:00 Q151 T2
02/01/2002 @ 00:00:00 0.983896 Q10 T2
03/01/2002 @ 00:00:00 0.557377 Q10 T2
04/01/2001 @ 00:00:00 0.408871 Q10 T2
05/01/2002 @ 00:00:00 0.583196 Q10 T2
06/01/2002 @ 00:00:00 0.518281 Q10 T2
12/09/2002 @ 00:00:00 Q151 T2
13/09/2001 @ 00:00:00 Q151 T2

-=-=-=-=-=-=-=-

produces:
pythonw -u "Script11.py"
{'2000': [('02/01/2000', 0.98389599999999999),
          ('03/01/2000', 0.55737700000000001),
          ('04/01/2000', 0.50887099999999996),
          ('05/01/2000', 0.58319600000000005),
          ('06/01/2000', 0.51828099999999999)],
 '2001': [('02/01/2001', 0.98389599999999999),
          ('03/01/2001', 0.55737700000000001),
          ('05/01/2001', 0.58319600000000005),
          ('06/01/2001', 0.51828099999999999),
          ('04/01/2001', 0.40887099999999998)],
 '2002': [('04/01/2002', 0.60887100000000005),
          ('02/01/2002', 0.98389599999999999),
          ('03/01/2002', 0.55737700000000001),
          ('05/01/2002', 0.58319600000000005),
          ('06/01/2002', 0.51828099999999999)]}
Exit code: 0
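
From a dictionary shaped like this, the one-column-per-year table asked for in the question could be assembled roughly as follows. This is a sketch under stated assumptions, written for Python 3: the names parse_date and year_columns are hypothetical, dates are read as day/month/year (the sample contains 13/09/2000, so day-first is implied), and short years are padded with None. Rows line up by position within each year, not by calendar day.

```python
from datetime import datetime
from itertools import zip_longest

# Hypothetical input in the shape printed above: {year: [(date, flow), ...]}.
data = {
    "2000": [("02/01/2000", 0.983896), ("03/01/2000", 0.557377)],
    "2001": [("03/01/2001", 0.557377), ("02/01/2001", 0.983896),
             ("04/01/2001", 0.408871)],
}


def parse_date(text):
    """Parse a dd/mm/yyyy string (day-first, as in the sample data)."""
    return datetime.strptime(text, "%d/%m/%Y")


def year_columns(data):
    """Return (years, rows); rows[i][j] is the i-th flow of years[j] or None."""
    years = sorted(data)
    columns = []
    for year in years:
        ordered = sorted(data[year], key=lambda record: parse_date(record[0]))
        columns.append([flow for _date, flow in ordered])
    # zip_longest transposes the columns into rows and pads short years.
    rows = [list(row) for row in zip_longest(*columns, fillvalue=None)]
    return years, rows


years, rows = year_columns(data)
print(years)
for row in rows:
    print(row)
```

The result is a plain nested list, so a single cell is read as rows[i][j], not rows[i, j].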
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff)
HTTP://www.bestiaria.com/
 
Bevan Jenkins

Thank you all very much.

Firstly for providing an answer that does exactly what I require. But
also for the hints on the naming conventions and the explanations of
how I was going wrong.

Thanks again,
b
 
