Odysseus
I'm writing my first 'real' program, i.e. one that has a purpose aside from
serving as a learning exercise. I'm posting to solicit comments about my
efforts at translating strings from an external source into useful data,
regarding efficiency and 'pythonicity' both. My only significant
programming experience is in PostScript, and I feel that I haven't yet
'found my feet' concerning the object-oriented aspects of Python, so I'd
be especially interested to know where I may be neglecting to take
advantage of them.
My input is in the form of correlated lists of strings, which I want to
merge (while ignoring some extraneous items). I populate a dictionary
called "found" with these data, still in string form. It contains
sub-dictionaries of various items keyed to strings extracted from the
list "names"; these sub-dictionaries in turn contain the associated
items I want from "cells". After loading in the strings (I have omitted
the statements that pick up strings that require no further processing,
some of them coming from a third list), I convert selected items in
place. Here's the function I wrote:
def extract_data():
    i = 0
    while i < len(names):
        name = names[i][6:] # strip off "Name: "
        found[name] = {'epoch1': cells[10 * i + na],
                       'epoch2': cells[10 * i + na + 1],
                       'time': cells[10 * i + na + 5],
                       'score1': cells[10 * i + na + 6],
                       'score2': cells[10 * i + na + 7]}
###
Following is my first parsing step, for those data that represent real
numbers. The two obstacles I'm contending with here are that the figures
have commas grouping the digits in threes, and that sometimes the data
are non-numeric -- I'll deal with those later. Is there a more elegant
way of removing the commas than the split-and-rejoin below?
###
        for k in ('time', 'score1', 'score2'):
            v = found[name][k]
            if v != "---" and v != "n/a": # skip non-numeric data
                v = ''.join(v.split(",")) # remove commas between 000s
                found[name][k] = float(v)
###
The next one is much messier. A couple of the strings represent times,
which I think will be most useful in 'native' form, but the input is in
the format "DD Mth YYYY HH:MM:SS UTC". Near the beginning of my program
I have "from calendar import timegm". Before I can feed the data to this
function, though, I have to convert the month abbreviation to a number.
I couldn't come up with anything more elegant than look-up from a list:
the relevant part of my initialization is
'''
m_abbrevs = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
'''
I'm also rather unhappy with the way I kluged the seventh and eighth
values in the tuple passed to timegm, the order of the date in the week
and in the year respectively. (I would hate to have to calculate them.)
The function doesn't seem to care what values I give it for these -- as
long as I don't omit them -- so I guess they're only there for the sake
of matching the output of the inverse function. Is there a version of
timegm that takes a tuple of only six (or seven) elements, or any better
way to handle this situation?
###
        for k in ('epoch1', 'epoch2'):
            dlist = found[name][k].split(" ")
            m = 0
            while m < 12:
                if m_abbrevs[m] == dlist[1]:
                    dlist[1] = m + 1
                    break
                m += 1
            tlist = dlist[3].split(":")
            found[name][k] = timegm((int(dlist[2]), int(dlist[1]),
                                     int(dlist[0]), int(tlist[0]),
                                     int(tlist[1]), int(tlist[2]),
                                     -1, -1, 0))
        i += 1
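One possible alternative for the epoch conversion, offered only as a sketch: time.strptime can parse the whole "DD Mth YYYY HH:MM:SS UTC" string in one call and returns a complete nine-field time structure, which timegm accepts directly, so neither the month look-up nor the placeholder weekday and day-of-year values would be needed. The helper name parse_epoch is hypothetical, and the sketch assumes the month abbreviations match the current locale (the %b directive is locale-dependent, so this holds for the default C or an English locale).

```python
import time
from calendar import timegm

def parse_epoch(text):
    """Convert 'DD Mth YYYY HH:MM:SS UTC' to seconds since the Unix epoch.

    Hypothetical helper; assumes English month abbreviations, since %b
    follows the current locale. The trailing ' UTC' is matched literally.
    """
    fields = time.strptime(text, "%d %b %Y %H:%M:%S UTC")
    # strptime fills in weekday and day-of-year, so timegm gets all
    # nine fields without any hand-made filler values.
    return timegm(fields)

# Same result as building the tuple by hand with filler values.
manual = timegm((2008, 1, 1, 0, 0, 0, -1, -1, 0))
parsed = parse_epoch("01 Jan 2008 00:00:00 UTC")
```

If the look-up tuple is kept instead, m_abbrevs.index(dlist[1]) + 1 would give the month number without the explicit while loop.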
The function appears to be working OK as is, but I would welcome any &
all suggestions for improving it or making it more idiomatic.
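On the comma question in the first parsing step, one simpler spelling than split-and-rejoin might be the string method replace. The following is a minimal sketch, not a drop-in replacement: the helper name parse_number is hypothetical, and it assumes "---" and "n/a" are the only non-numeric placeholders, as in the function above. Like the original loop, it leaves those placeholders untouched for later handling.

```python
def parse_number(text):
    """Return text as a float, or unchanged if it is a placeholder.

    Hypothetical helper; assumes '---' and 'n/a' are the only
    non-numeric values that occur in the input.
    """
    if text in ("---", "n/a"):
        return text  # leave non-numeric data for later handling
    # replace removes every comma in a single call
    return float(text.replace(",", ""))

values = [parse_number(s) for s in ("1,234,567.5", "42", "n/a", "---")]
```

Inside extract_data, the inner loop body would then reduce to a single assignment such as found[name][k] = parse_number(found[name][k]).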