Array of dicts or lists or ...?

Pat

I can't figure out how to set up a Python data structure to read in data
that looks something like this (albeit somewhat simplified and contrived):


States, Counties, Schools, Classes, Max Allowed Students, Current enrolled Students

Nebraska, Wabash, Newville, Math, 20, 0
Nebraska, Wabash, Newville, Gym, 400, 0
Nebraska, Tingo, Newfille, Gym, 400, 0
Ohio, Dinger, OldSchool, English, 10, 0

With each line I read in, I would create a hash entry and increment the
number of enrolled students.
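For comparison, here is a minimal Python 3 sketch of a different layout than the one the thread ends up using. It keeps one flat dict keyed by a (state, county, school, class) tuple and fills it with the standard `csv` module. The names `load` and `SAMPLE` are hypothetical, and treating the last column as a count to add on each line is an assumption based on the description above.

```python
import csv
import io

# Sample rows copied from the question; a real program would open the file instead.
SAMPLE = """\
Nebraska, Wabash, Newville, Math, 20, 0
Nebraska, Wabash, Newville, Gym, 400, 0
Nebraska, Tingo, Newfille, Gym, 400, 0
Ohio, Dinger, OldSchool, English, 10, 0
"""


def load(lines):
    """Map (state, county, school, cls) -> [max_allowed, enrolled]."""
    table = {}
    # skipinitialspace=True drops the blank that follows each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        state, county, school, cls, max_allowed, enrolled = row
        entry = table.setdefault((state, county, school, cls), [int(max_allowed), 0])
        entry[1] += int(enrolled)  # accrue this line's enrolment
    return table


table = load(io.StringIO(SAMPLE))
print(table[("Nebraska", "Wabash", "Newville", "Gym")])  # [400, 0]
```

A flat tuple key avoids nesting entirely, at the cost of making "all schools in Wabash county" a filter over every key rather than a single lookup.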

I wrote a routine in Perl using arrays of hash tables (the syntax was a
bear). It read in the data, and because those arrays of hash tables
pointed to further arrays of hash tables, almost everything was created
dynamically.

I was able to fill in the hash tables and determine whether any school
class (e.g. Gym) had exceeded its maximum number of students or had no
students enrolled.
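A hedged Python 3 sketch of those two checks, assuming the nested layout data[state][county][school][cls] with each leaf stored as a (max_allowed, enrolled) tuple. The helpers `walk` and `problems`, and the enrolment numbers in `data`, are invented for illustration.

```python
def walk(tree, path=()):
    """Yield (path, leaf) for every non-dict value below a nested dict."""
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from walk(value, path + (key,))
        else:
            yield path + (key,), value


def problems(tree):
    """Return (over_capacity, empty) lists of class paths."""
    over, empty = [], []
    for path, (max_allowed, enrolled) in walk(tree):
        if enrolled > max_allowed:
            over.append(path)
        elif enrolled == 0:
            empty.append(path)
    return over, empty


# Made-up enrolment figures, just to exercise both checks.
data = {
    "Nebraska": {"Wabash": {"Newville": {"Math": (20, 25), "Gym": (400, 0)}}},
    "Ohio": {"Dinger": {"OldSchool": {"English": (10, 3)}}},
}
over, empty = problems(data)
print(over)   # [('Nebraska', 'Wabash', 'Newville', 'Math')]
print(empty)  # [('Nebraska', 'Wabash', 'Newville', 'Gym')]
```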

No, this is not a classroom project. I really need this for my job.
I'm converting my Perl program to Python and this portion has me stumped.

I'm converting a perfectly working program because no one else here
knows Perl or Python (though I believe a newcomer would learn Python
faster than Perl), and the Perl program has become huge and keeps
growing.
 
Tim Chase


A Python version of what you describe:

class TooManyAttendants(Exception): pass

class Attendence(object):
    def __init__(self, max):
        self.max = int(max)
        self.total = 0
    def accrue(self, other):
        # add this line's enrolment; complain if the class is over its limit
        self.total += int(other)
        if self.total > self.max: raise TooManyAttendants
    def __str__(self):
        return "%s/%s" % (self.max, self.total)
    __repr__ = __str__

data = {}
for i, line in enumerate(file("input.txt")):
    print line,
    state, county, school, cls, max_students, enrolled = map(
        lambda s: s.strip(),
        line.rstrip("\r\n").split(",")
        )
    try:
        # builds data[state][county][school][cls] -> Attendence
        data.setdefault(
            state, {}).setdefault(
            county, {}).setdefault(
            school, {}).setdefault(
            cls, Attendence(max_students)).accrue(enrolled)
    except TooManyAttendants:
        print "Too many Attendants in line %i" % (i + 1)
print repr(data)


You can then access things like

a = data["Nebraska"]["Wabash"]["Newville"]["Math"]
print a.max, a.total

If capitalization varies, you may have to do something like

data.setdefault(
    state.upper(), {}).setdefault(
    county.upper(), {}).setdefault(
    school.upper(), {}).setdefault(
    cls.upper(), Attendence(max_students)).accrue(enrolled)

to make sure they're normalized into the same groupings.

-tkc
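A minimal Python 3 sketch of the same normalization folded into a small helper, assuming leaves are stored as [max, total] lists instead of `Attendence` objects. The `add` helper is hypothetical; it also strips stray whitespace, which `.upper()` alone does not.

```python
def add(data, state, county, school, cls, max_allowed, enrolled):
    """Accrue enrolment into data[state][county][school][cls] = [max, total],
    upper-casing every key so 'Gym' and 'GYM' land in the same slot."""
    node = data
    for key in (state, county, school):
        node = node.setdefault(key.strip().upper(), {})
    slot = node.setdefault(cls.strip().upper(), [int(max_allowed), 0])
    slot[1] += int(enrolled)
    return slot


data = {}
add(data, "Nebraska", "Wabash", "Newville", "Gym", "400", "3")
add(data, "NEBRASKA", "wabash", "Newville ", "gym", "400", "2")
print(data)  # {'NEBRASKA': {'WABASH': {'NEWVILLE': {'GYM': [400, 5]}}}}
```

Lookups then have to be normalized the same way, e.g. `data["NEBRASKA"]["WABASH"]["NEWVILLE"]["GYM"]`.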
 
bearophileHUGS

Tim Chase wrote:
> __repr__ = __str__

I don't know if that's a good practice.

> try:
>     data.setdefault(
>         state, {}).setdefault(
>         county, {}).setdefault(
>         school, {}).setdefault(
>         cls, Attendence(max_students)).accrue(enrolled)
> except TooManyAttendants:

I suggest decompressing that part a little, to make it more readable.

Bye,
bearophile
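One way to "decompress" the chain as suggested, sketched in Python 3 under the assumption that Tim's `Attendence` class is kept as-is (minus `__str__`). `record` is a hypothetical helper name. Checking for the class before constructing it also avoids building a throwaway `Attendence` on every line, since `setdefault()` evaluates its default argument even when the key already exists.

```python
class TooManyAttendants(Exception):
    pass


class Attendence(object):
    # Same behaviour as Tim's class; the thread's spelling is kept.
    def __init__(self, max):
        self.max = int(max)
        self.total = 0

    def accrue(self, other):
        self.total += int(other)
        if self.total > self.max:
            raise TooManyAttendants


def record(data, state, county, school, cls, max_students, enrolled):
    """Accrue one row into data[state][county][school][cls], one level per line."""
    counties = data.setdefault(state, {})
    schools = counties.setdefault(county, {})
    classes = schools.setdefault(school, {})
    attendance = classes.get(cls)
    if attendance is None:
        # only build an Attendence the first time this class is seen
        attendance = classes[cls] = Attendence(max_students)
    attendance.accrue(enrolled)


data = {}
record(data, "Ohio", "Dinger", "OldSchool", "English", "10", "4")
try:
    record(data, "Ohio", "Dinger", "OldSchool", "English", "10", "7")
except TooManyAttendants:
    print("Too many attendants")  # 4 + 7 exceeds the limit of 10
print(data["Ohio"]["Dinger"]["OldSchool"]["English"].total)  # 11
```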
 
Tim Chase

> > __repr__ = __str__
> I don't know if that's a good practice.

I've seen it in a couple places, and it's pretty explicit what
it's doing.

> I suggest decompressing that part a little, to make it more readable.

I played around with the formatting and didn't really like any of
the versions I came up with. The other alternatives I considered were:

try:
    data \
        .setdefault(state, {}) \
        .setdefault(county, {}) \
        .setdefault(school, {}) \
        .setdefault(cls, Attendence(max_students)) \
        .accrue(enrolled)
except TooManyAttendants:
    print "Too many Attendants in line %i" % (i + 1)

or

try:
    (data
        .setdefault(state, {})
        .setdefault(county, {})
        .setdefault(school, {})
        .setdefault(cls, Attendence(max_students))
    ).accrue(enrolled)
except TooManyAttendants:
    print "Too many Attendants in line %i" % (i + 1)

Both accentuate the setdefault() calls grouped with their
parameters, which can be helpful. Which one is "better" is a
matter of personal preference:

* no extra characters but hard to read
* backslashes, or
* an extra pair of parens

-tkc
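A third option, sketched in Python 3 and not proposed in the thread: a self-vivifying tree built from `collections.defaultdict`, which removes the chain entirely. `tree` and the dict-shaped leaves are illustrative assumptions. The trade-off is that merely reading a misspelled key (say `data["Nebrasca"]`) silently creates an empty branch, so use `in` for lookups that might miss.

```python
from collections import defaultdict


def tree():
    """A dict whose missing keys spring into existence as further trees."""
    return defaultdict(tree)


data = tree()
rows = [
    ("Nebraska", "Wabash", "Newville", "Math", 20, 0),
    ("Nebraska", "Wabash", "Newville", "Gym", 400, 0),
]
for state, county, school, cls, max_students, enrolled in rows:
    classes = data[state][county][school]  # intermediate levels created on demand
    if cls not in classes:
        classes[cls] = {"max": max_students, "total": 0}
    classes[cls]["total"] += enrolled

print(sorted(data["Nebraska"]["Wabash"]["Newville"]))  # ['Gym', 'Math']
```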
 
Gabriel Genellina

On Mon, 06 Oct 2008 22:52:29 -0300, Tim Chase wrote:

> I've seen it in a couple places, and it's pretty explicit what it's
> doing.

__repr__ is used as a fallback for __str__, so just defining __repr__ (and
leaving out __str__) is enough.
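A quick Python 3 check of that fallback, using a cut-down stand-in class (the name `Attendance` and its fields are illustrative):

```python
class Attendance:
    # Only __repr__ is defined; str() and print() fall back to it
    # because object.__str__ calls repr() by default.
    def __init__(self, max_students, total=0):
        self.max = max_students
        self.total = total

    def __repr__(self):
        return "%s/%s" % (self.max, self.total)


a = Attendance(20, 3)
print(str(a))   # 20/3
print(repr(a))  # 20/3
print([a])      # [20/3]
```

Containers such as lists always use `repr()` for their elements, which is one reason to prefer defining `__repr__` when only one of the two is written.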