newbie: datastructure `dictionary' question

jason

Hello,

I am completely new to Python and I have a question whose answer I
unfortunately could not find in the various documentation online. My best
guess is that the answer should be quite easy, but I have just entered the
learning phase, so that means a heightened chance for stupidity and mistakes
on my part.

I am trying to fill a nested dictionary by parsing a logfile. However,
each time only one key entry is created and that's it. Just
one entry, even though the keys are different. That's 100% sure. I therefore
think that it is an assignment error on my part. [there we have it...]

To give a static example of the data structure that I am using, to clear up
any confusion on the data structure part:

    records = {
        'fam/jason-a' : {
            'date' : 'Fri Sep 8 16:45:55 2006',
            'from' : 'jason',
            'subject' : 'Re: Oh my goes.....',
            'msize' : '237284' },
        'university/solar-system' : {
            'date' : 'Fri Sep 8 16:45:46 2006',
            'from' : 'jd',
            'subject' : 'Vacancies for students',
            'msize' : '9387' }
    }

Looping over this data structure is no problem:

    rkeys = ['date', 'from', 'subject', 'msize']
    for folder in records.keys():
        print '--'
        print folder
        for key in rkeys:
            print records[folder][key]

Now for the actual program, the dynamic part: for the assignment in the loop
I use the following function. Note that `datum' is not a date object, just a
string.

    def parselog(data):
        other = 0
        records = {}

        for line in string.split(data, '\n'):
            str = line.strip()
            if str[:4] == 'From':
                mfrom, datum = extrfrom(str), extrdate(str)
                print datum, mfrom
            elif str[:4] == 'Fold':
                folder = extrfolder(str[8:])
                records = {folder : { 'date' : datum, 'mesgbytes' : extrmsize(str[8:]), 'mesgcount' : 1}}
            else:
                other += 1

        displrec(records)

Note, this is not meant as a collision-type data structure; all initial data
entries are unique. My question: where is my assignment, e.g. records =
{folder...., wrong?

Thanks in advance for any tips, hints and answers.

Cheers,

Jason.
 
Diez B. Roggisch

What you essentially do is this:

records = {"some" : "dict"}
records = {"some other" : "dict"}

You rebind the name records to a new dictionary. So all your previously
stored data is garbage collected.

What you most probably want to do (I'm a bit confused about your code &
too lazy to dig deeper):

records = {}

records[folder] = {'date' : ...}

Notice that the dict[key]=value syntax mutates the existing dictionary.
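
The difference between rebinding and mutating can be checked directly. The following is only a small sketch; the names `rebound`, `first`, `mutated` and `alias` are made up for the illustration and do not come from the original code.

```python
# Rebinding: the second assignment creates a new dict and points the
# name at it. The old dict survives here only because `first` still
# refers to it; without that extra reference it would be collected.
rebound = {"some": "dict"}
first = rebound
rebound = {"some other": "dict"}

# Mutating: item assignment adds keys to the one existing dict.
mutated = {}
alias = mutated
mutated["some"] = "dict"
mutated["some other"] = "dict"

print(rebound is first)   # False: the name now refers to a different object
print(mutated is alias)   # True: still the same object, now with two keys
print(len(rebound), len(mutated))
```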

Diez
 
John Machin

You are *assigning* records = blahblah each time around. "records" will
end up being bound to the blahblah related to the *last* record that
you read.

You can do it item by item:

    records[folder]['date'] = datum
    etc

or as a oneliner:

    records[folder] = {'date' : datum, 'mesgbytes' : extrmsize(str[8:]), 'mesgcount' : 1}
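
One caveat about the item-by-item form: it only works once records[folder] already holds a dictionary; on a fresh, empty `records` it raises KeyError. A minimal sketch of that, with sample values borrowed from the original post (the variable `failed` is only there for the illustration):

```python
records = {}
folder = 'fam/jason-a'               # sample key from the original post
datum = 'Fri Sep 8 16:45:55 2006'    # sample date string from the original post

# Item-by-item assignment fails while the inner dict does not exist yet.
try:
    records[folder]['date'] = datum
    failed = False
except KeyError:
    failed = True

# Create the inner dict first, then fill it item by item.
records[folder] = {}
records[folder]['date'] = datum
records[folder]['mesgcount'] = 1

# dict.setdefault() creates the inner dict on first use and returns it.
records.setdefault('university/solar-system', {})['date'] = 'Fri Sep 8 16:45:46 2006'

print(failed, sorted(records))
```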

When you find yourself using a dictionary with constant keys like
'date', it's time to start thinking OO.

    class LogMessage(object):
        def __init__(self, date, .....):
            self.date = date
            etc

then later:

    records[folder] = LogMessage(
        date=datum,
        mesgbytes=extrmsize(str[8:]),
        mesgcount=1,
        )
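
Spelled out as something runnable, the class could look roughly like the sketch below. The parameter list is an assumption: it simply mirrors the three keyword arguments used in the call above, and the `__repr__` method, the default for `mesgcount` and the sample values are additions for the illustration.

```python
class LogMessage:
    """One log record; attributes replace the constant dictionary keys."""

    def __init__(self, date, mesgbytes, mesgcount=1):
        # Assumed fields, taken from the keyword arguments in the call above.
        self.date = date
        self.mesgbytes = mesgbytes
        self.mesgcount = mesgcount

    def __repr__(self):
        return "LogMessage(date=%r, mesgbytes=%r, mesgcount=%r)" % (
            self.date, self.mesgbytes, self.mesgcount)


records = {}
records['fam/jason-a'] = LogMessage(
    date='Fri Sep 8 16:45:55 2006',   # sample values from the original post
    mesgbytes=237284,
    mesgcount=1,
)

# Attribute access instead of records[folder]['date'].
print(records['fam/jason-a'].date)
```

A misspelled attribute such as `.dtae` fails on reading with an AttributeError, much as a misspelled key fails with a KeyError, but the class gives one obvious place to document the fields and to add behaviour later.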


[snip]

HTH,
John
 
jason

<cut>
....
</cut>

Owww.. Of course... ! Thanks for the answer and the suggestion. It really
helped me a lot. I am definitely going to take the OO approach later on.

Thanks again for the quick reply.

Jason.
 
Bruno Desthuilliers

jason wrote:

Just some more suggestions:

> def parselog(data):
>     other = 0
>     records = {}
>
>     for line in string.split(data, '\n'):

    for line in data.split('\n'):

>         str = line.strip()

This will shadow the builtin 'str' type. You could reassign to 'line'
instead, or manage to get stripped lines already:

    for line in map(str.strip, data.split('\n')):

>         if str[:4] == 'From':
>             mfrom, datum = extrfrom(str), extrdate(str)
>             print datum, mfrom

Mixing processing with IO may not be a good idea...

>         elif str[:4] == 'Fold':

    line_type = line[:4]
    if line_type == 'From':
        # code here
    elif line_type == 'Fold':

>             folder = extrfolder(str[8:])
>             records = {folder : { 'date' : datum, 'mesgbytes' : extrmsize(str[8:]), 'mesgcount' : 1}}

You now know that it should be:

    records[folder] = {...}

>         else:
>             other += 1
>
>     displrec(records)

As a last note, you may want to pay more attention to your naming,
i.e. 'display_records()' is much more readable than 'displrec()' and
still not too long to type !-)
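
Putting these suggestions together, the function might end up looking something like the sketch below. The original extrdate(), extrfolder() and extrmsize() helpers and the real log format were never posted, so the three `extract_*` functions and the `sample` log are hypothetical stand-ins that assume whitespace-separated fields; only the structure of parse_log() is the point.

```python
def extract_date(line):
    # Hypothetical stand-in for extrdate(): everything after "From <sender>".
    return ' '.join(line.split()[2:])


def extract_folder(rest):
    # Hypothetical stand-in for extrfolder(): first field after "Folder: ".
    return rest.split()[0]


def extract_msize(rest):
    # Hypothetical stand-in for extrmsize(): last field, as an integer.
    return int(rest.split()[-1])


def parse_log(data):
    """Return (records, other) instead of printing inside the loop."""
    records = {}
    other = 0
    datum = None
    for line in map(str.strip, data.split('\n')):
        line_type = line[:4]
        if line_type == 'From':
            datum = extract_date(line)
        elif line_type == 'Fold':
            rest = line[8:]
            # Item assignment keeps earlier entries; no rebinding of records.
            records[extract_folder(rest)] = {
                'date': datum,
                'mesgbytes': extract_msize(rest),
                'mesgcount': 1,
            }
        else:
            other += 1
    return records, other


# Assumed sample input, built from the values in the original post.
sample = """From jason Fri Sep 8 16:45:55 2006
 Subject: Re: Oh my goes.....
  Folder: fam/jason-a 237284
From jd Fri Sep 8 16:45:46 2006
 Subject: Vacancies for students
  Folder: university/solar-system 9387"""

records, other = parse_log(sample)
for folder, fields in records.items():
    print(folder, fields)
```

Returning the dictionary leaves the display step to the caller, which keeps the parsing and the IO separate.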

My 2 cents (and a half)
 
