Nested dictionaries trouble

IamIan

Hello,

I'm writing a simple FTP log parser that sums file sizes as it runs. I
have a yearTotals dictionary with year keys and the monthTotals
dictionary as its values. The monthTotals dictionary has month keys
and file size values. The script works except the results are written
for all years, rather than just one year. I'm thinking there's an
error in the way I set my dictionaries up or reference them...

import glob, traceback

years = ["2005", "2006", "2007"]
months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
# Create months dictionary to convert log values
logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06","Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
# Create monthTotals dictionary with default 0 value
monthTotals = dict.fromkeys(months, 0)
# Nest monthTotals dictionary in yearTotals dictionary
yearTotals = {}
for year in years:
    yearTotals.setdefault(year, monthTotals)

currentLogs = glob.glob("/logs/ftp/*")

try:
    for currentLog in currentLogs:
        readLog = open(currentLog,"r")
        for line in readLog.readlines():
            if not line: continue
            if len(line) < 50: continue
            logLine = line.split()

            # The 2nd element is month, 5th is year, 8th is filesize
            # Counting from zero:

            # Lookup year/month pair value
            logMonth = logMonths[logLine[1]]
            currentYearMonth = yearTotals[logLine[4]][logMonth]

            # Update year/month value
            currentYearMonth += int(logLine[7])
            yearTotals[logLine[4]][logMonth] = currentYearMonth
except:
    print "Failed on: " + currentLog
    traceback.print_exc()

# Print dictionaries
for x in yearTotals.keys():
    print "KEY",'\t',"VALUE"
    print x,'\t',yearTotals[x]
    #print " key",'\t',"value"
    for y in yearTotals[x].keys():
        print " ",y,'\t',yearTotals[x][y]


Thank you,
Ian
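The symptom Ian describes can be reproduced in a few lines. The sketch below is Python 3 (the thread's code is Python 2) and uses a cut-down, made-up set of years and months; it only shows that, with this setup, every year key refers to one and the same dictionary.

```python
# Cut-down, made-up data; same structure as the original script
months = ["01", "02", "03"]
years = ["2005", "2006"]

monthTotals = dict.fromkeys(months, 0)
yearTotals = {}
for year in years:
    # setdefault stores a reference to the single monthTotals dict
    yearTotals.setdefault(year, monthTotals)

yearTotals["2005"]["01"] += 100   # update one year/month pair...

print(yearTotals["2006"]["01"])                  # ...yet 2006 shows it too: 100
print(yearTotals["2005"] is yearTotals["2006"])  # True: one shared dict
```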
 
Gabriel Genellina

> I'm writing a simple FTP log parser that sums file sizes as it runs. I
> have a yearTotals dictionary with year keys and the monthTotals
> dictionary as its values. The monthTotals dictionary has month keys
> and file size values. The script works except the results are written
> for all years, rather than just one year. I'm thinking there's an
> error in the way I set my dictionaries up or reference them...
>
> monthTotals = dict.fromkeys(months, 0)
> # Nest monthTotals dictionary in yearTotals dictionary
> yearTotals = {}
> for year in years:
>     yearTotals.setdefault(year, monthTotals)

All your years share the *same* monthTotals object.
This is similar to this FAQ entry:
<http://effbot.org/pyfaq/how-do-i-create-a-multidimensional-list.htm>
You have to create a new dict for each year; replace the above code with:

yearTotals = {}
for year in years:
    yearTotals[year] = dict.fromkeys(months, 0)
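The FAQ entry Gabriel links covers the list flavour of the same mistake. A minimal Python 3 sketch using only built-ins, contrasting a repeated reference with independently built rows:

```python
# Repeating a list with * repeats the reference, not the contents
shared_grid = [[0] * 3] * 3
shared_grid[0][0] = 1
print(shared_grid)   # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# Building each row in a comprehension gives independent rows
fresh_grid = [[0] * 3 for _ in range(3)]
fresh_grid[0][0] = 1
print(fresh_grid)    # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```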
 
Terry Reedy

| Hello,
|
| I'm writing a simple FTP log parser that sums file sizes as it runs. I
| have a yearTotals dictionary with year keys and the monthTotals
| dictionary as its values. The monthTotals dictionary has month keys
| and file size values. The script works except the results are written
| for all years, rather than just one year. I'm thinking there's an
| error in the way I set my dictionaries up or reference them...
|
| import glob, traceback
|
| years = ["2005", "2006", "2007"]
| months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
| # Create months dictionary to convert log values
| logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06","Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
| # Create monthTotals dictionary with default 0 value
| monthTotals = dict.fromkeys(months, 0)
| # Nest monthTotals dictionary in yearTotals dictionary
| yearTotals = {}
| for year in years:
|     yearTotals.setdefault(year, monthTotals)

Try yearTotals.setdefault(year, dict.fromkeys(months, 0))
so that you start with a separate subdict for each year instead of one shared by all.

tjr
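Terry's version works because setdefault(key, default) stores default only when key is missing and always returns whatever is stored under key, while the default expression itself is evaluated on every call. A small Python 3 sketch with made-up keys:

```python
months = ["01", "02"]
totals = {}

# Key missing: the freshly built default is stored and returned
first = totals.setdefault("2005", dict.fromkeys(months, 0))
first["01"] = 5

# Key present: the stored dict is returned; the new default is
# still built (arguments are evaluated first) but then discarded
again = totals.setdefault("2005", dict.fromkeys(months, 0))
print(again is first, again["01"])   # True 5

# Another missing key gets its own, separate dict
other = totals.setdefault("2006", dict.fromkeys(months, 0))
print(other is first)                # False
```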
 
7stud

1) You have this setup:

logMonths = {"Jan":"01", "Feb":"02",...}
yearTotals = {
"2005":{"01":0, "02":0, ....}
"2006":
"2007":
}

Then when you get a value such as "Jan", you look up "Jan" in the
logMonths dictionary to get "01". Then you use "01" and the year, say
"2005", to look up the value in the yearTotals dictionary. Why do
that? What is the point of even having the logMonths dictionary? Why
not make "Jan" the key in the "2005" dictionary and look it up
directly:

yearTotals = {
"2005":{"Jan":0, "Feb":0, ....}
"2006":
"2007":
}

That way you could completely eliminate the lookup in the logMonths
dict.
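A Python 3 sketch of keying the inner dicts by month abbreviation, as 7stud suggests. It assumes calendar.month_abbr yields English names ("Jan", "Feb", ...), which holds in the default C locale but not necessarily in others; the parsed values are hypothetical.

```python
import calendar

# month_abbr[0] is an empty string, so skip it; assumes the default
# C locale, where the names are "Jan" ... "Dec"
month_names = list(calendar.month_abbr)[1:]

years = ["2005", "2006", "2007"]
yearTotals = {year: dict.fromkeys(month_names, 0) for year in years}

# Hypothetical values parsed from one log line; the month field is
# used as a key directly, with no logMonths lookup
log_month, log_year, size = "Jan", "2005", 1024
yearTotals[log_year][log_month] += size
print(yearTotals["2005"]["Jan"])   # 1024
```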

2) In this part:

logMonth = logMonths[logLine[1]]
currentYearMonth = yearTotals[logLine[4]][logMonth]
# Update year/month value
currentYearMonth += int(logLine[7])
yearTotals[logLine[4]][logMonth] = currentYearMonth

I'm not sure why you are using all those intermediate steps. How
about:

yearTotals[logLine[4]][logLine[1]] += int(logLine[7])

To me that is a lot clearer. Or, you could do this:

year, month, val = logLine[4], logLine[1], int(logLine[7])
yearTotals[year][month] += val
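The write-back step in the original exists because ints are immutable: += on a value copied out of the dict rebinds only the local name, whereas += on the subscript reads, adds and stores in one statement. A Python 3 sketch with made-up numbers:

```python
yearTotals = {"2005": {"Jan": 10}}

# Ints are immutable: += on a copied-out value only rebinds the local name
current = yearTotals["2005"]["Jan"]
current += 5
print(yearTotals["2005"]["Jan"])   # 10, the dict is unchanged

# Hence the explicit write-back in the original code...
yearTotals["2005"]["Jan"] = current
print(yearTotals["2005"]["Jan"])   # 15

# ...while += on the subscript reads, adds and stores in one step
yearTotals["2005"]["Jan"] += 5
print(yearTotals["2005"]["Jan"])   # 20
```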

3)
> I'm thinking there's an error in the way
> I set my dictionaries up or reference them

Yep. It's right here:

for year in years:
    yearTotals.setdefault(year, monthTotals)

Every year refers to the same monthTotals dict. You can use a dict's
copy() method to make a copy:

monthTotals.copy()
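copy() makes a shallow copy, which is enough here because the month values are ints. With a hypothetical nested template the inner dicts would still be shared, and copy.deepcopy from the standard library would be needed. A Python 3 sketch:

```python
import copy

monthTotals = dict.fromkeys(["Jan", "Feb"], 0)

# Shallow copies are independent when the values are ints
a = monthTotals.copy()
b = monthTotals.copy()
a["Jan"] += 1
print(b["Jan"])                    # 0

# Hypothetical nested template: shallow copies share the inner dict
template = {"Jan": {"bytes": 0}}
c = template.copy()
d = template.copy()
c["Jan"]["bytes"] += 1
print(d["Jan"]["bytes"])           # 1, the change leaked into d

# deepcopy copies the inner dicts as well
e = copy.deepcopy(template)
e["Jan"]["bytes"] += 1
print(template["Jan"]["bytes"])    # still 1
```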

Here is a reworking of your code that also eliminates a lot of typing:

import calendar, pprint

years = ["200%s" % x for x in range(5, 8)]
print years

months = list(calendar.month_abbr)
print months

monthTotals = dict.fromkeys(months[1:], 0)
print monthTotals

yearTotals = {}
for year in years:
    yearTotals.setdefault(year, monthTotals.copy())
pprint.pprint(yearTotals)

logs = [
    ["", "Feb", "", "", "2007", "", "", "12"],
    ["", "Jan", "", "", "2005", "", "", "3"],
    ["", "Jan", "", "", "2005", "", "", "7"],
]

for logLine in logs:
    year, month, val = logLine[4], logLine[1], int(logLine[7])
    yearTotals[year][month] += val

for x in yearTotals.keys():
    print "KEY", "\t", "VALUE"
    print x, "\t", yearTotals[x]
    for y in yearTotals[x].keys():
        print " ", y, "\t", yearTotals[x][y]
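The same reworking can be fed from raw text lines rather than pre-split lists. This is a Python 3 sketch; the sample lines are invented and only need month, year and size at indices 1, 4 and 7, as the original script assumes (in a real run they would come from the files found by glob), and the locale is assumed to give English month abbreviations.

```python
import calendar

# Invented sample lines: only month (index 1), year (index 4) and
# size (index 7) matter, matching the indices used in the original script
raw_lines = [
    "Thu Feb  1 10:00:00 2007 1 host.example.com 12 /pub/a.txt b _ o r anonymous ftp 0 * c",
    "Mon Jan  3 09:15:00 2005 2 host.example.com 3 /pub/b.txt b _ o r anonymous ftp 0 * c",
    "Tue Jan  4 11:30:00 2005 1 host.example.com 7 /pub/c.txt b _ o r anonymous ftp 0 * c",
    "",  # short or blank lines are skipped below
]

# Assumes the default C locale, so these are "Jan" ... "Dec"
month_names = list(calendar.month_abbr)[1:]
years = [str(y) for y in range(2005, 2008)]

# A fresh inner dict per year, as suggested in this thread
yearTotals = {year: dict.fromkeys(month_names, 0) for year in years}

for line in raw_lines:
    if len(line) < 50:   # same crude filter as the original script
        continue
    fields = line.split()
    year, month, size = fields[4], fields[1], int(fields[7])
    yearTotals[year][month] += size

for year in sorted(yearTotals):
    print(year, yearTotals[year])
```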
 
Bruno Desthuilliers

IamIan wrote:
> Hello,
>
> I'm writing a simple FTP log parser that sums file sizes as it runs. I
> have a yearTotals dictionary with year keys and the monthTotals
> dictionary as its values. The monthTotals dictionary has month keys
> and file size values. The script works except the results are written
> for all years, rather than just one year. I'm thinking there's an
> error in the way I set my dictionaries up or reference them...
>
> import glob, traceback
>
> years = ["2005", "2006", "2007"]
> months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
> # Create months dictionary to convert log values
> logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06","Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}

DRY violation alert!

logMonths = {
"Jan":"01",
"Feb":"02",
"Mar":"03",
"Apr":"04",
"May":"05",
#etc
}

months = sorted(logMonths.values())
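sorted(logMonths.values()) yields calendar order only because the values are zero-padded two-character strings, so string order and numeric order agree. A Python 3 sketch with a partial mapping:

```python
# Partial mapping for brevity
logMonths = {"Jan": "01", "Feb": "02", "Mar": "03",
             "Oct": "10", "Nov": "11", "Dec": "12"}

# Zero-padded strings sort in calendar order
months = sorted(logMonths.values())
print(months)     # ['01', '02', '03', '10', '11', '12']

# Without padding, string order differs from numeric order
unpadded = sorted(["1", "2", "10", "11"])
print(unpadded)   # ['1', '10', '11', '2']
```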

> # Create monthTotals dictionary with default 0 value
> monthTotals = dict.fromkeys(months, 0)
> # Nest monthTotals dictionary in yearTotals dictionary
> yearTotals = {}
> for year in years:
>     yearTotals.setdefault(year, monthTotals)

A complicated way to write:
yearTotals = dict((year, monthTotals) for year in years)

And without even reading further, I can tell you have a problem here:
every 'year' entry in yearTotals points to *the same* monthTotals dict
instance. So when updating yearTotals['2007'], you see the change
reflected for all years. The cure is simple: forget the monthTotals
object, and define your yearTotals dict this way:

yearTotals = dict((year, dict.fromkeys(months, 0)) for year in years)

NB: for Python versions before 2.4, you need a list comprehension instead
of a generator expression, i.e.:

yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])

HTH
 
IamIan

Thank you everyone for the helpful replies. Some of the solutions were
new to me, but the script now runs successfully. I'm still learning to
ride the snake but I love this language!

Ian
 
IamIan

I am using the suggested approach to make a years list:

years = ["199%s" % x for x in range(0,10)]
years += ["200%s" % x for x in range(0,10)]

I haven't had any luck doing this in one line though. Is it possible?

Thanks.
 
Steven W. Orr

On Wednesday, Apr 18th 2007 at 12:16 -0700, quoth IamIan:

=>I am using the suggested approach to make a years list:
=>
=>years = ["199%s" % x for x in range(0,10)]
=>years += ["200%s" % x for x in range(0,10)]
=>
=>I haven't had any luck doing this in one line though. Is it possible?

I'm so green that I almost get a chubby at being able to answer something. ;-)

years = [str(1990+x) for x in range(0,20)]

Yes?

 
Marc 'BlackJack' Rintsch

> years = ["199%s" % x for x in range(0,10)]
> years += ["200%s" % x for x in range(0,10)]
>
> I haven't had any luck doing this in one line though. Is it possible?

In [48]: years = map(str, xrange(1999, 2011))

In [49]: years
Out[49]:
['1999',
'2000',
'2001',
'2002',
'2003',
'2004',
'2005',
'2006',
'2007',
'2008',
'2009',
'2010']

Ciao,
Marc 'BlackJack' Rintsch
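In Python 3, xrange no longer exists and map returns a lazy iterator rather than a list, so the result can be consumed only once. A sketch using the 1990-2009 range Ian asked for:

```python
# Python 3: range replaces xrange, and map returns an iterator
years = map(str, range(1990, 2010))
first_pass = list(years)
second_pass = list(years)   # already exhausted by the first list()
print(len(first_pass), len(second_pass))   # 20 0
print(first_pass[0], first_pass[-1])       # 1990 2009
```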
 
Steven D'Aprano

> I am using the suggested approach to make a years list:
>
> years = ["199%s" % x for x in range(0,10)]
> years += ["200%s" % x for x in range(0,10)]
>
> I haven't had any luck doing this in one line though. Is it possible?

years = ["199%s" % x for x in range(0,10)] + \
["200%s" % x for x in range(0,10)]

Sorry for the line continuation, my news reader insists on breaking the
line. In your editor, just delete the "\" and line break to make it a
single line.


If you don't like that solution, here's a better one:

years = [str(1990 + n) for n in range(20)]

Or there's this:

years = [str(n) for n in range(1990, 2010)]

Or this one:

years = map(str, range(1990, 2010))
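A quick Python 3 check that these one-liners build the same list as Ian's two statements (map needs wrapping in list() under Python 3):

```python
# Ian's original two statements
two_step = ["199%s" % x for x in range(0, 10)]
two_step += ["200%s" % x for x in range(0, 10)]

one_liners = [
    ["199%s" % x for x in range(0, 10)] + ["200%s" % x for x in range(0, 10)],
    [str(1990 + x) for x in range(0, 20)],
    [str(n) for n in range(1990, 2010)],
    list(map(str, range(1990, 2010))),   # list() needed in Python 3
]
print(all(v == two_step for v in one_liners))   # True
```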
 
IamIan

Thank you again for the great suggestions. I have one final question
about creating a httpMonths dictionary like {'Jan':'01' , 'Feb':'02' ,
etc} with a minimal amount of typing. My code follows (using Python
2.3.4):

import calendar

# Create years list, formatting as strings
years = map(str, xrange(1990,2051))

# Create months list with three letter abbreviations
months = list(calendar.month_abbr)

# Create monthTotals dictionary with default value of zero
monthTotals = dict.fromkeys(months[1:],0)

# Create yearTotals dictionary with years for keys
# and copies of the monthTotals dictionary for values
yearTotals = dict([(year, monthTotals.copy()) for year in years])

# Create httpMonths dictionary to map month abbreviations
# to Apache numeric month representations
httpMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06","Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}

It is this last step I'm referring to. I got close with:
httpMonths = {}
for month in months[1:]:
    httpMonths[month] = str(len(httpMonths)+1)

but the month numbers are missing the leading zero for 01-09. Thanks!

Ian
 
rzed

> Thank you again for the great suggestions. I have one final
> question about creating a httpMonths dictionary like {'Jan':'01',
> 'Feb':'02', etc} with a minimal amount of typing. My code
> follows (using Python 2.3.4):
>
> import calendar
>
> # Create years list, formatting as strings
> years = map(str, xrange(1990,2051))
>
> # Create months list with three letter abbreviations
> months = list(calendar.month_abbr)
>
> # Create monthTotals dictionary with default value of zero
> monthTotals = dict.fromkeys(months[1:],0)
>
> # Create yearTotals dictionary with years for keys
> # and copies of the monthTotals dictionary for values
> yearTotals = dict([(year, monthTotals.copy()) for year in years])
>
> # Create httpMonths dictionary to map month abbreviations
> # to Apache numeric month representations
> httpMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06","Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
>
> It is this last step I'm referring to. I got close with:
> httpMonths = {}
> for month in months[1:]:
>     httpMonths[month] = str(len(httpMonths)+1)
>
> but the month numbers are missing the leading zero for 01-09.
> Thanks!

Maybe something like:
# list comprehension inside dict(): Python 2.3 has no generator expressions
httpMonths = dict([(k, "%02d" % (x+1))
                   for x, k in enumerate(months[1:])])
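A Python 3 take on rzed's idea, using a dict comprehension and enumerate's start argument (added in Python 2.6, so not available on Ian's 2.3.4). It again assumes English month abbreviations from calendar.month_abbr, as in the default C locale.

```python
import calendar

# Assumes the default C locale, so the names are "Jan" ... "Dec"
month_names = list(calendar.month_abbr)[1:]

# enumerate(..., start=1) numbers the months from 1;
# "%02d" pads 1-9 with a leading zero
httpMonths = {name: "%02d" % number
              for number, name in enumerate(month_names, start=1)}
print(httpMonths["Jan"], httpMonths["Sep"], httpMonths["Dec"])   # 01 09 12

# str.zfill is another way to get the same padding
padded = str(3).zfill(2)   # '03'
```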
 
Bruno Desthuilliers

IamIan wrote:
> I am using the suggested approach to make a years list:
>
> years = ["199%s" % x for x in range(0,10)]
> years += ["200%s" % x for x in range(0,10)]
>
> I haven't had any luck doing this in one line though. Is it possible?

# Q&D and pretty obvious
years = ["199%s" % x for x in range(0,10)] + ["200%s" % x for x in range(0,10)]

# hardly more involved, and much more generic
years = ["%s%s" % (c, y) for c in ("199", "200") for y in range(10)]
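In a comprehension with two for clauses, the first clause is the outer loop, so the decade prefix changes slowest. A Python 3 sketch with the prefixes "199" and "200", next to the equivalent explicit loops:

```python
# The first for clause is the outer loop
years = ["%s%s" % (c, y) for c in ("199", "200") for y in range(10)]
print(years[0], years[9], years[10], years[-1])   # 1990 1999 2000 2009

# The same thing written as explicit nested loops
expanded = []
for c in ("199", "200"):
    for y in range(10):
        expanded.append("%s%s" % (c, y))
print(expanded == years)   # True
```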
 
Dennis Lee Bieber

> IamIan wrote:
>> I am using the suggested approach to make a years list:
>>
>> years = ["199%s" % x for x in range(0,10)]
>> years += ["200%s" % x for x in range(0,10)]
>>
>> I haven't had any luck doing this in one line though. Is it possible?
>
> # Q&D and pretty obvious
> years = ["199%s" % x for x in range(0,10)] + ["200%s" % x for x in range(0,10)]
>
> # hardly more involved, and much more generic
> years = ["%s%s" % (c, y) for c in ("199", "200") for y in range(10)]

>>> years = ["%4.4d" % (1990 + i) for i in range(20)]
>>> years
['1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998',
'1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007',
'2008', '2009']
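In "%4.4d" the precision (.4) is the minimum number of digits, padded with zeros, and the width (4) is the minimum field width, padded with spaces, so for these non-negative years it behaves like "%04d". A Python 3 sketch:

```python
# %[width].[precision]d: precision = minimum digits (zero-padded),
# width = minimum field width (space-padded)
small = "%4.4d" % 7        # '0007'
year = "%4.4d" % 1990      # '1990'
zero_pad = "%04d" % 7      # '0007', same as above for non-negative ints
wide = "%6.4d" % 7         # '  0007'
print(small, year, zero_pad, repr(wide))
```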