What is a perl hash in python


Karyn Williams

I am new to Python. I am trying to modify and understand a script someone
else wrote, and I am trying to make sense of the following code snippet. I know
line 7 would be best coded with a regex, but first I would like to understand
what was coded originally. thelistOut looks like a hash to me (I'm more
familiar with Perl). Perhaps someone could translate from Perl to Python
for me - not in code but just in concept.


Here is the code. This script is reading the list thelistOut, removing any
items that are in RSMList, and putting the remainder in graphAddressOut with
the formatting.

This is a SAMPLE of what is in the lists referenced below in the loop:


thelistOut = [(632,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log']), (145,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_13.log']), (0,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_5.log'])]

RSMList = ['172.16.0.1_1', '172.16.0.1_2', '172.16.0.1_3', '172.16.0.1_4',
'172.16.0.1_5']



#--------------------------Loop 1 -------------------------

w = 0
while w < 45:

    fileOut = string.split(thelistOut[w][1][0], ".log")
    fileOutSplitedCommon = string.split(fileOut[0], "main/")
    fileOut2D = string.split(fileOutSplitedCommon[1], "/")
    fileOut = string.split(fileOut[0], "data-dist")

    if fileOut2D[1] in RSMList:
        w = w + 1
        continue
    graphAddressOut = (tag1 + logUrl + fileOut[1] + extention1 + tag2 +
                       "<b>SWITCH: " + string.swapcase(fileOut2D[0]) +
                       "&nbsp;&nbsp;&nbsp;PORT ID: " + fileOut2D[1] +
                       "</b><br>" + imgTitleTag + imgTag1 + logUrl +
                       fileOut[1] + extention2 + imgTag2 + tag3 + tag5)
    outputOut.append(graphAddressOut)
    strOut = strOut + graphAddressOut

    w = w + 1

#--------------------------Loop 1 -------------------------

--

Karyn Williams
Network Services Manager
California Institute of the Arts
(e-mail address removed)
http://www.calarts.edu/network
 

Marc 'BlackJack' Rintsch

Karyn Williams said:
I am new to Python. I am trying to modify and understand a script someone
else wrote. I am trying to make sense of the following code snippet. I know
line 7 would be best coded with regex.

What is line 7 in the snippet?
I first would like to understand what was coded originally. thelistOut
looks like a hash to me (I'm more familiar with perl).

It's a list which contains tuples. Each tuple contains an integer and a
list with one string that looks like a pathname.
Perhaps someone could translate from perl to python for me - not in code
but just in concept.

Which Perl? You gave us Python!?
Here is the code. This script is reading the list thelistOut and then
removing any items in RSMlist and taking the remainder and putting them
in graphAddressOut with the formatting.

There's nothing removed from `thelistOut`. Entries whose file basename,
without the extension, is in `RSMList` are simply skipped: they are neither
processed nor added to `outputOut`.
This is a SAMPLE of what is in the lists referenced below in the loop:


thelistOut = [(632,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log']), (145,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_13.log']), (0,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_5.log'])]

RSMList = ['172.16.0.1_1', '172.16.0.1_2', '172.16.0.1_3',
'172.16.0.1_4', '172.16.0.1_5']



#--------------------------Loop 1 -------------------------

w = 0
while w < 45:

The loop looks odd. Is it really a literal 45 here, or are all elements of
`thelistOut` processed? If so, a ``for`` loop over the list (if you don't
need `w` for anything other than indexing into the list) or over an
`xrange()` object is much cleaner than a ``while`` loop that updates the
counter manually. That the second element of the tuple always seems to be a
list with one item looks odd too.
    fileOut = string.split(thelistOut[w][1][0], ".log")
    fileOutSplitedCommon = string.split(fileOut[0], "main/")
    fileOut2D = string.split(fileOutSplitedCommon[1], "/")
    fileOut = string.split(fileOut[0], "data-dist")

This might be more readable and understandable if `os.path.splitext()` and
`os.path.split()` were used.
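Here is a minimal sketch of that suggestion. It assumes every path has the same layout as the sample (".../data-dist/.../main/<switch>/<port>.log"). `split_log_path` is a hypothetical helper name. It returns the same three pieces that the `string.split()` calls compute: `fileOut[1]`, `fileOut2D[0]` and `fileOut2D[1]`.

```python
import os.path

def split_log_path(path):
    """Return (url_part, switch, port_id) for a log path like the sample.

    url_part -> what the original calls fileOut[1]
    switch   -> fileOut2D[0] (the directory holding the log)
    port_id  -> fileOut2D[1] (the file name without ".log")
    Assumes the path contains "data-dist" and the log sits directly
    inside the switch directory.
    """
    stem = os.path.splitext(path)[0]           # drop the ".log" extension
    directory, port_id = os.path.split(stem)   # '/.../main/test', '172.16.0.23_9'
    switch = os.path.basename(directory)       # 'test'
    url_part = stem.split("data-dist", 1)[1]   # '/mrtg/main/test/172.16.0.23_9'
    return url_part, switch, port_id
```

For the first sample entry this gives `('/mrtg/main/test/172.16.0.23_9', 'test', '172.16.0.23_9')`. Each piece comes from a named path operation rather than from splitting on "main/" and "data-dist" separately.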
    if fileOut2D[1] in RSMList:
        w = w + 1
        continue

Might be cleaner to negate the test and use the remaining code as body of
that ``if`` statement.
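Marc's two suggestions can be combined in a minimal sketch: iterate over `thelistOut` directly and put the work inside a negated test. Like the original, it assumes the port ID is the log file's basename. It appends only that ID, as a stand-in for the real HTML string. With the sample data nothing is skipped, because the sample IDs are 172.16.0.23_* and RSMList holds 172.16.0.1_*. `thelistOut` itself is never modified; the loop only builds a new list.

```python
import os.path

# Sample data from the post; the real list reportedly has about 45 entries.
thelistOut = [
    (632, ['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log']),
    (145, ['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_13.log']),
    (0, ['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_5.log']),
]
RSMList = ['172.16.0.1_1', '172.16.0.1_2', '172.16.0.1_3',
           '172.16.0.1_4', '172.16.0.1_5']

outputOut = []
for number, paths in thelistOut:    # no counter to update, no magic 45
    # basename without ".log", e.g. '172.16.0.23_9'
    port_id = os.path.splitext(os.path.basename(paths[0]))[0]
    if port_id not in RSMList:      # negated test instead of "continue"
        outputOut.append(port_id)   # stand-in for the real HTML string
```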
    graphAddressOut = (tag1 + logUrl + fileOut[1] + extention1 + tag2 +
                       "<b>SWITCH: " + string.swapcase(fileOut2D[0]) +
                       "&nbsp;&nbsp;&nbsp;PORT ID: " + fileOut2D[1] +
                       "</b><br>" + imgTitleTag + imgTag1 + logUrl +
                       fileOut[1] + extention2 + imgTag2 + tag3 + tag5)
    outputOut.append(graphAddressOut)
    strOut = strOut + graphAddressOut

That's an unreadable mess. Better use string formatting.
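Here is one way to apply the string-formatting advice, as a sketch. A single `%`-style template uses named fields in the same order as the original `+` chain. The values in `tags` are invented placeholders; in the real script `tag1`, `logUrl`, `extention1` and the rest are defined elsewhere. `graph_row` is a hypothetical helper.

```python
# Invented placeholder values; the real script defines tag1, logUrl, etc.
tags = {
    "tag1": '<a href="', "tag2": '">', "tag3": "</a>", "tag5": "<br>",
    "imgTitleTag": "<h3>Some image</h3>",
    "imgTag1": '<img src="', "imgTag2": '">',
    "logUrl": "http://www.example.com/logs",
    "extention1": ".log", "extention2": ".png",
}

# Same pieces, in the same order, as the original concatenation.
ROW = ('%(tag1)s%(logUrl)s%(path)s%(extention1)s%(tag2)s'
       '<b>SWITCH: %(switch)s&nbsp;&nbsp;&nbsp;PORT ID: %(port)s</b><br>'
       '%(imgTitleTag)s%(imgTag1)s%(logUrl)s%(path)s%(extention2)s'
       '%(imgTag2)s%(tag3)s%(tag5)s')

def graph_row(path, switch, port):
    """path ~ fileOut[1], switch ~ fileOut2D[0], port ~ fileOut2D[1]."""
    return ROW % dict(tags, path=path, switch=switch.swapcase(), port=port)
```

The layout of one row now sits in one place. Each call fills in only the three values that change.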

And last but not least: a hash is called dictionary in Python.

Ciao,
Marc 'BlackJack' Rintsch
 

Dennis Lee Bieber

I am new to Python. I am trying to modify and understand a script someone
else wrote. I am trying to make sense of the following code snippet. I know

"someone else" didn't write Python either, looking at that mishmash
line 7 would be best coded with regex. I first would like to understand
what was coded originally. thelistOut looks like a hash to me (I'm more
familiar with perl). Perhaps someone could translate from perl to python
for me - not in code but just in concept.


Here is the code. This script is reading the list thelistOut and then
removing any items in RSMlist and taking the remainder and putting them in
graphAddressOut with the formatting.

This is a SAMPLE of what is in the lists referenced below in the loop:


thelistOut = [(632,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log']), (145,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_13.log']), (0,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_5.log'])]

This is a list containing three elements. Each element is a tuple
containing two sub-elements. The first sub-element appears to be an
integer (I have no idea of the significance of the value at this time).
The second sub-element is another list containing a single
sub-sub-element -- that sub-sub-element is a string (file path name).
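You can check this description interactively. The sketch below uses the sample data from the post; `first`, `number`, `path` and `pairs` are just illustrative names.

```python
thelistOut = [
    (632, ['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log']),
    (145, ['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_13.log']),
    (0, ['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_5.log']),
]

first = thelistOut[0]         # a tuple: (integer, one-element list)
number = first[0]             # 632; its meaning isn't known from the post
path = thelistOut[0][1][0]    # the path string, as in thelistOut[w][1][0]

# Tuple unpacking reaches the same strings without the chain of indices;
# "(p,)" unpacks the one-element list.
pairs = [(n, p) for n, (p,) in thelistOut]
```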
RSMList = ['172.16.0.1_1', '172.16.0.1_2', '172.16.0.1_3', '172.16.0.1_4',
'172.16.0.1_5']



#--------------------------Loop 1 -------------------------

w = 0
while w < 45:
for w in xrange(45):
    fileOut = string.split(thelistOut[w][1][0], ".log")
    fileOutSplitedCommon = string.split(fileOut[0], "main/")
    fileOut2D = string.split(fileOutSplitedCommon[1], "/")
    fileOut = string.split(fileOut[0], "data-dist")
Direct use of the string module is now frowned upon. Also, since
these are file path names, using operations in the os.path module would
be more appropriate...
    if fileOut2D[1] in RSMList:
        w = w + 1
        continue

Confusing logic, having two places where "w" is incremented. Using a
"for" loop would mean neither increment statement is needed. Actually,
"w" isn't even needed, replace the while/for with

for fid in thelistOut:
    fileOut = fid[1][0]  #that [1] is getting the second element of the
                         #tuple, and the [0] gets the string out of that
                         #list (why a list of one element string data?)

    graphAddressOut = (tag1 + logUrl + fileOut[1] + extention1 + tag2 +
                       "<b>SWITCH: " + string.swapcase(fileOut2D[0]) +
                       "&nbsp;&nbsp;&nbsp;PORT ID: " + fileOut2D[1] +
                       "</b><br>" + imgTitleTag + imgTag1 + logUrl +
                       fileOut[1] + extention2 + imgTag2 + tag3 + tag5)

This could be cleaned up too, but I'll ignore it at the moment.
    outputOut.append(graphAddressOut)
    strOut = strOut + graphAddressOut

    w = w + 1

#--------------------------Loop 1 -------------------------


I think what you call a "hash" in Perl is a dictionary in Python:

dct = { key1 : value1, ... , keyn : valuen }

aval = dct[keyx]

Nothing of the sort used in the code you show above.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

Karyn Williams

"someone else" didn't write Python either, looking at that mishmash
<G>

Thanks, Marc and Dennis.

Actually as I think about it, this operation should be able to be done in
one loop, not the ten or so that it is currently taking.


Read in each "*.log" file (excluding certain named files, e.g. "1.log"), total
up x number of rows of the 2nd and 3rd columns, push (filename, total of col 2)
and (filename, total of col 3) onto two lists, sort each in reverse (sort -r),
and generate one web page for each with the top ten.

That is what this script is supposed to be doing.
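Here is a rough single-loop sketch of that description. It rests on stated assumptions: each log line is whitespace-separated with numbers in columns 2 and 3, and `rows` stands in for "x number of rows". `top_ten`, `log_dir` and `excluded` are hypothetical names. It returns the two reverse-sorted top-ten lists and leaves generating the web pages out.

```python
import glob
import heapq
import itertools
import os.path

def top_ten(log_dir, excluded, rows=10, n=10):
    """Total columns 2 and 3 of each *.log file in a single pass.

    Assumes whitespace-separated lines with numeric values in columns
    2 and 3; only the first `rows` lines of each file are summed.
    Returns two lists of (total, name), largest first.
    """
    totals2 = []   # (total of column 2, file name)
    totals3 = []   # (total of column 3, file name)
    for path in glob.glob(os.path.join(log_dir, "*.log")):
        name = os.path.splitext(os.path.basename(path))[0]
        if name in excluded:        # e.g. the names in RSMList
            continue
        col2 = col3 = 0.0
        with open(path) as f:
            for line in itertools.islice(f, rows):
                fields = line.split()
                if len(fields) < 3:
                    continue        # skip blank or short lines
                col2 += float(fields[1])
                col3 += float(fields[2])
        totals2.append((col2, name))
        totals3.append((col3, name))
    # equivalent of "sort -r" and keeping the first n entries
    return heapq.nlargest(n, totals2), heapq.nlargest(n, totals3)
```

`heapq.nlargest` returns its results largest first, which covers both the reverse sort and the cut to ten in one call.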
Direct use of the string module is now frowned upon.


For future reference, why is direct use of the string module frowned upon,
and what does one use instead?

Also, since these are file path names, using operations in the os.path module would
be more appropriate...


I'll look into os.path, but what this loop should be doing is matching the
entries in thelistOut (and thelistIn) that are listed in RSMList and removing
them, or, as is being done now, not writing them to the new list outputOut
(graphAddressOut).
It's just a matching operation, not really a path/filename operation. This is
why I will be changing this to a regex.
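If you do go the regex route, here is a sketch; `EXCLUDE` and `is_excluded` are hypothetical names. It builds one pattern from RSMList and tests each path against it. The `re.escape()` call matters because the dots in the IP addresses would otherwise match any character. Anchoring on "/" and "\.log$" keeps 172.16.0.1_3 from also matching 172.16.0.1_30. The original's approach, checking the basename with `in RSMList`, is an equally valid plain-string alternative.

```python
import re

RSMList = ['172.16.0.1_1', '172.16.0.1_2', '172.16.0.1_3',
           '172.16.0.1_4', '172.16.0.1_5']

# One pattern matching "/<name>.log" at the end of a path, for any name in
# RSMList; re.escape() turns each "." into a literal dot.
EXCLUDE = re.compile(r"/(?:%s)\.log$" % "|".join(re.escape(n) for n in RSMList))

def is_excluded(path):
    return EXCLUDE.search(path) is not None
```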


Bruno Desthuilliers

Karyn Williams wrote:
I am new to Python. I am trying to modify and understand a script someone
else wrote. I am trying to make sense of the following code snippet. I know
line 7 would be best coded with regex. I first would like to understand
what was coded originally. thelistOut looks like a hash to me (I'm more
familiar with perl).

It's not a hash (the Python type for hashtables is 'dict'), it's a list
of 2-tuples. FWIW, the dict type can accept such a list as an argument
to its constructor - but then you lose the ordering.

Also, the data structure is somewhat weird, since the second item of
each tuple is always a one-element list.
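Here is a quick sketch of the constructor point; `byNumber` and `byPath` are illustrative names. Since Python 3.7, dicts are guaranteed to preserve insertion order, so the ordering caveat applies to older versions such as the Python 2 releases of the time. Duplicate keys still silently overwrite each other.

```python
thelistOut = [
    (632, ['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log']),
    (145, ['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_13.log']),
    (0, ['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_5.log']),
]

# dict() accepts a list of 2-tuples: the first item of each becomes the key,
# so the values here are still the odd one-element lists.
byNumber = dict(thelistOut)

# Keying on the path is probably more useful; "(path,)" unpacks the
# one-element list.
byPath = dict((path, number) for number, (path,) in thelistOut)
```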
Perhaps someone could translate from perl to python

Do you mean "from Python to Perl"?
for me - not in code but just in concept.


Here is the code. This script is reading the list thelistOut and then
removing any items in RSMlist and taking the remainder and putting them in
graphAddressOut with the formatting.

This is a SAMPLE of what is in the lists referenced below in the loop:


thelistOut = [(632,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log']), (145,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_13.log']), (0,
['/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_5.log'])]

RSMList = ['172.16.0.1_1', '172.16.0.1_2', '172.16.0.1_3', '172.16.0.1_4',
'172.16.0.1_5']



#--------------------------Loop 1 -------------------------

w = 0
while w < 45:

    fileOut = string.split(thelistOut[w][1][0], ".log")
    fileOutSplitedCommon = string.split(fileOut[0], "main/")
    fileOut2D = string.split(fileOutSplitedCommon[1], "/")
    fileOut = string.split(fileOut[0], "data-dist")

    if fileOut2D[1] in RSMList:
        w = w + 1
        continue
    graphAddressOut = (tag1 + logUrl + fileOut[1] + extention1 + tag2 +
                       "<b>SWITCH: " + string.swapcase(fileOut2D[0]) +
                       "&nbsp;&nbsp;&nbsp;PORT ID: " + fileOut2D[1] +
                       "</b><br>" + imgTitleTag + imgTag1 + logUrl +
                       fileOut[1] + extention2 + imgTag2 + tag3 + tag5)
    outputOut.append(graphAddressOut)
    strOut = strOut + graphAddressOut

    w = w + 1

#--------------------------Loop 1 -------------------------

Yuck. This code stinks. Whoever wrote this ought to be shot down. I
refuse to try to clean up this mess unless I get paid (and well paid).
 

sturlamolden

Karyn said:
I am new to Python. I am trying to modify and understand a script someone
else wrote. I am trying to make sense of the following code snippet. I know
line 7 would be best coded with regex. I first would like to understand
what was coded originally. thelistOut looks like a hash to me (I'm more
familiar with perl).

thelistOut seems to be a list of tuples. It also seems that one of the
tuple elements is a list containing a single string. To be honest,
this is one of the ugliest examples of Python code I have ever seen.
I am not sure I would trust code written like this at all. One can very
often tell the competence of the programmer from the looks of the code.

To answer the subject: an associative container in Python is called a
'dictionary'. CPython dictionaries are implemented using hash tables
(and one of the fastest hashing algorithms known to man). Nothing in the
Python semantics mandates this particular implementation, though.
Balanced binary trees could have been used instead, as they usually are
in the STL's associative containers; CPython just happens to use a hash
table under the hood.

Dictionaries work like this:

mydict = { key1 : val1, key2 : val2, key3 : val3 }
oldval3 = mydict[key3]
mydict[key3] = newval3
mydict[key4] = val4
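For a Perl programmer, a side-by-side sketch may help. The Perl equivalents are in the comments, and `h` is an arbitrary example name.

```python
h = {"k1": "v1"}            # my %h = (k1 => "v1");
h["k2"] = "v2"              # $h{k2} = "v2";
has_k1 = "k1" in h          # exists $h{k1}
del h["k1"]                 # delete $h{k1};
missing = h.get("k9")       # $h{k9} gives undef; h["k9"] would raise KeyError
keys = sorted(h)            # sort keys %h
```

The main surprise coming from Perl is the missing key: Python raises `KeyError` for `h["k9"]`, so use `in` or `.get()` when a key may be absent.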
 

Dennis Lee Bieber

Actually as I think about it, this operation should be able to be done in
one loop, not the ten or so that it is currently taking.


Read in a file "*.log" (excluding certain named files "1.log"), total up x
number of rows of the 2nd and third columns, push (filename, total col2)
(filename, total col 3) to two lists, sort -r and generate one web page
each with the top ten.

That is what this script is supposed to be doing.
Unfortunately, your sample code reduced things so far that all we have is
a mass of operations splitting local file paths and generating some sort
of HTML data from the names.

-=-=-=-=-=- Cleaned up as best I can understand the sample
import os.path

theListOut = [
    (632, "/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log"),
    (145, "/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_13.log"),
    # and similar for rest of entries -- note: no 1 element sublist
    (0, "/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_5.log")
    ]

RSMList = [ "172.16.0.1_1",
            "172.16.0.1_2",
            "172.16.0.1_3",
            "172.16.0.1_4",
            # and similar for rest of list
            "172.16.0.1_5" ]

# note that, given the scheme shown in this small set of data, the list
# could be created dynamically
#
#RSMList = [ "172.16.0.1_%s" % x for x in range(1, 6) ]
#
# but I expect you have more than just one sequence of IP addresses

# I'm just putting in some nonsense here to set variables used below
tag1 = '<a href="'
logUrl = "http://www.somesite.domain/logs/"
extention1 = ".log"
tag2 = '">'
imgTitleTag = "<h3>Some image</h3>"
imgTag1 = '<img src="'
extention2 = ".png"
imgTag2 = '">'
tag3 = "</a>"
tag5 = "<br>"

## w = 0
## while w < 45:
# use of "magic numbers"... I'm guessing the original theListOut has
# 45 tuples.

outputOut = []
strOut = []     #really a list (of strings) at this point
for (fidnum, fname) in theListOut:  #process each tuple
    # split the base filename from the path (directory) name
    (dirn, basen) = os.path.split(fname)
    # split off the extension and only keep the filename part
    fileOut2D = os.path.splitext(basen)[0]

    (root, subd) = dirn.split("data-dist")
    subsubd = subd.split("main/")[1]

    if fileOut2D not in RSMList:
        graphAddressOut = "".join( [tag1,
                                    logUrl,
                                    subd[1:],
                                    extention1,
                                    tag2,
                                    "<b>SWITCH: ",
                                    subsubd.swapcase(), #swap?
                                    "&nbsp;&nbsp;&nbsp;PORT ID: ",
                                    fileOut2D,
                                    "</b><br>",
                                    imgTitleTag,
                                    imgTag1,
                                    logUrl,
                                    subd[1:],
                                    extention2,
                                    imgTag2,
                                    tag3,
                                    tag5] )

        # note that a templating system, like CherryTemplate perhaps,
        # could simplify much of the above

        outputOut.append(graphAddressOut)
        strOut.append(graphAddressOut)

# convert strOut from list to single string
strOut = "\n".join(strOut)


print strOut
-=-=-=-=-=-=-=-=- Output (I had to add guesses to make it runnable)
<a href="http://www.somesite.domain/logs/mrtg/main/test.log"><b>SWITCH: TEST&nbsp;&nbsp;&nbsp;PORT ID: 172.16.0.23_9</b><br><h3>Some image</h3><img src="http://www.somesite.domain/logs/mrtg/main/test.png"></a><br>
<a href="http://www.somesite.domain/logs/mrtg/main/test.log"><b>SWITCH: TEST&nbsp;&nbsp;&nbsp;PORT ID: 172.16.0.23_13</b><br><h3>Some image</h3><img src="http://www.somesite.domain/logs/mrtg/main/test.png"></a><br>
<a href="http://www.somesite.domain/logs/mrtg/main/test.log"><b>SWITCH: TEST&nbsp;&nbsp;&nbsp;PORT ID: 172.16.0.23_5</b><br><h3>Some image</h3><img src="http://www.somesite.domain/logs/mrtg/main/test.png"></a><br>


Then, just to waste an afternoon, CherryTemplated version...

-=-=-=-=-=-
import os.path
from cherrytemplate import renderTemplate

TEMPLATE = """<py-for="sdir, ssdir, fid in dataList">
<a href="<py-eval="logUrl"><py-eval="sdir"><py-eval="ext1">">
<b>SWITCH:
<py-eval="ssdir">&nbsp;&nbsp;&nbsp;&nbsp;PORT&nbsp;ID:
<py-eval="fid"></b>
<br><h3>Some image</h3>
<img src="<py-eval="logUrl"><py-eval="sdir"><py-eval="ext2">">
</a><br>
</py-for>
"""

theListOut = [
    (632, "/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log"),
    (145, "/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_13.log"),
    # and similar for rest of entries -- note: no 1 element sublist
    (0, "/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_5.log")
    ]

RSMList = [ "172.16.0.1_1",
            "172.16.0.1_2",
            "172.16.0.1_3",
            "172.16.0.1_4",
            # and similar for rest of list
            "172.16.0.1_5" ]

# note that, given the scheme shown in this small set of data, the list
# could be created dynamically
#
#RSMList = [ "172.16.0.1_%s" % x for x in range(1, 6) ]
#
# but I expect you have more than just one sequence of IP addresses


templateData = []
for (fidnum, fname) in theListOut:  #process each tuple
    # split the base filename from the path (directory) name
    (dirn, basen) = os.path.split(fname)
    # split off the extension and only keep the filename part
    fileOut2D = os.path.splitext(basen)[0]

    (root, subd) = dirn.split("data-dist")
    subsubd = subd.split("main/")[1]

    if fileOut2D not in RSMList:
        templateData.append((subd[1:], subsubd.swapcase(), fileOut2D))

templateData.sort()
# render the template with the parsed data
strOut = renderTemplate(template=TEMPLATE,
                        loc = {"dataList" : templateData,
                               "logUrl" : "http://www.somesite.domain/logs/",
                               "ext1" : ".log",
                               "ext2" : ".png" } )

print strOut

-=-=-=-=-=-=- output

<a href="http://www.somesite.domain/logs/mrtg/main/test.log">
<b>SWITCH: TEST&nbsp;&nbsp;&nbsp;&nbsp;PORT&nbsp;ID:
172.16.0.23_13</b>
<br><h3>Some image</h3>
<img src="http://www.somesite.domain/logs/mrtg/main/test.png">
</a><br>

<a href="http://www.somesite.domain/logs/mrtg/main/test.log">
<b>SWITCH: TEST&nbsp;&nbsp;&nbsp;&nbsp;PORT&nbsp;ID:
172.16.0.23_5</b>
<br><h3>Some image</h3>
<img src="http://www.somesite.domain/logs/mrtg/main/test.png">
</a><br>

<a href="http://www.somesite.domain/logs/mrtg/main/test.log">
<b>SWITCH: TEST&nbsp;&nbsp;&nbsp;&nbsp;PORT&nbsp;ID:
172.16.0.23_9</b>
<br><h3>Some image</h3>
<img src="http://www.somesite.domain/logs/mrtg/main/test.png">
</a><br>

Dennis Lee Bieber

For future reference, why is direct use of the string module frowned upon,
and what does one use instead?
As of a few versions ago, string objects have the same operations available as methods, so

x = string.split(aStr, splitOn)

becomes

x = aStr.split(splitOn)
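Here is the same translation applied to the calls in the posted script, as a small sketch; `path`, `switch` and `joined` are illustrative names. In Python 3 the old module-level functions such as `string.split()` and `string.swapcase()` were removed outright, so only the method forms work there.

```python
path = "/usr/local/www/data-dist/mrtg/main/test/172.16.0.23_9.log"

fileOut = path.split(".log")                 # was string.split(path, ".log")
switch = "test".swapcase()                   # was string.swapcase("test")
joined = "/".join(["mrtg", "main", "test"])  # was string.join([...], "/")
```

Note that `join` moves to the separator: the string being inserted between the pieces is the object whose method is called.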