[Newbie Q on String & List Manipulation]

Matthew

Hello All,

Today is my first day programming in Python.
My assignment is to write a silly script that will probably
run a few times a day to check whether the Gmail service
is ready or not. ;)

However, I ran into a problem when playing with
lists and strings.

I'm using Python 2.2.2 on Red Hat. If I write something like:

a = "one"
b = "two"
a += b
print a

I get:

onetwo

OK, that looks fine. However, I'm not sure why it doesn't work in
my silly Gmail script (please see the full script below):

for item in thecookies:
    mycookies += item

print mycookies

I have exactly 4 items in the "thecookies" list. However, when I
print "mycookies", it shows only the last item (in fact, it
looks as if the 4 items have overwritten one another).

Could somebody please take a look at my silly script and
give me some advice?

Thanks very much in advance! :)

---
matthew




import re
import string
import sys
import urllib

user = "(e-mail address removed)"
pswd = "dapassword"

schm = "https://"
host = "www.google.com"
path = "/accounts/ServiceLoginBoxAuth"
qstr = {"service": "mail",
        "continue": "http://gmail.google.com/",
        "Email": user,
        "Passwd": pswd}

qstr = urllib.urlencode(qstr)

url = schm + host + path + "?" + qstr

conn = urllib.urlopen(url)

headers = conn.info().headers
response = conn.read()

thecookies = []

#
# extract all the Set-Cookie from the HTTP response header and put it in thecookies
#

for header in headers:
    matches = re.compile("^Set-Cookie: (.*)$").search(header)
    if matches:
        thecookies.append(matches.group(1))

#
# make sure we've grep the SID or die
#

foundsessionid = 0

for item in thecookies:
    if re.compile("^SID").search(item):
        foundsessionid = 1
        break

if not foundsessionid:
    print "> Failed to retrieve the \"SID\" cookie"
    sys.exit()

#
# grep the GV cookie from the HTTP response or die
#

matches = re.compile("^\s*var cookieVal= \"(.*)\";.*", re.M).search(response)

if matches:
    thecookies.append("GV=" + matches.group(1))
else:
    print "> Failed to retrieve the \"GV\" cookie"
    sys.exit()

print thecookies

mycookies = ""

for item in thecookies:
    mycookies += item

print mycookies

#
# still got many things to do right here...
#

sys.exit()
 
Gabriel Cooper

Matthew said:
Hello All,

Today is my first day programming in Python.
My assignment is to write a silly script that will probably
run a few times a day to check whether the Gmail service
is ready or not. ;)

However, I ran into a problem when playing with
lists and strings.

[...]

for item in thecookies:
    mycookies += item
[...]
Try:

mycookies = string.join(thecookies, "")
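
A minimal sketch of how the loop and the join relate, using made-up placeholder strings rather than real Gmail cookies. Note that the function form string.join(thecookies, "") belongs to Python 2's string module and is not available in Python 3, while the method form "".join(thecookies) works in both, so the sketch uses the method form.

```python
# Placeholder values for illustration only (not taken from the script's output).
thecookies = ["a=1", "b=2", "c=3"]

# Build the string with += in a loop, as the original script does.
mycookies = ""
for item in thecookies:
    mycookies += item

# Build the same string in one step with the join method.
joined = "".join(thecookies)

# Both approaches produce the same string.
print(mycookies == joined)
```

Since both approaches give the same string, switching to join tidies the code but would not by itself change what gets printed.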
 
Larry Bates

If you want to join all the items in a list together
try:

mycookies="".join(thecookies)

Secondly, your append to the cookies list is not
inside your loop. In your code it will only
get executed a single time (after exiting the
loop) which is most likely why you only see the
LAST item. Remember that indentation in Python
has meaning!
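
To illustrate the general point about indentation, here is a small sketch with placeholder data (the names items, inside and outside are hypothetical and do not come from the script). Whether this applies to the script in question depends on how it was indented before it was posted.

```python
# Placeholder data for illustration only.
items = ["a", "b", "c", "d"]

# Indented under the for: the statement runs once per item.
inside = ""
for item in items:
    inside += item

# Not indented: the statement runs once, after the loop has finished,
# so it only sees the last value of item.
outside = ""
for item in items:
    pass
outside += item

print(inside)   # all four items concatenated
print(outside)  # only the last item
```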

Something more like this:

for item in thecookies:
    if re.compile("^SID").search(item):
        foundsessionid = 1
        break

if not foundsessionid:
    print '> Failed to retrieve the "SID" cookie'
    sys.exit()

#
# grep the GV cookie from the HTTP response or die
#
matches = re.compile('^\s*var cookieVal= "(.*)";.*',
                     re.M).search(response)
if matches:
    thecookies.append("GV=" + matches.group(1))
else:
    print '> Failed to retrieve the "GV" cookie'
    sys.exit()

print thecookies

mycookies = "".join(thecookies)

Regards,
Larry Bates
Syscon, Inc.

 
wes weston

>>> sum=""
>>> list=["a","b","c","d"]
>>> for x in list:
...     sum+=x
...
>>> sum
'abcd'
 
Matthew

Hello all,

Thanks very much for your replies. :)

I believe the code is already properly indented,
and I also tried the string.join() method.

I changed the script a bit to test both a += b
and string.join().

Please take a look at the output and the script
below.

Thanks in advance.

---
matthew


##########
# OUTPUT #
##########

=== the content of the "thecookies" list ===
['=zh_HK; Expires=Sat, 16-Apr-05 04:57:58 GMT; Path=/\r',
'Session=zh_HK\r', 'SID=AejhWhifAlWLXGi3lnBd3PiLeNUkoasZRP9kKXc0Es_o;Domain=.google.com;Path=/\r',
'GV=fbf1ad9eb8-4bbb676189c513f10bfa42556f57c6ac']

=== the content of the string: "mycookies" ===
GV=fbf1ad9eb8-4bbb676189c513f10bfa42556f57c6ac_o;Domain=.google.com;Path=/

=== use the string.join(): "".join(thecookies) ===
GV=fbf1ad9eb8-4bbb676189c513f10bfa42556f57c6ac_o;Domain=.google.com;Path=/

############
# Gmail.py #
############

#
# A Python script that logs into the Gmail service and checks
# whether the message is still "Sorry, Gmail is in limited test mode..."
#
# Matthew Wong <[email protected]> 2004-04-15
#

import re
import string
import sys
import urllib

user = "(e-mail address removed)"
pswd = "maddog4096"

schm = "https://"
host = "www.google.com"
path = "/accounts/ServiceLoginBoxAuth"
qstr = {"service": "mail",
        "continue": "http://gmail.google.com/",
        "Email": user,
        "Passwd": pswd}

qstr = urllib.urlencode(qstr)

url = schm + host + path + "?" + qstr

conn = urllib.urlopen(url)

headers = conn.info().headers
response = conn.read()

thecookies = []

#
# extract all the Set-Cookie from the HTTP response header and put it in thecookies
#

for header in headers:
    matches = re.compile("^Set-Cookie: (.*)$").search(header)
    if matches:
        thecookies.append(matches.group(1))

#
# make sure we've grep the SID or die
#

foundsessionid = 0

for item in thecookies:
    if re.compile("^SID").search(item):
        foundsessionid = 1
        break

if not foundsessionid:
    print "> Failed to retrieve the \"SID\" cookie"
    sys.exit()

#
# grep the GV cookie from the HTTP response or die
#

matches = re.compile("^\s*var cookieVal= \"(.*)\";.*",
                     re.M).search(response)

if matches:
    thecookies.append("GV=" + matches.group(1))
else:
    print "> Failed to retrieve the \"GV\" cookie"
    sys.exit()

#
# dump the content of the list: thecookies
#

print "=== the content of the \"thecookies\" list ==="
print thecookies
print "\n"

#
# join the items in the "thecookies" list to
# the "mycookies" string by using the a += b
#

mycookies = ""

for item in thecookies:
    mycookies += item

print "=== the content of the string: \"mycookies\" ==="
print mycookies
print "\n"

#
# join the items in the "thecookies" list to
# the "mycookies" string by using the string.join()
#

print "=== use the string.join(): \"\".join(thecookies) ==="
print "".join(thecookies)
print "\n"

#
# still got many things to do right here...
#

sys.exit()
 
Matthew

Hello all,

Finally, I found a way to make a += b work,
but I don't understand why it works... ;(

I changed the script from:

for item in thecookies:
    mycookies += item

to:

for item in thecookies:
    mycookies += repr(item)

and things work fine.

I've also checked the type of both "item" and
"mycookies", and both are "str".

I don't understand why I need to use repr to
make it work...

sigh...
 
David Eppstein

Finally, I found a way to make a += b work,
but I don't understand why it works... ;(

I was seeing a lot of carriage return (^M) characters at the ends of the
strings you posted. When you concatenate them and then output the result,
each ^M may cause the text to overwrite what came before it on the same
line, making you think you're only seeing the last item. But repr turns
each of these characters into a sequence of two characters, a backslash
followed by r. Is this your problem? If so, maybe you want to call
strip() on your strings before concatenating or joining them?
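
A minimal sketch of the strip() suggestion, assuming the cookie strings look like the ones in the output posted earlier in this thread. The list name cleaned is hypothetical, and the sketch uses the print() call form so that it runs on current Python as well as on old interpreters.

```python
# Cookie strings copied from the output posted in this thread;
# three of them end in a carriage return ("\r").
thecookies = [
    '=zh_HK; Expires=Sat, 16-Apr-05 04:57:58 GMT; Path=/\r',
    'Session=zh_HK\r',
    'SID=AejhWhifAlWLXGi3lnBd3PiLeNUkoasZRP9kKXc0Es_o;Domain=.google.com;Path=/\r',
    'GV=fbf1ad9eb8-4bbb676189c513f10bfa42556f57c6ac',
]

# repr() makes the carriage return visible as backslash followed by r.
print(repr(thecookies[1]))

# strip() removes leading and trailing whitespace, including "\r".
cleaned = [item.strip() for item in thecookies]

# Joining the cleaned strings leaves no carriage returns in the result.
mycookies = "".join(cleaned)
print(mycookies)
```

The empty separator mirrors the original script; a different separator string could be used in front of .join() if the cookies need to be kept apart.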
 
Joe Mason

OK, that looks fine. However, I'm not sure why it doesn't work in
my silly Gmail script (please see the full script below):

for item in thecookies:
    mycookies += item

print mycookies

I have exactly 4 items in the "thecookies" list. However, when I
print "mycookies", it shows only the last item (in fact, it
looks as if the 4 items have overwritten one another).

I had to comment out the SID and GV tests, because they kept failing,
but then I got:

['=en_CA; Expires=Sat, 16-Apr-05 06:35:14 GMT; Path=/\r',
'Session=en_CA\r']
Session=en_CAes=Sat, 16-Apr-05 06:35:14 GMT; Path=/

Note the "\r" at the end of each cookie. That's a carriage return, which
moves the cursor back to the beginning of the line. mycookies actually
contains all the data you want, but when print writes it out, the terminal
interprets the control character, so the cookies overwrite each other.
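
The overwriting can be reproduced without a terminal. The sketch below is only an approximation of what a terminal does with a single line of text: render_like_terminal is a hypothetical helper written for this illustration, not a library function, and it handles nothing except "\r".

```python
def render_like_terminal(text):
    """Approximate how a terminal displays one line containing carriage returns."""
    cells = []   # characters currently visible on the line
    cursor = 0   # column where the next character will be written
    for ch in text:
        if ch == "\r":
            cursor = 0            # carriage return: jump back to column 0
        elif cursor < len(cells):
            cells[cursor] = ch    # overwrite the character already shown there
            cursor += 1
        else:
            cells.append(ch)      # write past the end: the line grows
            cursor += 1
    return "".join(cells)


# The two cookie strings from the output quoted above.
thecookies = ['=en_CA; Expires=Sat, 16-Apr-05 06:35:14 GMT; Path=/\r',
              'Session=en_CA\r']

mycookies = "".join(thecookies)
shown = render_like_terminal(mycookies)

print(shown)                          # what appears on screen
print(len(mycookies) > len(shown))    # the string itself holds more than is visible
```

Under these assumptions the rendered line matches the mangled output quoted above: the second cookie is written over the start of the first, and the tail of the first cookie stays visible.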

Tip for future debugging: you don't need to call "sys.exit()" at the end.
The script will exit automatically. And if you leave that call out, you can
run your script with "python -i" and get an interactive prompt when the
script is finished, so you can examine the variables directly instead of
going through print:
>>> mycookies
'=en_CA; Expires=Sat, 16-Apr-05 06:44:23 GMT; Path=/\rSession=en_CA\r'

Joe
 
Matthew

Hello David & Joe,

Thanks very much for David's information about
the carriage returns, and thanks very much for Joe's tips
on sys.exit() and the "-i" option for debugging.

=)
 
