Question about concatenation error


colonel

I am new to Python and I am confused about why concatenating three
strings isn't working properly.

Here is the code:

------------------------------------------------------------------------------------------
import string
import sys
import re
import urllib

linkArray = []
srcArray = []
website = sys.argv[1]

urllib.urlretrieve(website, 'getfile.txt')

filename = "getfile.txt"
input = open(filename, 'r')
reg1 = re.compile('href=".*"')
reg3 = re.compile('".*?"')
reg4 = re.compile('http')
Line = input.readline()

while Line:
    searchstring1 = reg1.search(Line)
    if searchstring1:
        rawlink = searchstring1.group()
        link = reg3.search(rawlink).group()
        link2 = link.split('"')
        cleanlink = link2[1:2]
        fullink = reg4.search(str(cleanlink))
        if fullink:
            linkArray.append(cleanlink)
        else:
            cleanlink2 = str(website) + "/" + str(cleanlink)
            linkArray.append(cleanlink2)
    Line = input.readline()

print linkArray
-----------------------------------------------------------------------------------------------

I get this:

["http://www.slugnuts.com/['index.html']",
"http://www.slugnuts.com/['movies.html']",
"http://www.slugnuts.com/['ramblings.html']",
"http://www.slugnuts.com/['sluggies.html']",
"http://www.slugnuts.com/['movies.html']"]

instead of this:

["http://www.slugnuts.com/index.html",
"http://www.slugnuts.com/movies.html",
"http://www.slugnuts.com/ramblings.html",
"http://www.slugnuts.com/sluggies.html",
"http://www.slugnuts.com/movies.html"]

The concatenation isn't working the way I expected it to. I suspect
that I am screwing up by mixing types, but I can't see where...
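The type mix suspected here can be pinned down by printing `type()` and `repr()` of each operand just before the concatenation. The following is a minimal reproduction, assuming hard-coded stand-ins for `website` and `link2` (the real script reads them from `sys.argv` and the downloaded page). It is written for Python 3, whereas the thread's code is Python 2, where `print` is a statement.

```python
# Hard-coded stand-ins for values in the script above (assumed, for illustration)
website = "http://www.slugnuts.com"
link2 = '"index.html"'.split('"')   # gives ['', 'index.html', '']
cleanlink = link2[1:2]              # a slice, so this is a list, not a str

# Inspect each operand before concatenating
for name, value in [("website", website), ("cleanlink", cleanlink)]:
    print(name, type(value).__name__, repr(value))

# str() of a list keeps the brackets and quotes in the result
joined = str(website) + "/" + str(cleanlink)
print(joined)
```

The `cleanlink` line reports `list`, which is where the stray `['...']` in the output comes from.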

I would appreciate any advice or pointers.

Thanks.
 

colonel

Okay. It works if I change:

fullink = reg4.search(str(cleanlink))
if fullink:
    linkArray.append(cleanlink)
else:
    cleanlink2 = str(website) + "/" + str(cleanlink)

to

fullink = reg4.search(cleanlink[0])
if fullink:
    linkArray.append(cleanlink[0])
else:
    cleanlink2 = str(website) + "/" + cleanlink[0]


So can anyone tell me why "cleanlink" gets converted to a list? Is it
during the slicing?


Thanks.
 

Steve Holden

The statement

cleanlink = link2[1:2]

results in a list of one element. If you want to access element one
(the second in the list), then use

cleanlink = link2[1]
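A short sketch of the slice-versus-index difference, assuming `link2` holds what `'"index.html"'.split('"')` returns. It is written for Python 3; the sample lists are illustrative only.

```python
link2 = ['', 'index.html', '']

as_slice = link2[1:2]   # slicing always returns a list (here, one element)
as_item = link2[1]      # indexing returns the element itself, a str

print(type(as_slice).__name__, as_slice)   # list ['index.html']
print(type(as_item).__name__, as_item)     # str index.html

# One practical difference: an out-of-range slice is simply empty,
# while an out-of-range index raises IndexError.
short = ['only']
print(short[1:2])       # []
try:
    short[1]
except IndexError as exc:
    print("IndexError:", exc)
```

So `link2[1]` is the right choice when the element is known to exist; a slice is useful only when an empty result is acceptable.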

regards
Steve
 

Terry Hancock

I'm not taking the time to really study it, but at first
glance, the code looks like it's probably much more
complicated than it needs to be.
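To illustrate how much shorter the extraction could be, here is a minimal sketch, assuming the page is already in a string (the download step is left out). `HREF_RE`, `extract_links` and the sample HTML are hypothetical names, and the code is Python 3. A capturing group makes `re.findall` return only the text between the quotes, as plain strings, so no slicing or `str()` is involved. It keeps the original rule of prefixing `website + "/"` onto relative links; note that `startswith('http')` is stricter than the original `reg4.search`, which matched `http` anywhere in the link.

```python
import re

# [^"]* (rather than .*) stops at the first closing quote, so several
# href attributes on one line are matched separately.
HREF_RE = re.compile(r'href="([^"]*)"')

def extract_links(html_text, website):
    """Return a URL for every href="..." in html_text (hypothetical helper)."""
    links = []
    for link in HREF_RE.findall(html_text):     # each item is a plain str
        if link.startswith('http'):
            links.append(link)                  # already a full URL
        else:
            links.append(website + '/' + link)  # str + str + str
    return links

# Hypothetical sample page
page = ('<a href="index.html">Home</a> <a href="movies.html">Movies</a>\n'
        '<a href="http://example.com/">Out</a>')
print(extract_links(page, 'http://www.slugnuts.com'))
```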

The tail end of each URL you got, ['index.html'], is the string
representation of a list containing one string, not of that string
itself. I suspect you needed to use ''.join() somewhere. Or, in
principle, you could have indexed the list, since you only want one
member of it, e.g.:

>>> ['index.html'][0]
'index.html'
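The two remedies mentioned here behave the same on a one-element list but differ otherwise. The following Python 3 sketch assumes `cleanlink` is the one-element list produced by `link2[1:2]`; the two-item list is purely illustrative.

```python
cleanlink = ['index.html']   # one-element list, as produced by link2[1:2]

via_join = ''.join(cleanlink)   # concatenates every string in the list
via_index = cleanlink[0]        # takes the single element directly
print(via_join, via_index)      # index.html index.html

# With more than one item, join glues them all together,
# while [0] still returns only the first.
pair = ['a', 'b']
print(''.join(pair), pair[0])   # ab a
```

Note that `''.join()` requires every item to be a string; indexing works for a list of anything.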
 
