Spider: why isn't it finding the URL?

notnorwegian

This program doesn't produce any output; however, I know from testing
that the URL regexp matches URLs...

import urllib
import re

site = urllib.urlopen("http://www.python.org")

email = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
url = re.compile(r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}"
                 r"([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
                 r"((\?\w+=\w+)?(&\w+=\w+)*)?")

for row in site:
    obj = url.search(row)
    if obj != None:
        print obj.group()
 
notnorwegian

Hmm, OK, it is printing it row by row. Not what I expected.
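
One thing that may explain this: the pattern starts with `^`, so `url.search(row)` can only match at the very start of a row, and it returns at most one match per row. URLs that sit in the middle of a line (for example inside an `href="..."` attribute) are never reported. Below is a minimal sketch of an alternative, assuming a much simpler, unanchored pattern than the one in the post. `URL_RE`, `find_urls` and the hard-coded `sample` string are illustrative names and data, not part of the original program.

```python
import re

# Simplified, unanchored pattern (an assumption, not the regex from the post):
# a scheme, then everything up to whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s\"'<>]+")


def find_urls(text):
    """Return every URL-like substring in text, in order of appearance."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return URL_RE.findall(text)


# Hard-coded sample standing in for the downloaded page.
sample = (
    '<a href="http://www.python.org/doc/">Docs</a>\n'
    '<p>Mirror: ftp://ftp.python.org/pub/</p>\n'
)

for found in find_urls(sample):
    print(found)
```

To try this on the real page, the whole response could be read into one string (for example with `site.read()`) and passed to `find_urls`, instead of searching row by row.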
 
