Spider: why isn't it finding the URL?

notnorwegian · May 23, 2008

This program doesn't produce any output, even though I know from testing
that the URL regexp matches URLs...

import urllib
import re

site = urllib.urlopen("http://www.python.org")

email = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
# scheme, optional user:password@, host, optional :port, path, file, query
url = re.compile(r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?"
                 r"([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?"
                 r"((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
                 r"((\?\w+=\w+)?(&\w+=\w+)*)?")

for row in site:
    obj = url.search(row)
    if obj is not None:
        print obj.group()
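A likely explanation, sketched in Python 3 rather than the post's Python 2: the pattern begins with `^`, and without `re.MULTILINE` a `^` given to `search` only matches at the very start of the string. Iterating over the page yields one line per `row`, so the pattern can only match when a line itself starts with a URL. An HTML line such as `<a href="...">` starts with `<`, and because the first required piece of the pattern is the host's leading letter, the search returns `None` for every such line. The sample rows are hypothetical, `first_match` is an illustrative helper, and the `(:[\d]{1,5})?` port group is an assumed repair of the archived pattern.

```python
import re

# The question's URL pattern as one raw string, without the leading "^".
# The "(:[\d]{1,5})?" port group is an assumed reconstruction: the archived
# post lost its opening "(:", which left the parentheses unbalanced.
URL_PATTERN = (r"((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?"
               r"([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?"
               r"((/?\w+/)+|/?)(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?")

anchored = re.compile("^" + URL_PATTERN)   # as in the question
unanchored = re.compile(URL_PATTERN)       # may start anywhere in the line


def first_match(pattern, row):
    """Return the first match of pattern in row as a string, or None."""
    obj = pattern.search(row)
    return obj.group() if obj is not None else None


# Hypothetical rows, standing in for lines read from the page.
rows = [
    '<a href="http://www.python.org/about/">About</a>',
    'http://www.python.org/',
]

for row in rows:
    print(repr(first_match(anchored, row)), repr(first_match(unanchored, row)))
```

Dropping the `^` lets the search find URLs that appear mid-line; the anchored version only succeeds on the second row, which begins with the URL itself.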

notnorwegian · May 23, 2008

Hmm, OK, it is printing it row by row. Not what I expected.
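Looping over the page line by line means `search` reports at most one match per line. A minimal sketch (Python 3, hypothetical sample HTML) that instead runs the unanchored pattern over the whole page text with `re.finditer`, which yields every non-overlapping match; in Python 2 the text could come from `urllib.urlopen(...).read()`. `extract_urls` is an illustrative name, and the port group is the same assumed repair as above. `finditer` with `group()` is used rather than `re.findall`, because when a pattern contains capture groups `findall` returns tuples of those groups instead of the whole matched URL.

```python
import re

# The question's URL pattern without the leading "^", so it can match anywhere.
# The "(:[\d]{1,5})?" port group is an assumed reconstruction.
URL_RE = re.compile(
    r"((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?"
    r"([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?"
    r"((/?\w+/)+|/?)(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?"
)


def extract_urls(text):
    """Return every URL-like match in text, in order of appearance."""
    # finditer scans the whole string; group() is the full match, whereas
    # findall would return tuples of the capture groups here.
    return [m.group() for m in URL_RE.finditer(text)]


# Hypothetical page text standing in for the downloaded HTML.
page = ('<p>See <a href="http://www.python.org/doc/">docs</a> and\n'
        '<a href="http://www.python.org/download/">downloads</a>.</p>')

for found in extract_urls(page):
    print(found)
```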

Similar Threads

webspider, regexp not working, why?	2	May 23, 2008
simple url regexp	1	May 24, 2008
FLV download script works, but I want to enhance it	3	May 6, 2009
The devolution of English language and slothful c.l.p behaviors exposed!	50	Jan 24, 2012
matching a sentence, greedy up!	1	Aug 10, 2003
ANN: 'rex', a module for easy creation and use of regular expressions	0	Jun 10, 2004
Can't make this page work	6	Mar 8, 2006
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	15	Apr 1, 2006
