urlopen() error

Tempo

Hello. I am getting an error and it has gotten me stuck. I think the
best thing I can do is post my code and the error message and thank
everybody in advance for any help you can give with this issue. Thank you.

#############
Here's the code:
#############

import urllib2
import re
import xlrd
from BeautifulSoup import BeautifulSoup

book = xlrd.open_workbook("ige_virtualMoney.xls")
sh = book.sheet_by_index(0)
rx = 1
for rx in range(sh.nrows):
    u = sh.cell_value(rx, 0)
    page = urllib2.urlopen(u)
    soup = BeautifulSoup(page)
    p = soup.findAll('span', "sale")
    p = str(p)
    p2 = re.findall('\$\d+\.\d\d', p)
    for price in p2:
        print price
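The price-extraction step of the script can be tried on its own, with no spreadsheet or network involved. This is a minimal sketch. The HTML fragment is made up and stands in for str(soup.findAll('span', "sale")). The pattern is written as a raw string (r'...'), the usual habit for regexes, so that backslashes are never mistaken for string escapes:

```python
import re

# Made-up fragment standing in for str(soup.findAll('span', "sale")).
html = '[<span class="sale">$12.99</span>, <span class="sale">$105.00</span>]'

# \$ matches a literal dollar sign, \d+ the dollars, \.\d\d the cents.
prices = re.findall(r'\$\d+\.\d\d', html)

for price in prices:
    print(price)
```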

######################
Here are the error messages:
######################

Traceback (most recent call last):
  File "E:\Python24\scraper.py", line 16, in -toplevel-
    page = urllib2.urlopen(u)
  File "E:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "E:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "E:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: List

Rafal Zawadzki

Tempo said:
ValueError: unknown url type: List
^^^^^^^^^^^^^^^^^^^^^^

I don't know xlrd, but:
http://docs.python.org/lib/module-urllib2.html
urlopen( url[, data])
Open the URL url, which can be either a string or a Request object.
data should be a string, which specifies additional data to send to the
server. In HTTP requests, which are the only ones that support data, it
should be a buffer in the format of application/x-www-form-urlencoded, for
example one returned from urllib.urlencode().
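The quoted documentation says two things: urlopen() accepts either a URL string or a Request object, and supplying data turns the request into a POST. The minimal sketch below builds both forms without touching the network. The URL and form fields are made up. The try/except import lets the same code run under Python 3 (urllib.request) and under the Python 2 urllib2 used in this thread:

```python
try:
    from urllib.request import Request      # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import Request             # Python 2, as in this thread
    from urllib import urlencode

# Form 1: a plain URL string; urlopen(url) would issue a GET.
url = "http://www.example.com/search"

# Form 2: a Request carrying urlencoded form data; urlopen(req) would POST.
data = urlencode([("q", "gold"), ("page", "1")]).encode("ascii")
req = Request(url, data)

print(req.get_full_url())
print(req.get_method())  # POST, because data was supplied
print(data)
```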

What is your _u_?

Paul McNett

Tempo said:
ValueError: unknown url type: List

You were expecting u to be a url string like "http://google.com", but it
looks like it is actually a list. I'm not familiar with package xlrd but
cell_value() must be returning a list and not a cell value. Presumably,
the list contains the cell value probably in element 0. Put in a print
statement before your call to urlopen() like:

print u

You'll likely discover your error.

John Machin

Tempo said:
rx = 1
for rx in range(sh.nrows):

These two lines should probably be:
for rx in range(1, sh.nrows):
otherwise a column heading in the first row is likely to be treated as
data.
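Here is a minimal sketch of what range(1, sh.nrows) does, assuming row 0 of the sheet holds a heading. FakeSheet is a hypothetical stand-in that imitates only nrows and cell_value(), so the example runs without xlrd or a spreadsheet. The heading text "List" is a guess, chosen because it matches the error message:

```python
class FakeSheet(object):
    """Hypothetical stand-in for an xlrd sheet (only nrows and cell_value)."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]


sh = FakeSheet([
    [u"List"],                      # row 0: column heading, not a URL
    [u"http://www.example.com/a"],  # rows 1..n: the actual data
    [u"http://www.example.com/b"],
])

# Starting at row 1 skips the heading in row 0.
urls = [sh.cell_value(rx, 0) for rx in range(1, sh.nrows)]
for u in urls:
    print(u)
```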
Now read on ;-)

Paul McNett said:
You were expecting u to be a url string like "http://google.com", but it
looks like it is actually a list. I'm not familiar with package xlrd but
cell_value() must be returning a list and not a cell value. Presumably,
the list contains the cell value probably in element 0. Put in a print
statement before your call to urlopen() like:

print u

You'll likely discover your error.

Sage advice. print repr(u) is in general even better advice.
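This shows why repr() beats a plain print: values that look the same when printed can differ in ways that matter to urlopen(). The cell values in this minimal sketch are made up. They cover a stray trailing space and a number stored as a float versus the same digits stored as text:

```python
values = [
    u"http://www.example.com/",
    u"http://www.example.com/ ",   # trailing space, invisible with print
    2.0,                           # a numeric cell (xlrd gives floats)
    u"2.0",                        # the same digits stored as text
]

for v in values:
    print("print: %s" % (v,))   # what "print u" shows
    print("repr:  %r" % (v,))   # quotes expose whitespace and type
```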

Just for the record:

1. The xlrd package's Book.Sheet.cell_value() does *not* return lists.
As its docs say, it returns scalars, of the following types: unicode,
int, float, str.

2. The error is nothing to do with Python lists, it's all about
malformed URLs. "unknown url type" means it's not one of http, ftp,
file, data, gopher, ...

|>>> x = urllib2.urlopen('List')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "C:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: List

|>>> x = urllib2.urlopen('GOTCHA')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "C:\Python24\lib\urllib2.py", line 350, in open
    protocol = req.get_type()
  File "C:\Python24\lib\urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: GOTCHA
|>>>
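Building on this diagnosis, one option is to check each cell's URL scheme before passing it to urlopen(). A heading such as "List" is then skipped instead of raising ValueError. This is a minimal sketch. looks_fetchable() and KNOWN_SCHEMES are hypothetical names, and the scheme list is an assumption. The urlparse pre-check is a swapped-in technique, not something urllib2 does for you:

```python
try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2, as used in this thread

# Assumed set of schemes we are willing to fetch; adjust as needed.
KNOWN_SCHEMES = ("http", "https", "ftp", "file")


def looks_fetchable(value):
    """Return True if value is a string whose scheme is in KNOWN_SCHEMES."""
    if not isinstance(value, (type(u""), str)):
        return False  # e.g. a float from a numeric cell
    # Index 0 of the parse result is the scheme ('' when there is none);
    # indexing also works on Python 2.4, where there is no .scheme attribute.
    return urlparse(value.strip())[0].lower() in KNOWN_SCHEMES


cells = [u"List", u"http://www.example.com/", 2.0, u"GOTCHA"]
for value in cells:
    if looks_fetchable(value):
        print("would fetch %r" % (value,))
    else:
        print("skipping %r" % (value,))
```

Wrapping urlopen() in try/except ValueError would also work. The pre-check just makes the skipped rows explicit.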

HTH,
John