urlopen() error

Discussion in 'Python' started by Tempo, Sep 8, 2006.

  1. Tempo

    Hello. I am getting an error and it has gotten me stuck. I think the
    best thing I can do is post my code and the error message and thank
    everybody in advance for any help you can give with this issue. Thank you.

    #############
    Here's the code:
    #############

    import urllib2
    import re
    import xlrd
    from BeautifulSoup import BeautifulSoup

    book = xlrd.open_workbook("ige_virtualMoney.xls")
    sh = book.sheet_by_index(0)
    rx = 1
    for rx in range(sh.nrows):
        u = sh.cell_value(rx, 0)
        page = urllib2.urlopen(u)
        soup = BeautifulSoup(page)
        p = soup.findAll('span', "sale")
        p = str(p)
        p2 = re.findall('\$\d+\.\d\d', p)
        for price in p2:
            print price

    ######################
    Here are the error messages:
    ######################

    Traceback (most recent call last):
      File "E:\Python24\scraper.py", line 16, in -toplevel-
        page = urllib2.urlopen(u)
      File "E:\Python24\lib\urllib2.py", line 130, in urlopen
        return _opener.open(url, data)
      File "E:\Python24\lib\urllib2.py", line 350, in open
        protocol = req.get_type()
      File "E:\Python24\lib\urllib2.py", line 233, in get_type
        raise ValueError, "unknown url type: %s" % self.__original
    ValueError: unknown url type: List
    Tempo, Sep 8, 2006
    #1

  2. Rafal Zawadzki

    Tempo wrote:

    > ValueError: unknown url type: List

    ^^^^^^^^^^^^^^^^^^^^^^

    I don't know xlrd, but:
    http://docs.python.org/lib/module-urllib2.html
    urlopen(url[, data])
    Open the URL url, which can be either a string or a Request object.
    data should be a string, which specifies additional data to send to the
    server. In HTTP requests, which are the only ones that support data, it
    should be a buffer in the format of application/x-www-form-urlencoded, for
    example one returned from urllib.urlencode().
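    A minimal Python 3 sketch of what the quoted docs describe. In Python 3, urllib2.Request and urllib.urlencode live in urllib.request and urllib.parse. The URL and form fields below are made up; the Request is only built and inspected, not sent, but it could be passed to urlopen(req) in place of a plain URL string.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields, encoded as application/x-www-form-urlencoded.
form = urlencode({"q": "virtual money", "page": "2"})

# Supplying data makes this a POST; Python 3 requires the data as bytes.
req = Request("http://example.com/search", data=form.encode("ascii"))

print(req.type)          # scheme parsed from the URL string
print(req.get_method())  # POST, because data was supplied
print(req.data)          # the encoded form body
```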

    What is your _u_?
    --
    Rafał Zawadzki [jid/mail: , skype: blvszcz]
    http://glam.pl - used clothes, vintage, secondhand
    http://bluszcz.net - my home page
    Rafal Zawadzki, Sep 8, 2006
    #2

  3. Paul McNett

    Tempo wrote:
    > ValueError: unknown url type: List


    You were expecting u to be a url string like "http://google.com", but it
    looks like it is actually a list. I'm not familiar with package xlrd but
    cell_value() must be returning a list and not a cell value. Presumably,
    the list contains the cell value probably in element 0. Put in a print
    statement before your call to urlopen() like:

    print u

    You'll likely discover your error.

    --
    Paul McNett
    http://paulmcnett.com
    http://dabodev.com
    Paul McNett, Sep 9, 2006
    #3
  4. John Machin

    Paul McNett wrote:
    > Tempo wrote:
    > > book = xlrd.open_workbook("ige_virtualMoney.xls")
    > > sh = book.sheet_by_index(0)
    > > rx = 1
    > > for rx in range(sh.nrows):


    The above 2 lines should probably be:
    for rx in range(1, sh.nrows):
    otherwise the likelihood is that a column heading will be treated as
    data.
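    A minimal sketch of the header-skipping suggestion, without xlrd. The column list stands in for sh.cell_value(rx, 0) over every row; its contents are hypothetical, and the guess that the heading cell reads "List" is an assumption drawn from the error message. data_rows is an illustrative helper, not part of any library.

```python
# Hypothetical first-column values; row 0 is a column heading.
column = ["List", "http://example.com/a", "http://example.com/b"]


def data_rows(values, header_rows=1):
    """Yield only the values below the first header_rows rows."""
    for rx in range(header_rows, len(values)):
        yield values[rx]


urls = list(data_rows(column))
print(urls)  # the heading is no longer handed to urlopen
```

    Starting the range at 1 is the whole fix; header_rows only makes the number of heading rows explicit.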
    Now read on ;-)

    > > u = sh.cell_value(rx, 0)
    > > page = urllib2.urlopen(u)
    > > ValueError: unknown url type: List

    >
    > You were expecting u to be a url string like "http://google.com", but it
    > looks like it is actually a list. I'm not familiar with package xlrd but
    > cell_value() must be returning a list and not a cell value. Presumably,
    > the list contains the cell value probably in element 0. Put in a print
    > statement before your call to urlopen() like:
    >
    > print u


    Sage advice. print repr(u) is in general even better advice.
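    A small Python 3 sketch (where print is a function) of why repr(u) beats plain print when debugging. The sample values are made up; describe is a hypothetical helper returning what each form would show. In Python 2, repr would additionally reveal the u prefix of a unicode string.

```python
def describe(value):
    """Return (what print shows, what repr shows) for one value."""
    return str(value), repr(value)


# Hypothetical cell values; the trailing space and the float are the traps.
samples = ["List", "http://example.com/ ", 1.0, "1.0"]
for value in samples:
    shown, exact = describe(value)
    print(shown)  # print u: trailing space invisible, 1.0 and "1.0" identical
    print(exact)  # print repr(u): quotes, whitespace and type are visible
```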

    >
    > You'll likely discover your error.
    >


    Just for the record:

    1. The xlrd package's Book.Sheet.cell_value() does *not* return lists.
    As its docs say, it returns scalars, of the following types: unicode,
    int, float, str

    2. The error is nothing to do with Python lists, it's all about
    malformed URLs. "unknown url type" means it's not one of http, ftp,
    file, data, gopher, ...

    |>>> x = urllib2.urlopen('List')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "C:\Python24\lib\urllib2.py", line 130, in urlopen
        return _opener.open(url, data)
      File "C:\Python24\lib\urllib2.py", line 350, in open
        protocol = req.get_type()
      File "C:\Python24\lib\urllib2.py", line 233, in get_type
        raise ValueError, "unknown url type: %s" % self.__original
    ValueError: unknown url type: List

    |>>> x = urllib2.urlopen('GOTCHA')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "C:\Python24\lib\urllib2.py", line 130, in urlopen
        return _opener.open(url, data)
      File "C:\Python24\lib\urllib2.py", line 350, in open
        protocol = req.get_type()
      File "C:\Python24\lib\urllib2.py", line 233, in get_type
        raise ValueError, "unknown url type: %s" % self.__original
    ValueError: unknown url type: GOTCHA
    |>>>
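    For readers on Python 3, where urllib2.urlopen now lives in urllib.request, here is a minimal sketch of the same point. It triggers the ValueError offline (a string with no "scheme:" prefix is rejected while the Request is being built, before any connection), then checks each value with urllib.parse.urlsplit before it would be fetched. ALLOWED_SCHEMES and looks_fetchable are hypothetical names, and the scheme list is an assumption about what this scraper needs.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Reproduce the error without network access: "List" has no scheme.
message = None
try:
    urlopen("List")
except ValueError as exc:
    message = str(exc)
print(message)  # begins with "unknown url type"

# Assumption: the only schemes this scraper should ever fetch.
ALLOWED_SCHEMES = ("http", "https", "ftp")


def looks_fetchable(value):
    """Hypothetical guard: True for a string URL with an allowed scheme and a host."""
    if not isinstance(value, str):  # numeric cells come back as float or int
        return False
    parts = urlsplit(value.strip())
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


for candidate in ["List", "GOTCHA", "http://example.com/page", 1.0]:
    print(repr(candidate), looks_fetchable(candidate))
```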

    HTH,
    John
    John Machin, Sep 15, 2006
    #4

Similar Threads
  1. Xu, C.S.
    Replies: 5, Views: 462
    John J. Lee, Sep 17, 2003
  2. John F Dutcher
    urllib2.urlopen(req) error........
    John F Dutcher, Jun 4, 2004, in forum: Python
    Replies: 2, Views: 963
    John F Dutcher, Jun 7, 2004
  3. Matt
    Replies: 0, Views: 2,678
  4. Chris
    Replies: 0, Views: 1,034
    Chris, Jul 10, 2005
  5. Mark Devine
    Replies: 2, Views: 1,065
    amadain, Jun 29, 2009