download all mib files from a web page


powah

I want to download all MIB files from this web page:
http://www.juniper.net/techpubs/sof...t/juniper-specific-mibs-junos-nm.html#jN18E19

All MIB file links have this format:
www.juniper.net/techpubs ... .txt

I wrote this program, but it fails with the following error.
Please help.
Thanks.

Code:
#!/usr/bin/env python
import urllib2,os,urlparse
url="http://www.juniper.net/techpubs/software/junos/junos94/swconfig-net-mgmt/juniper-specific-mibs-junos-nm.html#jN18E19"
page=urllib2.urlopen(url)
f=0
links=[]
data=page.read().split("\n")
for item in data:
    if "www.juniper.net/techpubs" in item:
        httpind=item.index("www.juniper.net/techpubs")
        item=item[httpind:]
        #print "item " + item
        ind=item.index("<")
        links.append(item[:ind]) #grab all links
# download all links
for link in links:
    print "link " + link
    filename=link.split("/")[-1]
    print "downloading ... " + filename
    u=urllib2.urlopen(link)
    p=u.read()
    open(filename,"w").write(p)

$ ~/python/downloadjuniper.py
link www.juniper.net/techpubs/software/junos/junos94/swconfig-net-mgmt/mib-jnx-user-aaa.txt
downloading ... mib-jnx-user-aaa.txt
Traceback (most recent call last):
  File "/home/powah/python/downloadjuniper.py", line 20, in ?
    u=urllib2.urlopen(link)
  File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.4/urllib2.py", line 350, in open
    protocol = req.get_type()
  File "/usr/lib/python2.4/urllib2.py", line 233, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: www.juniper.net/techpubs/software/junos/junos94/swconfig-net-mgmt/mib-jnx-user-aaa.txt



$ python
Python 2.4.4 (#1, Oct 23 2006, 13:58:00)
[GCC 4.1.1 20061011 (Red Hat 4.1.1-30)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
My computer runs FC6 (Fedora Core 6) Linux.
 
Chris Rebert

You need to ensure that all URL strings include the protocol to use,
i.e. "http://"
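
A minimal sketch of that advice, assuming the scraped links should default to plain HTTP. The helper name `ensure_scheme` and its `default_scheme` parameter are made up for illustration and are not part of urllib2:

```python
import re

# A URL scheme is a letter followed by letters, digits, "+", "." or "-",
# and is terminated here by "://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def ensure_scheme(link, default_scheme="http"):
    """Return link unchanged if it names a protocol, else prepend one."""
    link = link.strip()
    if _SCHEME_RE.match(link):
        return link
    if link.startswith("//"):
        # Protocol-relative link such as "//host/path": add only the scheme.
        return default_scheme + ":" + link
    return default_scheme + "://" + link
```

Calling `ensure_scheme(link)` on each scraped link before passing it to `urlopen` would avoid the "unknown url type" error for links like the one in the traceback.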

Cheers,
Chris
 
Jeff McNeil

There's only a couple dozen of them, right-click->Save As. I'm sure
Juniper would appreciate that much more than an automated crawler.

As far as your ValueError is concerned, consider that
'www.juniper.net' doesn't start with a protocol specification when
passed into urllib2.urlopen.

-Jeff
mcjeff.blogspot.com
 
powah

Juniper's web page is simple, but I am learning Python so that I can
download files from more complex web pages and do other things as well.
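
For more complex pages, splitting the HTML on newlines and searching for substrings tends to break. The following is only a sketch, written for Python 3 (where the parser lives in `html.parser` and `urljoin` in `urllib.parse`). It assumes the files are linked through ordinary `<a href="...">` tags, which the thread does not confirm for the Juniper page; `LinkCollector` and `extract_links` are illustrative names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags ending with a suffix."""

    def __init__(self, base_url, suffix=".txt"):
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.endswith(self.suffix):
                # urljoin resolves relative hrefs against the page URL and
                # leaves absolute ones unchanged.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url, suffix=".txt"):
    """Return all matching links found in html_text, in document order."""
    parser = LinkCollector(base_url, suffix)
    parser.feed(html_text)
    return parser.links
```

Because `urljoin` produces absolute URLs, the links returned this way already carry the protocol that `urlopen` requires.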
 
powah

I fixed that error. Now, if a filename on the page is misspelled, how
do I ignore the resulting error and continue?
Code:
#!/usr/bin/env python
import urllib2,os,urlparse
url="http://www.juniper.net/techpubs/software/junos/junos94/swconfig-net-mgmt/juniper-specific-mibs-junos-nm.html#jN18E19"
page=urllib2.urlopen(url)
f=0
links=[]
data=page.read().split("\n")
for item in data:
    if "www.juniper.net/techpubs" in item:
        httpind=item.index("www.juniper.net/techpubs")
        item=item[httpind:]
        #print "item " + item
        ind=item.index(".txt") + 4
        links.append(item[:ind]) #grab all links
# download all links
for link in links:
    filename=link.split("/")[-1]
    link = "http://" + link
    print "link " + link
    print "downloading ... " + filename
    u=urllib2.urlopen(link)
    p=u.read()
    open(filename,"w").write(p)

$ ~/python/downloadjuniper_onepage.py
link http://www.juniper.net/techpubs/software/junos/junos94/swconfig-net-mgmt/mib-jnx-virtual-chassis.txt
downloading ... mib-jnx-virtual-chassis.txt
Traceback (most recent call last):
  File "/home/powah/python/downloadjuniper_onepage.py", line 7, in ?
    u=urllib2.urlopen(link)
  File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.4/urllib2.py", line 364, in open
    response = meth(req, response)
  File "/usr/lib/python2.4/urllib2.py", line 471, in http_response
    response = self.parent.error(
  File "/usr/lib/python2.4/urllib2.py", line 402, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.4/urllib2.py", line 480, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
 
Chris Rebert

Read the fine tutorial: http://docs.python.org/tutorial/errors.html

Cheers,
Chris
 
powah

Scott David Daniels wrote:

You really should go through the tutorial. It will explain this and
other important things well. But, since I'm feeling generous:

Replace these lines:
    u=urllib2.urlopen(link)
    p=u.read()
    open(filename,"w").write(p)

with this:
    try:
        u = urllib2.urlopen(link)
        p = u.read()
    except urllib2.HTTPError:
        # Skip links that cannot be fetched (e.g. 404) and move on.
        pass
    else:
        dest = open(filename, "w")
        dest.write(p)
        dest.close()

--Scott David Daniels
(e-mail address removed)

Thanks!
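
Putting the two fixes from this thread together, a download loop might look roughly like the sketch below. It is written for Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`. The function name `download_all` and its `dest_dir` and `opener` parameters are assumptions for illustration, not code from the thread; `opener` exists only so the loop can be exercised without network access.

```python
import os
import urllib.error
import urllib.request


def download_all(links, dest_dir=".", opener=urllib.request.urlopen):
    """Download each link into dest_dir; return (saved, failed) lists."""
    saved, failed = [], []
    for link in links:
        filename = link.split("/")[-1]
        try:
            response = opener(link)
            try:
                data = response.read()
            finally:
                response.close()
        except (urllib.error.URLError, ValueError) as exc:
            # HTTPError (e.g. 404) is a subclass of URLError; ValueError is
            # what urlopen raises for links that lack a protocol.
            failed.append((link, str(exc)))
            continue
        # Binary mode writes the bytes exactly as they were received.
        with open(os.path.join(dest_dir, filename), "wb") as dest:
            dest.write(data)
        saved.append(filename)
    return saved, failed
```

Recording failures instead of silently passing makes it easy to review afterwards which links on the page were broken.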
 
