Instead of saving text files, I need them saved as HTML

Shani

I have the following code, which takes a list of URLs such as
"http://google.com" (without the quotes, of course) and saves their
source code as text files. I want to alter the code so that, for the
list of URLs, an HTML file is saved.

-----begin-----
import urllib
urlfile = open(r'c:\temp\url.txt', 'r')
for lines in urlfile:
    try:
        outfilename = lines.replace('/', '-')
        urllib.urlretrieve(lines.strip('/n'), 'c:\\temp\\' \
                           + outfilename.strip('\n')[7:] + '.txt')
    except:
        pass
-----end-----
 
Larry Bates

Then just write HTML around your list. I would guess
you want them inside a table. Just write the appropriate
HTML tags before and after the URLs. If you want the URLs
to be clickable, make them into <a href="url">url</a> lines.
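A minimal sketch of that suggestion, written for Python 3 (the thread's own code is Python 2). It takes the lines of the URL file and returns a single HTML page with one table row and one clickable link per URL. The function name `urls_to_html_table` is hypothetical; `html.escape` is used so that characters such as `&` in query strings don't break the markup.

```python
import html


def urls_to_html_table(lines):
    """Build an HTML page with one clickable table row per non-blank URL."""
    rows = []
    for line in lines:
        url = line.strip()
        if not url:  # skip blank lines in the URL file
            continue
        safe = html.escape(url, quote=True)  # escape &, <, > and quotes
        rows.append('<tr><td><a href="%s">%s</a></td></tr>' % (safe, safe))
    return ('<html>\n<body>\n<table>\n'
            + '\n'.join(rows)
            + '\n</table>\n</body>\n</html>\n')
```

To use it, pass the open URL file (or any list of lines) and write the returned string to a file whose name ends in `.html`.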

-Larry Bates
 
3c273

Shani said:
I have the following code, which takes a list of URLs such as
"http://google.com" (without the quotes, of course) and saves their
source code as text files. I want to alter the code so that, for the
list of URLs, an HTML file is saved.

Is this what you mean?

-----begin-----
import urllib
urlfile = open(r'c:\temp\url.txt', 'r')
for lines in urlfile:
    try:
        outfilename = lines.replace('/', '-')
        urllib.urlretrieve(lines.strip('/n'), 'c:\\temp\\' \
                           + outfilename.strip('\n')[7:] + '.html')
    except:
        pass
-----end-----
Louis
 
Tim Chase

Is this what you mean?

[laughs] I suspect the urlretrieve line should contain
"strip('\n')" instead of "strip('/n')", but otherwise, the
original code looked pretty kosher. I'm not sure what the odd
slicing is for, but I'll presume the OP knows what they're doing.
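For comparison, a sketch of the same loop with that fix applied, written for Python 3, where `urlretrieve` lives in `urllib.request` rather than `urllib`. It keeps the original naming scheme (slashes become dashes, then the first seven characters are dropped) but uses a `.html` extension, skips blank lines, and reports failures instead of silently swallowing every exception with a bare `except`. The function names and the choice to print skipped URLs are illustrative assumptions, not from the thread.

```python
import os
from urllib.request import urlretrieve


def url_to_filename(url, ext='.html'):
    """Mimic the thread's naming scheme: '/' becomes '-', then the first
    seven characters ('http:--', formerly 'http://') are dropped."""
    return url.strip().replace('/', '-')[7:] + ext


def save_pages(list_path, out_dir, ext='.html'):
    """Download every URL listed (one per line) in list_path into out_dir."""
    with open(list_path) as urlfile:
        for line in urlfile:
            # strip() removes the trailing '\n'; strip('/n') would only
            # remove '/' and 'n' characters from the ends.
            url = line.strip()
            if not url:  # skip blank lines
                continue
            target = os.path.join(out_dir, url_to_filename(url, ext))
            try:
                urlretrieve(url, target)
            except (OSError, ValueError) as exc:
                # URLError/HTTPError are OSError subclasses; ValueError
                # covers lines that are not valid URLs (e.g. no scheme).
                print('skipped %s: %s' % (url, exc))
```

Usage would be something like `save_pages(r'c:\temp\url.txt', r'c:\temp')`, with the paths adjusted to your own setup.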

While not a Python solution, the standard *nix tool would be
either wget or curl:

bash> wget -i listofurls.txt

which is freely available with the Cygwin suite of GNU tools for
Win32 platforms.

-tkc
 
3c273

Shani said:
I have the following code, which takes a list of URLs such as
"http://google.com" (without the quotes, of course) and saves their
source code as text files. I want to alter the code so that, for the
list of URLs, an HTML file is saved.

Or is this what you mean?
-----begin-----
import urllib
urlfile = open('c:\\temp\\url.txt', 'r')
newurlfile = open('c:\\temp\\newurls.html', 'w')
newurlfile.write('<html> \n<body>\n')
for lines in urlfile:
    try:
        if lines == '\n':
            pass
        else:
            lines = '<a href="' + lines.strip() +'">'\
                    + lines.strip() + '</a>' + '<br>\n'
            newurlfile.write(lines)
    except:
        pass
newurlfile.write('</body> \n</html>')
urlfile.close()
newurlfile.close()
-----end-----
Louis
 
3c273

3c273 said:
Or is this what you mean?
Oops, I guess we don't need "import urllib" anymore.
Louis
 
bruno at modulix

Shani said:
I have the following code, which takes a list of URLs such as
"http://google.com" (without the quotes, of course) and saves their
source code as text files. I want to alter the code so that, for the
list of URLs, an HTML file is saved.

What you write in a text file is up to you - and AFAICT, HTML is still
a text format.
 
Sion Arrowsmith

Tim Chase said:
[ ... ]
urllib.urlretrieve(lines.strip('/n'), 'c:\\temp\\' \
                   + outfilename.strip('\n')[7:] + '.html')
[ ... ] I'm not sure what the odd
slicing is for, but I'll presume the OP knows what they're doing.

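It's taking the "http://" off the front of the URL. By that point the
slashes have already been replaced with dashes, so [7:] drops the
seven characters "http:--".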
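Because `[7:]` counts characters, it only works for URLs that start with exactly `http://`; an `https://` URL would leave a stray dash at the front of the name (`"https://google.com"` becomes `"-google.com"`). Below is a sketch of a scheme-agnostic alternative using Python 3's `urllib.parse.urlsplit`, assuming a Windows target (hence replacing characters that are illegal in Windows file names). The name `safe_filename` is hypothetical.

```python
from urllib.parse import urlsplit

# Characters Windows does not allow in file names.
_ILLEGAL = '<>:"\\|?*'


def safe_filename(url, ext='.html'):
    """Derive a file name from a URL without assuming the scheme's length."""
    parts = urlsplit(url.strip())
    name = parts.netloc + parts.path
    if parts.query:  # keep the query so e.g. ?page=1 and ?page=2 differ
        name += '-' + parts.query
    name = name.replace('/', '-').rstrip('-')
    for ch in _ILLEGAL:
        name = name.replace(ch, '_')
    return name + ext
```

For example, `safe_filename('https://example.com/a/b?x=1')` returns `'example.com-a-b-x=1.html'`.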