Instead of saving text files, I need to save as HTML

Shani · Jun 8, 2006

I have the following code, which takes a list of URLs such as
"http://google.com" (without the quotes, of course) and then saves their
source code as a text file. I want to alter the code so that, for the
list of URLs, an HTML file is saved.

-----begin-----
import urllib
urlfile = open(r'c:\temp\url.txt', 'r')
for lines in urlfile:
    try:
        outfilename = lines.replace('/', '-')
        urllib.urlretrieve(lines.strip('/n'), 'c:\\temp\\' \
            + outfilename.strip('\n')[7:] + '.txt')
    except:
        pass
-----end-----

Larry Bates · Jun 8, 2006

Then just write HTML around your list. I would guess
you want them inside a table. Just write the appropriate
HTML tags before and after the URLs. If you want the URLs
to be clickable, make them into <a href="url">url</a> lines.

-Larry Bates
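
A minimal sketch of the suggestion above, in Python 3: each URL from the list is wrapped in an anchor tag, and the anchors go into a one-column table. The function name `urls_to_html_table` and the sample URL are illustrative assumptions, not something from the thread.

```python
from html import escape


def urls_to_html_table(urls):
    """Return an HTML page with one clickable link per table row."""
    rows = []
    for url in urls:
        url = url.strip()
        if not url:  # skip blank lines in the URL list
            continue
        # Escape characters such as & and " so the markup stays valid.
        safe = escape(url, quote=True)
        rows.append('<tr><td><a href="%s">%s</a></td></tr>' % (safe, safe))
    body = "\n".join(rows)
    return "<html>\n<body>\n<table>\n%s\n</table>\n</body>\n</html>\n" % body


page = urls_to_html_table(["http://google.com\n", "\n"])
print(page)
```

Writing the returned string to a file with an .html extension gives a page that a browser can open directly.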

3c273 · Jun 8, 2006

Is this what you mean?

-----begin-----
import urllib
urlfile = open(r'c:\temp\url.txt', 'r')
for lines in urlfile:
    try:
        outfilename = lines.replace('/', '-')
        urllib.urlretrieve(lines.strip('/n'), 'c:\\temp\\' \
            + outfilename.strip('\n')[7:] + '.html')
    except:
        pass
-----end-----
Louis
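
For readers on Python 3, where `urlretrieve` lives in `urllib.request`, the same idea can be sketched as follows. The names `output_name` and `save_pages`, and the `fetch` parameter (there only so the loop can be tried without network access), are assumptions made for illustration and are not part of the thread's code.

```python
import os
from urllib.request import urlretrieve


def output_name(url, extension=".html"):
    """Build a file name the way the thread does: drop the scheme, swap '/' for '-'."""
    without_scheme = url.split("://", 1)[-1]
    return without_scheme.replace("/", "-") + extension


def save_pages(url_lines, out_dir, fetch=urlretrieve):
    """Download each URL in url_lines into out_dir and return the paths written."""
    saved = []
    for line in url_lines:
        url = line.strip()  # removes the trailing newline and stray spaces
        if not url:
            continue
        target = os.path.join(out_dir, output_name(url))
        try:
            fetch(url, target)
        except (OSError, ValueError) as exc:
            # Network errors are OSError subclasses; malformed URLs raise ValueError.
            print("skipped %s: %s" % (url, exc))
            continue
        saved.append(target)
    return saved
```

A call such as `save_pages(open(r'c:\temp\url.txt'), r'c:\temp')` would mirror the original script. Unlike the bare `except: pass`, this version reports which URLs failed.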

Tim Chase · Jun 8, 2006

Is this what you mean?

-----begin-----
import urllib
urlfile = open(r'c:\temp\url.txt', 'r')
for lines in urlfile:
    try:
        outfilename = lines.replace('/', '-')
        urllib.urlretrieve(lines.strip('/n'), 'c:\\temp\\' \
            + outfilename.strip('\n')[7:] + '.html')
    except:
        pass
-----end-----

[laughs] I suspect the urlretrieve line should contain
"strip('\n')" instead of "strip('/n')", but otherwise, the
original code looked pretty kosher. I'm not sure what the odd
slicing is for, but I'll presume the OP knows what they're doing.
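
The difference is easy to check in an interpreter. `str.strip` treats its argument as a set of characters to remove from both ends, so `strip('/n')` removes slashes and the letter n, not newlines. A small sketch, with sample URLs made up for illustration:

```python
line = "http://google.com\n"

# '/n' is the two characters '/' and 'n'. The trailing newline is neither,
# so nothing is removed and the newline stays in the URL.
kept_newline = line.strip('/n')

# '\n' is the newline character, which is what the loop needs to drop.
clean = line.strip('\n')

# With no newline in the way, '/n' eats real characters off the end.
damaged = "http://example.com/nn".strip('/n')

print(repr(kept_newline), repr(clean), repr(damaged))
```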

While not a Python solution, the standard *nix tool would be
either wget or curl:

bash> wget -i listofurls.txt

which is freely available with the Cygwin suite of GNU tools for
Win32 platforms.

-tkc

3c273 · Jun 8, 2006

Or is this what you mean?
-----begin-----
import urllib
urlfile = open('c:\\temp\\url.txt', 'r')
newurlfile = open('c:\\temp\\newurls.html', 'w')
newurlfile.write('<html> \n<body>\n')
for lines in urlfile:
    try:
        if lines == '\n':
            pass
        else:
            lines = '<a href="' + lines.strip() +'">'\
                + lines.strip() + '</a>' + '<br>\n'
            newurlfile.write(lines)
    except:
        pass
newurlfile.write('</body> \n</html>')
urlfile.close()
newurlfile.close()
-----end-----
Louis

3c273 · Jun 8, 2006

Oops, I guess we don't need "import urllib" anymore.
Louis

bruno at modulix · Jun 8, 2006

Shani said:
I have the following code, which takes a list of URLs such as
"http://google.com" (without the quotes, of course) and then saves their
source code as a text file. I want to alter the code so that, for the
list of URLs, an HTML file is saved.

What you write in a text file is up to you - and AFAICT, HTML is still
a text format.

Sion Arrowsmith · Jun 9, 2006

Tim Chase said:
[ ... ]
urllib.urlretrieve(lines.strip('/n'), 'c:\\temp\\' \
    + outfilename.strip('\n')[7:] + '.html')

[ ... ] I'm not sure what the odd
slicing is for, but I'll presume the OP knows what they're doing.

It's taking the "http://" off the front of the URL.
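
A quick sketch of why the slice works and where it stops working: "http://" is seven characters long, so `[7:]` removes exactly that prefix, but an "https://" URL is left with a stray slash. The helper `strip_scheme` is a hypothetical name for a prefix-based alternative, not something from the thread.

```python
def strip_scheme(url):
    """Remove a leading http:// or https:// if present."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


sliced_http = "http://google.com"[7:]    # the thread's approach
sliced_https = "https://google.com"[7:]  # eight-character prefix, one slash left over
print(sliced_http, sliced_https, strip_scheme("https://google.com"))
```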