Fetch info from website and write to txt file.

Pitmairen

I want to make a program that gets info from a website and writes it
to a txt file.

I made this:

import urllib
f = urllib.urlopen("http://www.imdb.com/title/tt0407304/")
s = f.read()
k = open("test.txt","w")
k.write(s)
k.close()
f.close()

That saves all the HTML code into the test.txt file. But what if I only
want, for example, the genre, plot outline and cast overview to be
written to the txt file? How can I do that?


And another problem I have:

If the txt file I want to save the information in already has some
text in it, how can I insert the info from the website between the
text that was there before?

For example:

blablablablablablablabla
blablablablablablablabla
blablablablablablablabla
(insert info from website here)
blablablablablablablabla
blablablablablablablabla
blablablablablablablabla


Pitmairen
 
gene tani

To get a text file that looks like your web page, stripped of markup,
look at "lynx -dump" or "w3m -dump" (I think links2 does the same).
Otherwise, see:

http://groups.google.com/group/comp...arch+this+group&&_doneTitle=Back+to+Search&&d
http://groups.google.com/group/comp...=2&as_maxy=2005&&_doneTitle=Back+to+Search&&d
 
Dennis Lee Bieber

Pitmairen said:
That saves all the HTML code into the test.txt file. But what if I only
want, for example, the genre, plot outline and cast overview to be
written to the txt file? How can I do that?

Well, how would you do it by hand? Write down the steps you go
through to extract that information from your HTML file by hand. Clean
that up into a generalized algorithm. Write code that performs that
algorithm.

IOW: You're going to have to write code to parse the HTML (there may be
libraries available to help, but you still need to write the recognizer
for the parts you want).
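
As a rough illustration of the "recognizer" idea above, here is a minimal sketch using Python 3's standard-library html.parser (the code earlier in the thread is Python 2). The sample HTML, the class names "genre" and "plot", and the helper names FieldExtractor and extract_fields are all made up for illustration. The real IMDb page is structured differently, so the recognizer would need adapting to whatever markup actually surrounds the parts you want.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real page is laid out differently, so the
# class names below are placeholders for illustration only.
SAMPLE_HTML = """
<html><body>
  <div class="genre">Drama / Thriller</div>
  <div class="plot">A short <b>plot</b> outline.</div>
  <div class="other">Ignored text</div>
</body></html>
"""


class FieldExtractor(HTMLParser):
    """Collect the text of <div> elements whose class is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.fields = {}      # class name -> text collected so far
        self._current = None  # class of the wanted <div> we are inside
        self._depth = 0       # <div> nesting depth inside that element

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            # A nested <div> inside a wanted element: track the depth
            # so we know which </div> closes the outer one.
            self._depth += 1
            return
        cls = dict(attrs).get("class")
        if cls in self.wanted:
            self._current = cls
            self._depth = 1
            self.fields[cls] = ""

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        # Text only counts while we are inside a wanted element.
        if self._current is not None:
            self.fields[self._current] += data


def extract_fields(html, wanted):
    """Return {class name: text} with runs of whitespace collapsed."""
    parser = FieldExtractor(wanted)
    parser.feed(html)
    parser.close()
    return {name: " ".join(text.split())
            for name, text in parser.fields.items()}


if __name__ == "__main__":
    for name, text in extract_fields(SAMPLE_HTML, ["genre", "plot"]).items():
        print(name + ": " + text)
```

The dictionary returned by extract_fields can then be written to the txt file instead of the raw HTML.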

Pitmairen said:
And another problem I have:

If the txt file I want to save the information in already has some
text in it, how can I insert the info from the website between the
text that was there before?

{I'm making enemies today}

Same answer... How would you do this by hand? Translate that
procedure to code.

Though I suspect, in this case, "by hand" would be to open the
entire file into memory (using Notepad or some editor). Open the other
text in another memory-based editor. Select, copy, paste... But that
puts all the work of the insertion on the editor program (i.e., someone
else had to code the very thing you are asking about to make the editor
work).

Question: how do you identify /where/ to do the insert... By number
of lines, by some keyword, etc.?

http://cis.stvincent.edu/swd/extsort/extsort.html

Modify as needed (it assumes each "line" is a record to be
sorted/merged, while you want to merge on some arbitrary boundary)
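
For a file small enough to hold in memory, the editor-style "open, find the spot, paste" procedure described above might be sketched as follows in Python 3. The function names and the choice of locating the insertion point either by a keyword or by a line count are assumptions for illustration, not something specified in the thread.

```python
def insert_after_marker(lines, new_lines, marker):
    """Return a copy of `lines` with `new_lines` placed after the
    first line that contains `marker`."""
    for i, line in enumerate(lines):
        if marker in line:
            return lines[:i + 1] + new_lines + lines[i + 1:]
    raise ValueError("marker not found: %r" % marker)


def insert_at_line(lines, new_lines, line_number):
    """Return a copy of `lines` with `new_lines` placed after the
    first `line_number` lines."""
    return lines[:line_number] + new_lines + lines[line_number:]


def insert_into_file(path, text, marker):
    """Read the whole file, insert `text` after the marker line,
    and write everything back (hypothetical helper)."""
    with open(path) as f:
        lines = f.readlines()
    new_lines = [line + "\n" for line in text.splitlines()]
    with open(path, "w") as f:
        f.writelines(insert_after_marker(lines, new_lines, marker))
```

Both helpers work on a list of lines as returned by readlines(), so the same code covers "insert after line N" and "insert after the line containing some keyword".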
 
Bruno Desthuilliers

Pitmairen wrote:
That saves all the HTML code into the test.txt file. But what if I only
want, for example, the genre, plot outline and cast overview to be
written to the txt file? How can I do that?

Seems like you want BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/

And another problem I have:

If the txt file I want to save the information in already has some
text in it, how can I insert the info from the website between the
text that was there before?

For example:

blablablablablablablabla
blablablablablablablabla
blablablablablablablabla
(insert info from website here)
blablablablablablablabla
blablablablablablablabla
blablablablablablablabla

You need to be able to identify the place where you want to insert your
data. Then it's a matter of reading the original file, creating a temp
file, writing the lines before the insertion point, writing the data to
insert, writing the remaining lines, closing all files, and replacing
the original file with the temp file.
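
The steps above could be sketched roughly like this in Python 3. The function name insert_via_temp_file and the is_insertion_point callback are hypothetical, chosen only for illustration; how you recognize the insertion point depends on your file.

```python
import os
import tempfile


def insert_via_temp_file(path, data, is_insertion_point):
    """Insert `data` after the first line for which
    is_insertion_point(line) is true. Returns True if inserted."""
    directory = os.path.dirname(os.path.abspath(path))
    inserted = False
    # Create the temp file next to the original so the final
    # os.replace() stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as original:
            for line in original:
                tmp.write(line)
                if not inserted and is_insertion_point(line):
                    # Make sure the inserted data starts and ends
                    # on its own line.
                    if not line.endswith("\n"):
                        tmp.write("\n")
                    tmp.write(data)
                    if not data.endswith("\n"):
                        tmp.write("\n")
                    inserted = True
        # Both files are closed here; swap the temp file into place.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return inserted
```

For example, insert_via_temp_file("test.txt", info, lambda line: line.startswith("some keyword")) would place info after the first line starting with that keyword. Note that the temp file is created with restrictive permissions, so the replaced file may not keep the original file's mode.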
 
