Fetch info from website and write to txt file.

Discussion in 'Python' started by Pitmairen, Mar 6, 2006.

  1. Pitmairen

    Pitmairen Guest

    I want to make a program that gets info from a website and writes it
    to a txt file.

    I made this:

    import urllib
    f = urllib.urlopen("http://www.imdb.com/title/tt0407304/")
    s = f.read()
    k = open("test.txt","w")
    k.write(s)
    k.close()
    f.close()

    That saves all the HTML code into the test.txt file. But what if I
    only want, for example, the genre, plot outline and cast overview to be
    written to the txt file? How can I do that?


    And another problem I have:

    If the txt file I want the information saved in already has some
    text in it, how can I save the info from the website between the
    text that was there before?

    for example:

    blablablablablablablabla
    blablablablablablablabla
    blablablablablablablabla
    (insert info from website here)
    blablablablablablablabla
    blablablablablablablabla
    blablablablablablablabla


    Pitmairen
     
    Pitmairen, Mar 6, 2006
    #1

  2. gene tani

    gene tani Guest

    Pitmairen wrote:
    > I want to make a program that gets info from a website and writes it
    > to a txt file.
    >
    > I made this:
    >
    > import urllib
    > f = urllib.urlopen("http://www.imdb.com/title/tt0407304/")
    > s = f.read()
    > k = open("test.txt","w")
    > k.write(s)
    > k.close()
    > f.close()
    >
    > That saves all the HTML code into the test.txt file. But what if I
    > only want, for example, the genre, plot outline and cast overview to be
    > written to the txt file? How can I do that?
    >
    >
    > And another problem I have:
    >
    > If the txt file I want the information saved in already has some
    > text in it, how can I save the info from the website between the
    > text that was there before?
    >
    > for example:
    >
    > blablablablablablablabla
    > blablablablablablablabla
    > blablablablablablablabla
    > (insert info from website here)
    > blablablablablablablabla
    > blablablablablablablabla
    > blablablablablablablabla
    >


    To get a text file that looks like your web page, stripped of markup,
    look at "lynx -dump" or "w3m -dump" (I think links2 does the same).
    Otherwise:

    http://groups.google.com/group/comp...arch this group&&_doneTitle=Back to Search&&d
    http://groups.google.com/group/comp...=2&as_maxy=2005&&_doneTitle=Back to Search&&d
     
    gene tani, Mar 6, 2006
    #2

  3. gene tani

    gene tani Guest

    Pitmairen wrote:
    > I want to make a program that gets info from a website and writes it
    > to a txt file.
    >
    > I made this:
    >
    > import urllib
    > f = urllib.urlopen("http://www.imdb.com/title/tt0407304/")


    Path of even less resistance:
    http://imdbpy.sourceforge.net/
     
    gene tani, Mar 6, 2006
    #3
  4. On 6 Mar 2006 10:08:44 -0800, "Pitmairen"
    declaimed the following in comp.lang.python:

    > That saves all the HTML code into the test.txt file. But what if I
    > only want, for example, the genre, plot outline and cast overview to be
    > written to the txt file? How can I do that?
    >

    Well, how would you do it by hand? Write down the steps you go
    through to extract that information from your HTML file by hand... Clean
    that up into a generalized algorithm... Write code that performs that
    algorithm...

    IOW: You're going to have to write code to parse the HTML (there may be
    libraries available to help, but you still need to write the recognizer
    for the parts you want).
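
    A minimal sketch of such a recognizer, using only the standard
    library's html.parser (Python 3 is assumed here, not the Python 2 of
    the original post). The sample markup, the FieldRecognizer class and
    the extract_fields / save_fields helpers are all hypothetical; the real
    page layout is not shown in this thread, so the tag names would need
    adjusting after looking at the actual HTML.

```python
from html.parser import HTMLParser

# Hypothetical markup: an <h5> label followed by the value we want.
SAMPLE_HTML = """
<html><body>
<h5>Genre:</h5> <p>Drama / Thriller</p>
<h5>Plot Outline:</h5> <p>A made-up plot summary.</p>
<h5>Runtime:</h5> <p>100 min</p>
</body></html>
"""


class FieldRecognizer(HTMLParser):
    """Collect the text that follows each wanted <h5> label."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.in_heading = False  # True while inside an <h5> element
        self.current = None      # label whose value is being collected
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "h5":
            self.in_heading = True
            self.current = None  # a new heading ends the previous field

    def handle_endtag(self, tag):
        if tag == "h5":
            self.in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_heading:
            label = text.rstrip(":")
            if label in self.wanted:
                self.current = label
                self.fields[label] = ""
        elif self.current is not None:
            # Text after a wanted heading belongs to that field.
            sep = " " if self.fields[self.current] else ""
            self.fields[self.current] += sep + text


def extract_fields(html, wanted):
    """Return {label: text} for the wanted labels found in html."""
    parser = FieldRecognizer(wanted)
    parser.feed(html)
    parser.close()
    return parser.fields


def save_fields(fields, path):
    """Write one 'label: value' line per field to a text file."""
    with open(path, "w") as out:
        for label, value in fields.items():
            out.write(label + ": " + value + "\n")
```

    Calling extract_fields(SAMPLE_HTML, ["Genre", "Plot Outline"]) and
    passing the result to save_fields(fields, "test.txt") would write just
    those two entries instead of the whole page.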

    >
    > And another problem I have:
    >
    > If the txt file I want the information saved in already has some
    > text in it, how can I save the info from the website between the
    > text that was there before?
    >


    {I'm making enemies today}

    Same answer... How would you do this by hand? Translate that
    procedure to code.

    Though I suspect, in this case, "by hand" would be to open the
    entire file into memory (using Notepad or some editor). Open the other
    text in another memory-based editor. Select, copy, paste... But that
    puts all the work of the insertion on the editor program (i.e., someone
    else had to code the same thing you are asking about to make the editor
    work).

    Question: how do you identify /where/ to do the insert... By number
    of lines, by some keyword, etc.?

    http://cis.stvincent.edu/swd/extsort/extsort.html

    Modify as needed (it assumes each "line" is a record to be
    sorted/merged, while you want to merge on some arbitrary boundary)
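
    For a file small enough to hold in memory, the "by hand" procedure
    (load everything, find the spot, paste, save) might be sketched like
    this in Python 3. The function names find_insert_index and insert_text
    are made up for illustration, and the two ways of locating the insert
    point (a line count or a keyword) are just the two options raised in
    the question above.

```python
def find_insert_index(lines, line_number=None, keyword=None):
    """Return the list index at which new lines should be inserted.

    line_number: how many existing lines to keep before the insert.
    keyword: insert right after the first line containing this text.
    With neither given, the insert goes at the end.
    """
    if line_number is not None:
        return min(line_number, len(lines))
    if keyword is not None:
        for i, line in enumerate(lines):
            if keyword in line:
                return i + 1
        raise ValueError("keyword not found: %r" % keyword)
    return len(lines)


def insert_text(path, new_text, line_number=None, keyword=None):
    """Read the whole file, splice new_text in, and write it back."""
    with open(path) as f:
        lines = f.readlines()
    # Make sure the existing text ends with a newline before splicing.
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    if not new_text.endswith("\n"):
        new_text += "\n"
    index = find_insert_index(lines, line_number, keyword)
    lines[index:index] = [new_text]
    with open(path, "w") as f:
        f.writelines(lines)
```

    For example, insert_text("test.txt", s, line_number=3) would place the
    fetched text after the first three lines of the existing file.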
     
    Dennis Lee Bieber, Mar 6, 2006
    #4
  5. Pitmairen wrote:
    > I want to make a program that gets info from a website and writes it
    > to a txt file.
    >
    > I made this:
    >
    > import urllib
    > f = urllib.urlopen("http://www.imdb.com/title/tt0407304/")
    > s = f.read()
    > k = open("test.txt","w")
    > k.write(s)
    > k.close()
    > f.close()
    >
    > That saves all the HTML code into the test.txt file. But what if I
    > only want, for example, the genre, plot outline and cast overview to be
    > written to the txt file? How can I do that?
    >


    Seems like you want BeautifulSoup:
    http://www.crummy.com/software/BeautifulSoup/
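
    A rough sketch of what that could look like, assuming Python 3 and the
    current beautifulsoup4 package (imported as bs4), which is newer than
    the release available when this was posted. The sample markup and the
    extract_fields helper are hypothetical; the real page would need to be
    inspected to find the right tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each field is an <h5> label with a <span> value.
SAMPLE_HTML = """
<div class="info"><h5>Genre:</h5><span>Drama</span></div>
<div class="info"><h5>Plot Outline:</h5><span>A made-up plot.</span></div>
<div class="info"><h5>Runtime:</h5><span>100 min</span></div>
"""


def extract_fields(html, wanted):
    """Return {label: text} for each wanted <h5> label in html."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for heading in soup.find_all("h5"):
        label = heading.get_text(strip=True).rstrip(":")
        value = heading.find_next_sibling("span")
        if label in wanted and value is not None:
            fields[label] = value.get_text(strip=True)
    return fields
```

    The returned dictionary can then be written to the txt file one entry
    per line, instead of dumping the whole page.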


    > And another problem I have:
    >
    > If the txt file I want the information saved in already has some
    > text in it, how can I save the info from the website between the
    > text that was there before?
    >
    > for example:
    >
    > blablablablablablablabla
    > blablablablablablablabla
    > blablablablablablablabla
    > (insert info from website here)
    > blablablablablablablabla
    > blablablablablablablabla
    > blablablablablablablabla
    >


    You need to be able to identify the place where you want to insert your
    data. Then it's a matter of reading the original file, creating a temp
    file, writing the lines before the insertion point, writing the data to
    insert, writing the remaining lines, closing all files, and replacing
    the original file with the temp file.
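
    Those steps could be sketched roughly as follows in Python 3. The
    function name insert_after_marker is made up, and treating the
    insertion point as "right after the first line containing some marker
    text" is an assumption; any other rule for finding the spot would slot
    into the same loop.

```python
import os
import tempfile


def insert_after_marker(path, marker, new_text):
    """Insert new_text after the first line containing marker.

    The result is written to a temp file which then replaces the original.
    """
    directory = os.path.dirname(os.path.abspath(path))
    if not new_text.endswith("\n"):
        new_text += "\n"
    inserted = False
    # Keep the temp file in the same directory so the final replace
    # does not have to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as original:
            for line in original:
                tmp.write(line)  # lines before and after the insert
                if not inserted and marker in line:
                    if not line.endswith("\n"):
                        tmp.write("\n")
                    tmp.write(new_text)  # the data to insert
                    inserted = True
        if not inserted:
            raise ValueError("marker not found: %r" % marker)
        os.replace(tmp_path, path)  # swap the temp file in
    except BaseException:
        # Leave the original untouched and clean up on any failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

    Writing to a temp file first means the original is only replaced once
    the new version is complete, so a failure halfway through does not
    leave a half-written file behind.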
     
    Bruno Desthuilliers, Mar 6, 2006
    #5
