Discussion in 'Python' started by Pitmairen, Mar 6, 2006.

  1. Pitmairen

    Pitmairen Guest

    I want to make a program that gets info from a website and writes it
    to a txt file.

    I made this:

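    import urllib.request  # Python 3 location of urlopen
    f = urllib.request.urlopen("http://www.imdb.com/title/tt0407304/")
    s = f.read()  # raw HTML as bytes
    k = open("test.txt", "wb")  # binary mode, because s is bytes
    k.write(s)  # the step that actually saves the page
    k.close()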

    That saves all the HTML code into the test.txt file. But what if I
    only want, for example, the genre, plot outline and cast overview to
    be written to the txt file? How can I do that?

    And another problem I have:

    If the txt file I want to save the information in already has some
    text in it, how can I insert the info from the website between the
    text that was there before?

    for example:

    (insert info from website here)

    Pitmairen, Mar 6, 2006
  2. Pitmairen

    gene tani Guest

    To get a text file that looks like your web page, stripped of markup,
    look at "lynx -dump" or "w3m -dump" (I think links2 does the same).

    gene tani, Mar 6, 2006
  3. Pitmairen

    gene tani Guest

    gene tani, Mar 6, 2006
  4. Well, how would you do it by hand? Write down the steps you go
    through to extract that information from your HTML file by hand... Clean
    that up into a generalized algorithm... Write code that performs that
    algorithm.

    IOW: You're going to have to write code to parse the HTML (there may be
    libraries available to help, but you still need to write the recognizer
    for the parts you want).
    {I'm making enemies today}
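
    As a rough illustration of such a recognizer, here is a minimal sketch
    using the standard library's html.parser. It assumes, purely for the
    example, that each wanted field appears as a bold label such as
    "<b>Genre:</b>" followed by its value up to the next <br> or <b> tag.
    The real page markup may differ, and the class name, function name and
    sample HTML are all made up for this sketch.

```python
from html.parser import HTMLParser


class LabelledFieldParser(HTMLParser):
    """Collect the text that follows each wanted label, e.g. 'Genre:'."""

    def __init__(self, labels):
        super().__init__()
        self.labels = set(labels)
        self.current = None   # label whose value is being collected
        self.fields = {}      # label -> list of text fragments

    def handle_starttag(self, tag, attrs):
        # Assumed rule: a new bold label or a line break ends the value.
        if tag in ("b", "br"):
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if text in self.labels:
            # Found one of the labels we care about: start collecting.
            self.current = text
            self.fields[text] = []
        elif self.current is not None:
            self.fields[self.current].append(text)


def extract_fields(html, labels):
    """Return {label: value text} for the labels found in html."""
    parser = LabelledFieldParser(labels)
    parser.feed(html)
    parser.close()
    return {label: " ".join(parts) for label, parts in parser.fields.items()}


# Made-up sample markup, not copied from any real site.
sample = (
    "<html><body>"
    "<b>Genre:</b> <a href='/g/drama'>Drama</a> / <a href='/g/war'>War</a><br>"
    "<b>Plot Outline:</b> A made-up plot summary.<br>"
    "<b>Runtime:</b> 120 min<br>"
    "</body></html>"
)
fields = extract_fields(sample, ["Genre:", "Plot Outline:"])
```

    The resulting dictionary can then be written to the txt file one
    "label value" line at a time. The part that has to be adapted to the
    real page is the rule in handle_starttag that decides where a value
    ends.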

    Same answer... How would you do this by hand? Translate that
    procedure to code.

    Though I suspect, in this case, "by hand" would be to open the
    entire file into memory (using notepad or some editor). Open the other
    text into another memory-based editor. Select, copy, paste... But that
    puts all the work of the insertion on the editor program (i.e., someone
    else had to code the same thing you are asking to make the editor work).

    Question: how do you identify /where/ to do the insert... By number
    of lines, by some keyword, etc.?
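
    Either choice can be reduced to "find an index in the list of lines".
    A small sketch, with a hypothetical helper name and made-up sample
    lines, assuming the whole file fits in memory:

```python
def find_insert_index(lines, line_number=None, keyword=None):
    """Return the list index at which new lines should be inserted."""
    if line_number is not None:
        # Insert after the given 1-based line number, clamped to the file.
        return max(0, min(line_number, len(lines)))
    if keyword is not None:
        for index, line in enumerate(lines):
            if keyword in line:
                return index + 1  # insert right after the matching line
    return len(lines)  # nothing matched: fall back to appending


# Made-up file content for illustration.
existing = ["Title: some film\n", "for example:\n", "End of notes\n"]
new_lines = ["Genre: Drama\n"]

position = find_insert_index(existing, keyword="for example:")
merged = existing[:position] + new_lines + existing[position:]
```

    Appending when nothing matches is only one possible policy; raising an
    error instead may be safer if a missing marker means the file is not
    the one you expected.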


    Modify as needed (it assumes each "line" is a record to be
    sorted/merged, while you want to merge on some arbitrary boundary)
    Dennis Lee Bieber, Mar 6, 2006
  5. Pitmairen wrote:
    Seems like you want BeautifulSoup:

    You need to be able to identify the place where you want to insert your
    data. Then it's a matter of reading the original file, creating a temp
    file, writing the lines before the insertion point, writing the data to
    insert, writing the remaining lines, closing all files, and replacing
    the original file with the temp file.
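
    Those steps might look roughly like the sketch below. It assumes the
    insertion point is the first line containing a marker string; the
    function name, the marker and the sample text are invented for the
    example.

```python
import os
import tempfile


def insert_into_file(path, new_text, marker):
    """Insert new_text after the first line of path that contains marker.

    Returns True if the marker was found. If it was not, the file is
    rewritten with its content unchanged.
    """
    directory = os.path.dirname(os.path.abspath(path))
    inserted = False
    # Create the temp file next to the original so the final replace
    # stays on the same filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as temp, open(path) as original:
            for line in original:
                temp.write(line)  # lines before and after the insertion
                if not inserted and marker in line:
                    if not line.endswith("\n"):
                        temp.write("\n")
                    temp.write(new_text)  # the data to insert
                    inserted = True
        # Both files are closed here; swap the temp file into place.
        os.replace(temp_path, path)
    except BaseException:
        os.remove(temp_path)
        raise
    return inserted


# Demonstration on a throwaway file with made-up content.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "test.txt")
    with open(target, "w") as handle:
        handle.write("before\nfor example:\nafter\n")
    found = insert_into_file(target, "Genre: Drama\n", "for example:")
    with open(target) as handle:
        result = handle.read()
```

    One side effect to be aware of: the replaced file takes on the temp
    file's permissions, which mkstemp keeps restrictive, so you may want to
    copy the original file's mode across before the replace.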
    Bruno Desthuilliers, Mar 6, 2006
