Create a string array of all comments in an HTML file...

Discussion in 'Python' started by sophie_newbie, Sep 30, 2007.

  1. Hi, I'm wondering how I'd go about extracting a string array of all
    comments in an HTML file, HTML comments obviously taking the format
    "<!-- Comment text here -->".

    I'm fairly stumped on how to do this. Maybe using regular expressions?

    Thanks.
     
    sophie_newbie, Sep 30, 2007
    #1

  2. Robin Becker Guest

    sophie_newbie wrote:
    > Hi, I'm wondering how i'd go about extracting a string array of all
    > comments in a HTML file, HTML comments obviously taking the format
    > "<!-- Comment text here -->".
    >
    > I'm fairly stumped on how to do this? Maybe using regular expressions?
    >
    > Thanks.
    >

    You should probably eat beautiful soup at

    http://www.crummy.com/software/BeautifulSoup/documentation.html

    which helps with this sort of task.
    --
    Robin Becker
     
    Robin Becker, Sep 30, 2007
    #2
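
    Beautiful Soup is a third-party package and is not shown here. As a
    swapped-in, parser-based sketch of the same idea using only the standard
    library's html.parser (Python 3 assumed), something like the following
    should collect the comments. The names CommentCollector and
    extract_comments are made up for illustration.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper: gathers the text of every HTML comment it sees."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # `data` is the text between "<!--" and "-->", delimiters excluded.
        self.comments.append(data)


def extract_comments(html):
    """Return a list of comment bodies found in the given HTML string."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments


print(extract_comments("<p>Hello</p><!-- Comment text here -->"))
```

    Note that the strings come back without the "<!--" and "-->" markers, so
    add them back if the exact source text is needed.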

  3. On Sep 30, 10:39 am, sophie_newbie wrote:
    > Hi, I'm wondering how i'd go about extracting a string array of all
    > comments in a HTML file, HTML comments obviously taking the format
    > "<!-- Comment text here -->".
    >
    > I'm fairly stumped on how to do this? Maybe using regular expressions?
    >
    > Thanks.


    E:\Ruby>irb --prompt xmp
    "<!-- Comment
    here -->And <i>so</i> funny!
    <p>It was a dark and stormy night.
    </p><!-- Comment <> -->".scan(/<!--.*?-->/m)
    ==>["<!-- Comment\nhere -->", "<!-- Comment <> -->"]
     
    William James, Sep 30, 2007
    #3
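
    The Ruby scan above carries over to Python fairly directly. A minimal
    sketch, assuming the whole file fits in one string: re.DOTALL plays the
    role of Ruby's /m flag so that "." also matches newlines. The names
    COMMENT_RE, find_comments and sample are illustrative only.

```python
import re

# Non-greedy match from "<!--" up to the first "-->" that follows it.
# re.DOTALL lets "." match newlines, so multi-line comments are caught.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def find_comments(html):
    """Return every HTML comment in `html`, delimiters included."""
    return COMMENT_RE.findall(html)


sample = """<!-- Comment
here -->And <i>so</i> funny!
<p>It was a dark and stormy night.
</p><!-- Comment <> -->"""

print(find_comments(sample))
# ['<!-- Comment\nhere -->', '<!-- Comment <> -->']
```

    A regex like this knows nothing about HTML structure, so it will also
    pick up comment-like text inside a <script> block or a quoted attribute.
    For untidy real-world pages a proper parser is probably the safer choice.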
  4. Paul McGuire Guest

    On Sep 30, 10:39 am, sophie_newbie wrote:
    > Hi, I'm wondering how i'd go about extracting a string array of all
    > comments in a HTML file, HTML comments obviously taking the format
    > "<!-- Comment text here -->".
    >
    > I'm fairly stumped on how to do this? Maybe using regular expressions?
    >
    > Thanks.


    >>> from pyparsing import htmlComment
    >>> htmlComment.searchString("""<!-- Comment
    ... here -->And <i>so</i> funny!
    ... </p><!-- Comment <> -->""").asList()
    [['<!-- Comment\nhere -->'], ['<!-- Comment <> -->']]

    -- Paul
     
    Paul McGuire, Sep 30, 2007
    #4
  5. sophie_newbie wrote:
    > Hi, I'm wondering how i'd go about extracting a string array of all
    > comments in a HTML file, HTML comments obviously taking the format
    > "<!-- Comment text here -->".
    >
    > I'm fairly stumped on how to do this? Maybe using regular expressions?



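    from lxml import etree

    # The HTML parser tolerates the broken markup found in real pages.
    parser = etree.HTMLParser()
    tree = etree.parse("somefile.html", parser)

    # Each XPath result is a comment node; .text is its body as a string,
    # without the "<!--" and "-->" delimiters.
    print([comment.text for comment in tree.xpath("//comment()")])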


    http://codespeak.net/lxml

    Stefan
     
    Stefan Behnel, Oct 6, 2007
    #5