regexp question + html::parser question on the side

Discussion in 'Perl Misc' started by boris bass, Sep 26, 2003.

  1. boris bass

    boris bass Guest

    i am trying to scan the html file for all the anchor tags and the
    following regex


    while ( $content =~ /<\s*a\s+.*?>/s )


    doesn't seem to work , what is wrong here?

    angle brackets don't need to be escaped, do they?

    obviously, the string i am trying to match is

    <a href="something">

    but i get no match


    ps. this could probably be done via the HTML::Parser module and not
    through regular expressions.

    the task i am trying to accomplish: find

    <a href="something">

    and change it to

    <a href="">

    i.e. delete a link to whatever and leave an empty string in place of
    it.

    if somebody could post a code snippet showing how to do it, that would
    also be appreciated. it doesn't have to be tested, just enough to point me
    in the right direction. i looked at the HTML::Parser doc page, but i
    haven't figured it out on my own


    thanks,


    boris
    boris bass, Sep 26, 2003
    #1

  2. boris bass

    Anno Siegel Guest

    boris bass <> wrote in comp.lang.perl.misc:
    > i am trying to scan the html file for all the anchor tags and the
    > following regex
    >
    >
    > while ( $content =~ /<\s*a\s+.*?>/s )
    >
    >
    > doesn't seem to work , what is wrong here?
    >
    > angle brackets don't need to be escaped, do they?
    >
    > obviously, the string i am trying to match is
    >
    > <a href="something">
    >
    > but i get no match


    I do. *shrug*

    > ps. this could probably be done via HTML::parser module and not
    > through the regular expressions.


    Yes.

    Anno
    Anno Siegel, Sep 26, 2003
    #2

  3. boris bass

    Guest

    (boris bass) writes:

    > ps. this could probably be done via HTML::parser module and not
    > through the regular expressions.
    >
    > the task i am trying to accomplish: find
    >
    > <a href="something">
    >
    > and change it to
    >
    > <a href="">
    >
    > i.e. delete a link to whatever and leave an empty string in place of
    > it.


    http://search.cpan.org/src/GAAS/HTML-Parser-3.31/eg/hrefsub is an
    example of some code that can do this (and more).

    --
    Gisle Aas
    Gisle Aas, Sep 26, 2003
    #3
  4. boris bass <> wrote:

    > if somebody could post a code snippet how to do it would also be
    > appreciated. doesn't have to be tested, just to point me at the
    > right direction. i looked at html::parser doc page, but i haven't
    > figured it out on my own


    If HTML::Parser seems weird, try HTML::TokeParser. It may seem more
    intuitive.
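
    A minimal sketch of the HTML::TokeParser approach to the original task
    (emptying every href on an anchor tag), not a tested production script.
    The helper name blank_hrefs and the sample input are made up for
    illustration. It assumes the HTML-Parser distribution is installed, and it
    writes attribute values back without re-encoding entities, which is a
    simplification.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # part of the HTML-Parser distribution, not core Perl

# Hypothetical helper: return a copy of $html with every <a href="..."> emptied.
sub blank_hrefs {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new( \$html ) or die "cannot parse input";
    my $out = '';

    while ( my $token = $parser->get_token ) {
        my $type = $token->[0];

        if ( $type eq 'S' && $token->[1] eq 'a' && exists $token->[2]{href} ) {
            # Start tag token: ["S", $tag, \%attr, \@attrseq, $original_text]
            my ( $attr, $attrseq ) = @{$token}[ 2, 3 ];
            $attr->{href} = '';

            # Rebuild the tag, keeping the attributes in their original order.
            # Simplification: values are not re-encoded as HTML entities.
            $out .= '<a' . join( '', map { qq( $_="$attr->{$_}") } @$attrseq ) . '>';
        }
        elsif ( $type eq 'T' ) {
            $out .= $token->[1];     # text token: ["T", $text, $is_data]
        }
        else {
            $out .= $token->[-1];    # other tokens keep their source text last
        }
    }
    return $out;
}

my $result = blank_hrefs(
    '<p>see <A HREF="http://example.com/" class="x">here</A> '
  . 'and <img alt="<a b c d>"></p>'
);
print "$result\n";
```

    Because the parser tokenizes the markup, an anchor-like string inside
    another tag's attribute value is not treated as a link, and tag and
    attribute names are matched regardless of case.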


    --
    David Wall
    David K. Wall, Sep 26, 2003
    #4
  5. boris bass

    Bob Walton Guest

    boris bass wrote:

    > i am trying to scan the html file for all the anchor tags and the
    > following regex
    >
    >
    > while ( $content =~ /<\s*a\s+.*?>/s )
    >
    >
    > doesn't seem to work , what is wrong here?



    Mostly you need the g and the i switches. Without g, the match in the
    while condition starts over from the beginning of the string every time,
    so the loop keeps finding the first match forever. The i switch lets you
    match something like <A href="xxx">. Example:

    my $content = do { local $/; <DATA> };    # slurp everything after __END__
    while ( $content =~ /<\s*a\s+.*?>/sgi ) {
        print "matched $&\n";                 # $& holds the text just matched
    }
    __END__
    some html <a href="sdflkj"> and <A href="sflkjwer"> and< a
    href="werlkj" > and <a href="sdlfkj">

    Note that this will not work perfectly, because the HTML may contain
    things like <img src="xxx" alt="<a b c d>">, where something that looks
    like an anchor tag sits inside an attribute value. For fully reliable
    results, use one of the HTML parsers.
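
    For the replacement half of the question (turning <a href="something">
    into <a href="">), the same idea can be sketched with a substitution. This
    is only an illustration under assumptions: the sample text is made up, it
    handles quoted href values only, and it shares the limitation described
    above for markup hidden inside other attribute values.

```perl
use strict;
use warnings;

# Made-up sample input, including an upper-case tag, a tag split across
# lines, and an anchor without any href.
my $content = <<'HTML';
some html <a href="sdflkj"> and <A href="sflkjwer"> and <a
href="werlkj" class="x"> and <a name="top">
HTML

# Capture everything from "<a" up to "href=" and put it back, followed by an
# empty quoted value. /g replaces every anchor, /i ignores case.
( my $stripped = $content ) =~
    s{(<\s*a\s[^>]*?\bhref\s*=\s*)(?:"[^"]*"|'[^']*')}{$1""}gi;

print $stripped;
```

    The [^>]*? part keeps the match inside a single tag, so an anchor without
    an href is left alone instead of borrowing the href of a later tag.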


    ....


    > boris


    --
    Bob Walton
    Bob Walton, Sep 27, 2003
    #5
