small regexp problem

Discussion in 'Perl Misc' started by David Morel, Dec 30, 2003.

  1. David Morel

    David Morel Guest

    Hi all,

    It's been a while since I've used regular expressions, and I'd like a
    bit of help.

    I have a string of html --- $html. What I want to do is to isolate the
    substrings that are in between a particular tag... say between <b> and
    </b>.

    So if $html = "asdf asdf asdf <b>foo</b> asdf asdf asdf <b>bar</b>", I
    would like to somehow get "foo" and "bar" into an array.

    This seems like it would be easy with the appropriate regexp.

    Thanks!
    David Morel, Dec 30, 2003
    #1

  2. Kris Jenkins

    Kris Jenkins Guest

    David Morel wrote:
    > Hi all,
    >
    > It's been a while since I've used regular expressions, and I'd like a
    > bit of help.
    >
    > I have a string of html --- $html. What I want to do is to isolate the
    > substrings that are in between a particular tag... say between <b> and
    > </b>.
    >
    > So if $html = "asdf asdf asdf <b>foo</b> asdf asdf asdf <b>bar</b>", I
    > would like to somehow get "foo" and "bar" into an array.
    >
    > This seems like it would be easy with the appropriate regexp.
    >
    > Thanks!


    Have a look at the document HTML::Tree::Scanning, under the "Scanning
    HTML Trees" heading. It gives a couple of recipes, suggests why regexes
    may be a little fragile for this task, and why HTML::TreeBuilder _might_
    be better*. (You can get it by CPANing HTML::Tree.)

    * For a given value of 'better'.
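
    As a rough illustration of the HTML::TreeBuilder route (a minimal
    sketch, not one of the recipes from that document; it assumes
    HTML::Tree is installed from CPAN and reuses the sample string from
    the question):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = 'asdf asdf asdf <b>foo</b> asdf asdf asdf <b>bar</b>';

# Parse the fragment into a tree of HTML::Element objects.
my $tree = HTML::TreeBuilder->new_from_content($html);

# In list context, look_down returns every element matching the criteria;
# as_text gives the text content of each <b> element.
my @bold = map { $_->as_text } $tree->look_down( _tag => 'b' );

# Free the tree explicitly once we are done with it.
$tree->delete;

print "$_\n" for @bold;
```

    Unlike a regex, this should keep working if the <b> tags carry
    attributes or contain other markup.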

    Kris
    Kris Jenkins, Dec 30, 2003
    #2

  3. David Morel wrote:
    > I have a string of html --- $html. What I want to do is to isolate
    > the substrings that are in between a particular tag... say between
    > <b> and </b>.
    >
    > So if $html = "asdf asdf asdf <b>foo</b> asdf asdf asdf <b>bar</b>",
    > I would like to somehow get "foo" and "bar" into an array.
    >
    > This seems like it would be easy with the appropriate regexp.


    It rather seems like you should explore one of the modules for parsing
    HTML, such as HTML::Parser.

    But I still had to play with a regex... I used the one from

    perldoc -q "remove HTML"

    as a starting point for writing a sub that captures the substrings in
    a reference to a hash of arrays:

    sub extract {
        my ($html, $elements) = @_;    # $html is a reference to the string
        my %substrings;
        for my $elem (@$elements) {
            # $1 = element name, $2 = attribute quote char, $3 = content
            while ( $$html =~ m{
                <\s*($elem)\b(?:[^>'"]*|(['"]).*?\2)*>
                (.+?)
                <\s*/\s*$elem\s*>}gisx ) {
                push @{$substrings{$1}}, $3;
            }
        }
        return \%substrings;
    }

    my $html = <<HTML;
    asdf asdf asdf <b>foo</b> asdf asdf asdf <b>bar</b>
    <a href="http://search.cpan.org/">search.cpan.org</a>
    HTML

    my $substrings = extract( \$html, [ qw/a b/ ] );

    # sort the keys so the elements print in a predictable order
    for my $elem ( sort keys %$substrings ) {
        print "Element: $elem\n";
        for my $text ( @{ $substrings->{$elem} } ) {
            print " $text\n";
        }
        print "\n";
    }

    Outputs:
    Element: a
    search.cpan.org

    Element: b
    foo
    bar
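
    For the single-tag case in the original question, a plain
    list-context match with /g may already be enough. A minimal sketch,
    assuming the tags have no attributes and are not nested (the variable
    name @bold is just for illustration):

```perl
use strict;
use warnings;

my $html = 'asdf asdf asdf <b>foo</b> asdf asdf asdf <b>bar</b>';

# In list context, m//g returns the captured group from every match.
# The non-greedy .*? stops at the first closing </b>; /i ignores case
# and /s lets . match newlines inside the element.
my @bold = $html =~ m{<b>(.*?)</b>}gis;

print "$_\n" for @bold;
```

    This prints foo and bar on separate lines, but it breaks as soon as
    the markup gets less regular, which is why a parsing module is the
    safer choice.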

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Dec 30, 2003
    #3
  4. David Morel wrote:

    > I have a string of html


    > This seems like it would be easy with the appropriate regexp.



    Then you weren't paying attention when you read the Perl FAQs
    about HTML. :)


    perldoc -q HTML

    How do I remove HTML from a string?


    wherein there are examples that make it hard rather than easy.

    Use a module that understands HTML data when you need to
    process HTML data.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Dec 30, 2003
    #4