Could a regular expression do this? (newbie)

Discussion in 'Perl Misc' started by Alont, Sep 21, 2004.

  1. Alont

    Alont Guest

    I want to match a block of text with a pattern, but the block is very
    large (and multi-line). Its first line should be:
    <html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

    and the end of the text block:
    rel="external" target="new">Forum</a></li>
    </ul>
    </div>
    </div>
    </div>

    So, how can I match this text block in an HTML file? (Many HTML files
    are waiting to be matched and then have the block replaced with
    "<!-- #include virtual="/Head.inc" -->".)

    I have seen many examples, but can't find one that does this.
    --
    Your fault as a Government is My failure as a citizen
     
    Alont, Sep 21, 2004
    #1

  2. Alont

    wfsp Guest

    "Alont" <> wrote in message
    news:41519803.43923109@130.133.1.4...
    >I want to pattern a text block, but the text block very large(and
    > multi-line), the first line should be:
    > <html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    >
    > and the end of the text block:
    > rel="external" target="new">Forum</a></li>
    > </ul>
    > </div>
    > </div>
    > </div>
    >
    > so, how I can pattern the text block in a html file(many html files
    > waiting for pattern and then replace to"<!-- #include
    > virtual="/Head.inc" -->")
    >
    > I have seen much examples, but can't find a example could do this
    > --
    > Your fault as a Government is My failure as a citizen


    Using regexes on HTML is _very_ difficult, especially on "many" "very large"
    files. My advice would be to not even consider it. There are many good
    modules for parsing HTML (I use HTML::TokeParser) and I would urge you to have
    a look at them. If you hit any snags, come back with what you have tried and
    we'll see how we go from there.
    Best of luck.
     
    wfsp, Sep 21, 2004
    #2

  3. Alont

    Alont Guest

    "wfsp" <>Wrote at Tue, 21 Sep 2004 07:35:32 +0000
    (UTC):
    >Using regexs on HTML is _very_ difficult; especially "many" "very large"
    >files. My advice would be to not even consider it. There are many good
    >modules to parse HTML (I use HTML::Tokeparser) and I would urge you to have
    >a look at them. If you hit any snags come back with what you have tried and
    >we'll see how we go from there.
    >Best of luck.
    >


    I'll try what you say, thank you:)
    --
    Your fault as a Government is My failure as a citizen
     
    Alont, Sep 21, 2004
    #3
  4. Alont

    Jim Keenan Guest

    Alont <> wrote in message news:<41519803.43923109@130.133.1.4>...
    > I want to pattern a text block, but the text block very large(and
    > multi-line), the first line should be:
    > <html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    >
    > and the end of the text block:
    > rel="external" target="new">Forum</a></li>
    > </ul>
    > </div>
    > </div>
    > </div>
    >
    > so, how I can pattern the text block in a html file(many html files
    > waiting for pattern and then replace to"<!-- #include
    > virtual="/Head.inc" -->")
    >


    The keys to solving a regex like this are: (1) use the 's' modifier
    so that '.' also matches '\n'; (2) use the 'x' modifier so that you can
    include comments and whitespace within the pattern; (3) build up the
    successful match incrementally. I built up the successful match using
    the commented-out lines below beginning with 'if'.
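    As a quick aside, the effect of the two modifiers can be checked in
    isolation with a minimal sketch like the one below. The sample string and
    variable names are made up for illustration; only core Perl is used.

```perl
use strict;
use warnings;

my $text = "start\nmiddle\nend";

# Without /s, '.' does not match a newline, so the match fails.
my $without_s = ($text =~ /start.*end/)  ? 1 : 0;

# With /s, '.' also matches "\n", so the pattern can span lines.
my $with_s    = ($text =~ /start.*end/s) ? 1 : 0;

# With /x, literal whitespace and '#' comments in the pattern are ignored;
# whitespace that must match has to be written as \s (or escaped).
my $with_x = ($text =~ / start   # opening marker
                         \s      # the newline after it
                         middle  # text on the second line
                       /x) ? 1 : 0;

print "without /s: $without_s, with /s: $with_s, with /x: $with_x\n";
```

    Note that a comment inside an /x pattern must not contain the pattern
    delimiter, since Perl finds the end of the pattern before it looks at the
    comments.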

    use strict;
    use warnings;

    my $str = '<html><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

    text in the middle:
    rel="external" target="new">Forum</a></li>
    </ul>
    </div>
    </div>
    </div>
    ';

    print $str, "\n";

    # Earlier attempts, kept to show how the pattern was built up:
    # if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"}
    # if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-\/\/W3C}
    # if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-\/\/W3C\/\/DTD\sXHTML\s1.0\s}
    # if ($str =~ s{<html><!DOCTYPE\shtml\sPUBLIC\s"-\/\/W3C\/\/DTD\sXHTML\s1.0\sTransitional\/\/EN"\n}
    # failure
    if ($str =~ s{<html><!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD\s+XHTML\s+1\.0\s+Transitional//EN"
                  .*?        # non-greedy: stop at the first closing block
                  rel="external"\s+target="new">Forum</a></li>
                  \s*</ul>
                  \s*</div>
                  \s*</div>
                  \s*</div>
                  \s*
                 }  # end of pattern to be matched
                 {<!-- #include virtual="/Head.inc" -->}sx  # replacement text
        # modifiers: /s lets '.' match "\n"; /x ignores whitespace and
        # comments inside the pattern
       ) # end of 'if' condition
    {
        print "Success! String is now:\n";
        print "$str\n";
    } else {
        print "Failure\n";
    }
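
    To run the same substitution over many files, each file has to be read as
    one string, since line-by-line reading would never see a multi-line match.
    What follows is only a sketch, assuming the files are named on the command
    line and are small enough to hold in memory. The helper name
    replace_header and the choice to overwrite files in place are
    illustrative, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text with the header block replaced,
# plus a flag saying whether the pattern was found.
sub replace_header {
    my ($html) = @_;
    my $found = $html =~ s{<html><!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD\s+XHTML\s+1\.0\s+Transitional//EN"
                           .*?      # everything up to the first closing block
                           rel="external"\s+target="new">Forum</a></li>
                           \s*</ul> \s*</div> \s*</div> \s*</div> \s*
                          }
                          {<!-- #include virtual="/Head.inc" -->\n}sx;
    return ($html, $found ? 1 : 0);
}

for my $file (@ARGV) {
    # Slurp the whole file: undefining $/ makes <$in> return everything at once.
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };
    close $in;

    my ($new, $found) = replace_header($html);
    unless ($found) {
        warn "No header block found in $file, left unchanged\n";
        next;
    }

    # Overwrites the file in place; keep backups when trying this for real.
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $new;
    close $out;
}
```

    Saved as, say, fix_header.pl (a made-up name), it could be run as
    "perl fix_header.pl *.html". Files that do not contain the block are
    reported and skipped rather than rewritten.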
     
    Jim Keenan, Sep 21, 2004
    #4
