replacing tags between tags

Discussion in 'Perl Misc' started by beartiger@gmail.com, Sep 18, 2005.

  1. Guest

    Suppose I wanted to make sure that between any:

    <blockquote></blockquote>

    *all* the <p>s were replaced with <br>s.

    How would I do that?

    E.g.:

    <blockquote>
    That time of year thou mayst in me behold<p>
    When yellow leaves, or none, or few, do hang<p>
    Upon those boughs which shake against the cold,<p>
    Bare ruin'd choirs, where late the sweet birds sang.<p>
    </blockquote>

    Would become:

    <blockquote>
    That time of year thou mayst in me behold<br>
    When yellow leaves, or none, or few, do hang<br>
    Upon those boughs which shake against the cold,<br>
    Bare ruin'd choirs, where late the sweet birds sang.<br>
    </blockquote>

    But all other <p>s outside of <blockquote>s would remain <p>s.

    J
    , Sep 18, 2005
    #1

  2. John Bokma Guest

    wrote:

    > Suppose I wanted to make sure that between any:
    >
    > <blockquote></blockquote>
    >
    > *all* the <p>s were replaced with <br>s.
    >
    > How would I do that?


    Remove the <p>'s and use <pre>:

    <blockquote>
    <pre>
    That time of year thou mayst in me behold
    When yellow leaves, or none, or few, do hang
    Upon those boughs which shake against the cold,
    Bare ruin'd choirs, where late the sweet birds sang.
    </pre>
    </blockquote>

    You could use s/// to do this, but it might fail. Better to parse the HTML,
    fix it, and write it out.

    --
    John
    Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
    John Bokma, Sep 18, 2005
    #2

  3. Guest

    John Bokma wrote:
    > wrote:
    >
    > > Suppose I wanted to make sure that between any:
    > >
    > > <blockquote></blockquote>
    > >
    > > *all* the <p>s were replaced with <br>s.
    > >
    > > How would I do that?

    >
    > Remove the <p>'s and use <pre>:
    >
    > <blockquote>
    > <pre>
    > That time of year thou mayst in me behold
    > When yellow leaves, or none, or few, do hang
    > Upon those boughs which shake against the cold,
    > Bare ruin'd choirs, where late the sweet birds sang.
    > </pre>
    > </blockquote>
    >
    > You could use s/// to do this, but it might fail. Better to parse the HTML,
    > fix it, and write it out.


    That answers the specific example, but I was looking for something to
    answer the general case.


    Thanks,
    John
    , Sep 18, 2005
    #3
  4. wrote:
    > John Bokma wrote:

    [...]
    >> You could use s/// to do this, but it might fail. Better to parse
    >> the HTML, fix it, and write it out.

    >
    > That answers the specific example, but I was looking for something to
    > answer the general case.


    Why do you think parsing the HTML would _not_ work in the general case?

    jue
    Jürgen Exner, Sep 18, 2005
    #4
  5. Guest

    Jürgen Exner wrote:
    > wrote:
    > > John Bokma wrote:

    > [...]
    > >> You could use s/// to do this, but it might fail. Better to parse
    > >> the HTML, fix it, and write it out.

    > >
    > > That answers the specific example, but I was looking for something to
    > > answer the general case.

    >
    > Why do you think parsing the HTML would _not_ work in the general case?


    I don't. Would you please illustrate what you mean?


    J
    , Sep 18, 2005
    #5
  6. wrote:
    > Suppose I wanted to make sure that between any:
    >
    > <blockquote></blockquote>
    >
    > *all* the <p>s were replaced with <br>s.
    >
    > How would I do that?
    >
    > E.g.:
    >
    > <blockquote>
    > That time of year thou mayst in me behold<p>
    > When yellow leaves, or none, or few, do hang<p>
    > Upon those boughs which shake against the cold,<p>
    > Bare ruin'd choirs, where late the sweet birds sang.<p>
    > </blockquote>
    >
    > Would become:
    >
    > <blockquote>
    > That time of year thou mayst in me behold<br>
    > When yellow leaves, or none, or few, do hang<br>
    > Upon those boughs which shake against the cold,<br>
    > Bare ruin'd choirs, where late the sweet birds sang.<br>
    > </blockquote>
    >
    > But all other <p>s outside of <blockquote>s would remain <p>s.
    >
    > J


    I suggest using Ruby.

    text = DATA.read
    text.gsub!( %r{<blockquote>.*?</blockquote>}m ){ |str|
      str.gsub( /<p>/, "<br>" )
    }
    puts text

    __END__
    <p>Now begins the plaint.</p>
    <blockquote>
    That time of year thou mayst in me behold<p>
    When yellow leaves, or none, or few, do hang<p>
    Upon those boughs which shake against the cold,<p>
    Bare ruin'd choirs, where late the sweet birds sang.<p>
    </blockquote>
    <p>Another:</p>
    <blockquote>
    Stars, I have seen them fall,<p>
    But when they drop and die<p>
    No star is lost at all<p>
    From all the star-sown sky.<p>
    </blockquote>

    --------------------------------------------------------------

    Output:

    <p>Now begins the plaint.</p>
    <blockquote>
    That time of year thou mayst in me behold<br>
    When yellow leaves, or none, or few, do hang<br>
    Upon those boughs which shake against the cold,<br>
    Bare ruin'd choirs, where late the sweet birds sang.<br>
    </blockquote>
    <p>Another:</p>
    <blockquote>
    Stars, I have seen them fall,<br>
    But when they drop and die<br>
    No star is lost at all<br>
    From all the star-sown sky.<br>
    </blockquote>
    William James, Sep 19, 2005
    #6
  7. Guest

    I ended up using a simple all-purpose tag parser, a la:

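    # Split text into a flat list of tokens: each token is either one
    # complete tag ("<...>") or a run of text containing no "<".
    sub parse_tagged_text
    {
        my $text = shift;
        my @parsed;

        # Test the length, not the truth value: a remaining "0" is false.
        while (length $text)
        {
            # Capture the leading token in $1 instead of using $& and $',
            # which slow down every regex match in the program.
            if ($text =~ /^(<[^>]*>|[^<]+)/)
            {
                push @parsed, $1;
                $text = substr($text, length $1);
            }
            else
            {
                # Only reached on a "<" that is never closed by ">".
                die "parse_tagged_text: unterminated tag near: "
                    . substr($text, 0, 50) . "\n";
            }
        }

        return @parsed;
    }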

    Seems to work great.
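
    For the original question, a token list like this can be walked with a
    depth counter. What follows is only a rough sketch: the names tokenize
    and p_to_br are made up for illustration, it assumes no ">" appears
    inside attribute values, and it knows nothing about comments or
    <script> blocks.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into complete tags and runs of plain text.
sub tokenize {
    my ($text) = @_;
    # A lone "<" with no closing ">" falls through to the last alternative,
    # so every character of the input ends up in exactly one token.
    return $text =~ /(<[^>]*>|[^<]+|<)/g;
}

# Hypothetical helper: turn <p> into <br> only while inside a <blockquote>.
sub p_to_br {
    my ($html) = @_;
    my $depth = 0;    # current <blockquote> nesting level
    my $out   = '';
    for my $tok (tokenize($html)) {
        if    ($tok =~ m{^<blockquote\b}i)              { $depth++ }
        elsif ($tok =~ m{^</blockquote\s*>}i)           { $depth-- if $depth }
        elsif ($depth && $tok =~ m{^<p(?:\s[^>]*)?>$}i) { $tok = '<br>' }
        $out .= $tok;
    }
    return $out;
}

print p_to_br("<p>Intro<blockquote>one<p>two<P class=x></blockquote><p>End\n");
```

    Counting depth rather than matching one opening tag to one closing tag
    is what lets nested blockquotes come out right.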


    Thanks to all,

    J
    , Sep 19, 2005
    #7
  8. John Bokma Guest

    John Bokma, Sep 19, 2005
    #8
  9. wrote:
    > Jürgen Exner wrote:
    >> wrote:
    >>> John Bokma wrote:

    >> [...]
    >>>> You could use s/// to do this, but it might fail. Better to parse
    >>>> the HTML, fix it, and write it out.
    >>>
    >>> That answers the specific example, but I was looking for something
    >>> to answer the general case.

    >>
    >> Why do you think parsing the HTML would _not_ work in the general
    >> case?

    >
    > I don't. Would you please illustrate what you mean?


    Well, John wrote:
    <quote>Better to parse the HTML, fix it, and write it out.</quote>

    You replied:
    <quote>
    >>> That answers the specific example, but I was looking for something
    >>> to answer the general case.

    </quote>

    To me that seems to imply that you believe parsing the HTML would work
    only for the specific example but not for the general case. If this was
    not what you meant, then I obviously misunderstood what you wrote.

    Anyway, this topic has been discussed a gazillion times before. To parse
    HTML, use a proper HTML parser because, contrary to popular belief, parsing
    HTML is not trivial. For further details please see DejaNews and the FAQ
    (perldoc -q HTML: "How do I remove HTML from a string?").
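
    For the question that started this thread, a parser-based version might
    look like the following. It is a sketch under assumptions, not a
    finished tool: it assumes the HTML-Parser distribution (which provides
    HTML::TokeParser) is installed from CPAN, the sub name
    p_to_br_in_blockquotes is invented for the example, and dropping </p>
    inside a blockquote is a choice made here, not something the question
    asked for.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # assumption: HTML-Parser is installed from CPAN

# Hypothetical helper name. Copies the document through token by token,
# emitting <br> for every <p> start tag seen inside a <blockquote>.
sub p_to_br_in_blockquotes {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "cannot create parser";
    my $depth = 0;    # <blockquote> nesting level
    my $out   = '';

    while (my $tok = $parser->get_token) {
        my $type = $tok->[0];
        if ($type eq 'S') {                # ['S', $tag, $attr, $attrseq, $text]
            my ($tag, $text) = @{$tok}[1, 4];
            $depth++ if $tag eq 'blockquote';
            $out .= ($depth && $tag eq 'p') ? '<br>' : $text;
        }
        elsif ($type eq 'E') {             # ['E', $tag, $text]
            my ($tag, $text) = @{$tok}[1, 2];
            next if $depth && $tag eq 'p'; # assumption: drop </p> in a blockquote
            $depth-- if $depth && $tag eq 'blockquote';
            $out .= $text;
        }
        elsif ($type eq 'PI') {            # ['PI', $token0, $text]
            $out .= $tok->[2];
        }
        else {                             # 'T', 'C', 'D': source text at index 1
            $out .= $tok->[1];
        }
    }
    return $out;
}

print p_to_br_in_blockquotes(
    "<p>Intro</p>\n<blockquote>\nline one<p>\nline two<P>\n</blockquote>\n");
```

    The parser lowercases tag names and keeps comments as separate tokens,
    so <P>, <p class="x"> and a commented-out <blockquote> are all handled
    without extra pattern tricks.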

    jue
    Jürgen Exner, Sep 19, 2005
    #9
  10. William James wrote:
    > I suggest using Ruby.


    Which of course is wildly off-topic in a Perl newsgroup

    > text = DATA.read
    > text.gsub!( %r{<blockquote>.*?</blockquote>}m ){ |str|
    > str.gsub( /<p>/, "<br>" )


    and fails for the same reasons as any other simple-minded approach to
    parsing HTML with regular expressions.
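
    To make those reasons concrete, here is a Perl rendering of the quoted
    Ruby together with a few inputs it gets wrong. The sub name
    naive_p_to_br and the sample inputs are invented for the demonstration.

```perl
use strict;
use warnings;

# Perl rendering of the quoted Ruby: rewrite <p> inside each
# non-greedy <blockquote>...</blockquote> match.
sub naive_p_to_br {
    my ($html) = @_;
    $html =~ s{(<blockquote>.*?</blockquote>)}{
        my $block = $1;             # copy first: the inner s/// resets $1
        $block =~ s/<p>/<br>/g;
        $block
    }gse;
    return $html;
}

for my $case (
    # Attribute on the opening tag: the pattern never matches at all.
    '<blockquote cite="x">one<p>two</blockquote>',
    # Upper case and attributes on <p>: both are left alone.
    '<blockquote>one<P>two<p class="v">three</blockquote>',
    # Nesting: the match ends at the inner </blockquote>, so the
    # second <p> is missed.
    '<blockquote>a<blockquote>b<p></blockquote>c<p></blockquote>',
    # A commented-out tag starts a match, so a <p> outside any real
    # blockquote gets rewritten.
    '<!-- <blockquote> --><p>x<blockquote>y</blockquote>',
) {
    print naive_p_to_br($case), "\n";
}
```

    Each of these can be patched with a more elaborate pattern, but every
    patch opens another corner case, which is why the parser is the safer
    route.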

    jue
    Jürgen Exner, Sep 19, 2005
    #10