Strip newlines in TD cell?

Discussion in 'Perl Misc' started by Richard A. DeVenezia, Sep 29, 2003.

  1. Can't figure this one out...

    How can I strip all the newlines from the content between <TD and </TD>?

    I read in and join some HTML
    <TABLE><TR><TD>1
    2
    3
    </TD>
    <TR><TD>A
    B
    C</TD></TR></TABLE>

    that I want to process as

    <TABLE><TR><TD>1 2 3</TD>
    <TR><TD>A B C</TD></TR></TABLE>

    Thanks.
     
    Richard A. DeVenezia, Sep 29, 2003
    #1

  2. Maybe by using an HTML parser to parse HTML?
    Contrary to popular belief, parsing HTML correctly is close to rocket
    science, and nobody in their right mind would attempt to do that using
    REs alone.

    For further details please see the FAQ. 'perldoc -q HTML':
    "How do I remove HTML from a string?"

    jue
     
    Jürgen Exner, Sep 29, 2003
    #2
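
A minimal sketch of the parser-based approach suggested above, assuming the HTML::TokeParser module (part of the HTML-Parser distribution on CPAN, not core Perl) is installed. The names `flatten_td_cells` and `tidy_cell` are hypothetical helpers made up for this illustration. The sketch treats a new <TD> as closing any open cell and does not try to handle nested tables.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # CPAN module, assumed to be installed

# Collapse line breaks inside a cell to single spaces and trim the edges.
sub tidy_cell {
    my ($text) = @_;
    $text =~ s/\s*\n\s*/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical helper: re-emit the document token by token, tidying
# only what lies between <TD ...> and </TD>.
sub flatten_td_cells {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "could not create parser";
    my ($out, $cell, $in_td) = ('', '', 0);

    while (my $token = $parser->get_token) {
        my ($type, $tag) = @$token;
        # Text tokens keep their source text in slot 1; the other
        # token types keep it in the last slot.
        my $raw = $type eq 'T' ? $token->[1] : $token->[-1];

        if ($type eq 'S' && $tag eq 'td') {
            $out .= tidy_cell($cell) if $in_td;    # previous cell had no </TD>
            $out .= $raw;
            ($cell, $in_td) = ('', 1);
        }
        elsif ($type eq 'E' && $tag eq 'td' && $in_td) {
            $out .= tidy_cell($cell) . $raw;
            $in_td = 0;
        }
        elsif ($in_td) {
            $cell .= $raw;    # buffer everything inside the cell
        }
        else {
            $out .= $raw;     # outside a cell: pass through unchanged
        }
    }
    $out .= $cell if $in_td;  # unterminated last cell: emit untouched
    return $out;
}

my $html = <<'END';
<TABLE><TR><TD>1
2
3
</TD>
<TR><TD>A
B
C</TD></TR></TABLE>
END

print flatten_td_cells($html);
```

Buffering the whole cell before tidying it means inline markup inside a cell (for example a <B> element) does not lose the spaces around it; only whitespace at the very start and end of the cell is trimmed.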


  3. Use a module that can properly parse HTML.


    Then why did you say you wanted to _strip_ newlines?

    If you stripped newlines, you'd end up with:

    <TABLE><TR><TD>123</TD>
    <TR><TD>ABC</TD></TR></TABLE>

    It appears that what you actually want is to replace newlines
    with spaces...


    s#(<TD>.*?</TD>)# (my $cell = $1) =~ tr/\n/ /; $cell #gse;


    But that does not produce output like your example either.

    I'll leave it to you to make it do whatever it is that you want done...
     
    Tad McClellan, Sep 29, 2003
    #3
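
For the simple, well-formed input shown in the question, one way to finish the substitution is sketched below. It is regex-only and so carries all the caveats raised in the thread about parsing HTML with REs. It assumes no nested tables and an explicit </TD> on every cell. The name `join_td_lines` is a hypothetical helper made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: inside each <TD ...>...</TD> pair, turn every
# line break into one space and trim the cell's edges.
sub join_td_lines {
    my ($html) = @_;
    $html =~ s{(<TD\b[^>]*>)(.*?)(</TD>)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $body =~ s/\s*\n\s*/ /g;     # line break plus surrounding whitespace becomes one space
        $body =~ s/^\s+|\s+$//g;     # remove the space left at either edge of the cell
        $open . $body . $close;
    }gsie;
    return $html;
}

my $html = <<'END';
<TABLE><TR><TD>1
2
3
</TD>
<TR><TD>A
B
C</TD></TR></TABLE>
END

print join_td_lines($html);
```

Compared with the one-liner in the post, this copies the captures into lexicals before running further substitutions, matches TD case-insensitively and with attributes, and trims the space that the newline before </TD> would otherwise leave behind, so the sample input comes out as `<TD>1 2 3</TD>` and `<TD>A B C</TD>`.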
