Regexp to combine table cells

Discussion in 'Perl Misc' started by Bart Van der Donck, May 9, 2014.

  1. Hello,

    I'm having difficulty finding a regular expression that starts from the following input:

    my $row = '
    <tr>
    <td>UNIQUESTRING</td>
    <td>A</td>
    <td>A</td>
    <td>A</td>
    <td>B</td>
    <td>B</td>
    <td>B</td>
    <td>B</td>
    <td>B</td>
    <td>A</td>
    <td>A</td>
    <td></td>
    <td></td>
    <td>C</td>
    <td></td>
    </tr>
    ';

    What I would like to achieve:

    <tr>
    <td>UNIQUESTRING</td>
    <td colspan="3">A</td>
    <td colspan="5">B</td>
    <td colspan="2">A</td>
    <td colspan="2"></td>
    <td>C</td>
    <td></td>
    </tr>

    Thanks

    --
    Bart
     
    Bart Van der Donck, May 9, 2014
    #1

  2. Bart Van der Donck <> writes:

    > Hello,
    >
    > I'm having difficulties to find a regular expression starting from the
    > following input:


    Regular expressions are really not the right tool for that task. It can
    be done with Perl regular expressions, but it won't be pretty.

    You should look at a real HTML parser instead. My personal preference
    would be to use HTML::TreeBuilder to parse your HTML. This would leave
    you with a navigable Perl structure where you can iterate over the
    <td> elements and then generate a new list of <td> elements.

    Should be quite simple...
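
    A minimal sketch of that parser-based approach, assuming the CPAN module HTML::TreeBuilder is installed. The sub name merge_cells_with_parser is made up for illustration, and the row is wrapped in <table> tags on the assumption that the parser handles a complete table more predictably than a bare <tr>:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN module, not part of core Perl

# Hypothetical helper: merge runs of adjacent <td> cells with equal text.
sub merge_cells_with_parser {
    my ($row_html) = @_;

    # Assumption: wrap the fragment so the parser sees a complete table.
    my $tree = HTML::TreeBuilder->new_from_content("<table>$row_html</table>");

    my ( $keep, $span );    # first cell of the current run and its length
    for my $td ( $tree->look_down( _tag => 'td' ) ) {
        if ( defined $keep && $td->as_text eq $keep->as_text ) {
            $keep->attr( colspan => ++$span );    # widen the kept cell
            $td->delete;                          # drop the duplicate
        }
        else {
            ( $keep, $span ) = ( $td, 1 );        # a new run starts here
        }
    }

    # An empty hash as third argument asks as_HTML to emit every end tag.
    my $html = $tree->look_down( _tag => 'tr' )->as_HTML( undef, undef, {} );
    $tree->delete;
    return $html;
}
```

    Called as merge_cells_with_parser($row), this returns the rewritten <tr> element as a string. The exact whitespace of the result depends on as_HTML, so it will not match the hand-written target character for character.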

    //Makholm
     
    Peter Makholm, May 9, 2014
    #2

  3. gamo Guest

    El 09/05/14 10:59, Bart Van der Donck escribió:
    > Hello,
    >
    > I'm having difficulties to find a regular expression starting from the following input:
    >
    > my $row = '
    > <tr>
    > <td>UNIQUESTRING</td>
    > <td>A</td>
    > <td>A</td>
    > <td>A</td>
    > <td>B</td>
    > <td>B</td>
    > <td>B</td>
    > <td>B</td>
    > <td>B</td>
    > <td>A</td>
    > <td>A</td>
    > <td></td>
    > <td></td>
    > <td>C</td>
    > <td></td>
    > </tr>
    > ';
    >
    > What I would like to achieve:
    >
    > <tr>
    > <td>UNIQUESTRING</td>
    > <td colspan="3">A</td>
    > <td colspan="5">B</td>
    > <td colspan="2">A</td>
    > <td colspan="2"></td>
    > <td>C</td>
    > <td></td>
    > </tr>
    >
    > Thanks
    >
    > --
    > Bart
    >


    There is poetry out there remarking that html is not regex.
    Here is how I would do this specific example, in 3 steps:
    1) Deleting all html tags with tr///, and UNIQUESTRING
    2) Counting the sequence which results
    3) Printing the new table with the counters
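
    The three steps could be sketched roughly as follows. Two things here are assumptions, not part of the outline above: the cell contents are captured with a match instead of deleting the tags (deleting them outright would make an empty cell indistinguishable from a blank line), and UNIQUESTRING is simply kept as a run of length one. The sub name merge_by_counting is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper following the three steps: extract, count, print.
sub merge_by_counting {
    my ($row) = @_;

    # Step 1: collect the text of every cell, empty cells included.
    my @cells = $row =~ m{<td>(.*?)</td>}gs;

    # Step 2: count consecutive equal values as [value, count] pairs.
    my @runs;
    for my $cell (@cells) {
        if ( @runs && $runs[-1][0] eq $cell ) {
            $runs[-1][1]++;
        }
        else {
            push @runs, [ $cell, 1 ];
        }
    }

    # Step 3: print the new row, adding colspan only where it is needed.
    my $out = "<tr>\n";
    for my $run (@runs) {
        my ( $value, $count ) = @$run;
        $out .= $count > 1
            ? qq{<td colspan="$count">$value</td>\n}
            : "<td>$value</td>\n";
    }
    return $out . "</tr>\n";
}
```

    This only works while the cells are plain <td>...</td> with no attributes and no nested tags.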

    HTH



    --
    http://www.telecable.es/personales/gamo/
     
    gamo, May 9, 2014
    #3
  4. Bart Van der Donck <> writes:

    > Hello,
    >
    > I'm having difficulties to find a regular expression starting from the
    > following input:
    >
    > my $row = '
    > <tr>
    > <td>UNIQUESTRING</td>
    > <td>A</td>
    > <td>A</td>
    > <td>A</td>
    > <td>B</td>
    > <td>B</td>
    > <td>B</td>
    > <td>B</td>
    > <td>B</td>
    > <td>A</td>
    > <td>A</td>
    > <td></td>
    > <td></td>
    > <td>C</td>
    > <td></td>
    > </tr>
    > ';
    >
    > What I would like to achieve:
    >
    > <tr>
    > <td>UNIQUESTRING</td>
    > <td colspan="3">A</td>
    > <td colspan="5">B</td>
    > <td colspan="2">A</td>
    > <td colspan="2"></td>
    > <td>C</td>
    > <td></td>
    > </tr>


    As has been said, REs are probably not the right solution. They often
    result in fragile code when used with HTML. Still, it's an interesting
    question, so here's an answer:

    $row =~ s! <td> ([^<]*) </td> ((?: \s* <td>\1</td> )+)
    ! "<td colspan='" . ((() = $2 =~ /<td>/g) + 1) . "'>$1</td>" !xeg;

    If the A, B and C of your examples are actually very much more complex
    then my [^<]* won't do and the whole thing will look a lot worse.
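
    For readers who want the same idea spelled out, here is a more verbose sketch of that substitution. The sub name merge_with_regex and the lexical names are made up for illustration. It also copies the two captures into lexicals before running the inner match, as a precaution, because capture variables are reset by the next successful match:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the run-merging substitution.
sub merge_with_regex {
    my ($row) = @_;

    $row =~ s{
        <td> ([^<]*) </td>              # group 1: text of the first cell in a run
        ( (?: \s* <td> \1 </td> )+ )    # group 2: one or more repeats of that cell
    }{
        my ( $content, $rest ) = ( $1, $2 );         # copy before matching again
        my $span = 1 + ( () = $rest =~ /<td>/g );    # first cell plus the repeats
        qq{<td colspan="$span">$content</td>};
    }xge;

    return $row;
}
```

    The () = ... construct forces the inner match into list context, and the surrounding addition then turns that list assignment into a count. Cells that are not repeated never match the pattern, so they are left untouched.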

    --
    Ben.
     
    Ben Bacarisse, May 9, 2014
    #4
  5. Tim McDaniel Guest

    In article <lkianp$mqs$>, gamo <>
    wrote:
    >El 09/05/14 10:59, Bart Van der Donck escribió:
    >> my $row = '
    >> <tr>
    >> <td>UNIQUESTRING</td>
    >> <td>A</td>
    >> <td>A</td>
    >> <td>A</td>
    >> <td>B</td>
    >> <td>B</td>
    >> <td>B</td>
    >> <td>B</td>
    >> <td>B</td>
    >> <td>A</td>
    >> <td>A</td>
    >> <td></td>
    >> <td></td>
    >> <td>C</td>
    >> <td></td>
    >> </tr>
    >> ';

    ....
    >There is poetry out there remarking that html is not regex.


    Not regexpable in general. "This ... is wrong tool. Never use this."

    >1) Deleting all html tags with tr///, and UNIQUESTRING


    tr/// does single-character substitution. You can't get rid of the
    HTML tags with tr. s///, probably, if there's nothing like < inside
    comments.
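
    A small sketch of the difference, using two hypothetical helper subs. tr/// treats its first argument as a set of single characters, while s/// matches a pattern:

```perl
use strict;
use warnings;

# tr/// deletes every '<', 't', 'd', '>' and '/' wherever it occurs,
# so letters inside the cell content are lost as well.
sub strip_with_tr {
    my ($text) = @_;
    $text =~ tr{<td>/}{}d;
    return $text;
}

# s/// removes whole tags: a '<', anything up to the next '>', and that '>'.
sub strip_with_s {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

print strip_with_tr('<td>data</td>'), "\n";    # prints "aa"
print strip_with_s('<td>data</td>'),  "\n";    # prints "data"

# The caveat about comments: a '>' inside one ends the match too early.
print strip_with_s('<!-- 1 > 0 -->x'), "\n";   # prints " 0 -->x"
```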

    --
    Tim McDaniel,
     
    Tim McDaniel, May 9, 2014
    #5
  6. gamo Guest

    El 09/05/14 18:11, Tim McDaniel escribió:
    > tr/// does single-character substitution. You can't get rid of the
    > HTML tags with tr. s///, probably, if there's nothing like < inside
    > comments.


    Thank you for the correction.

    --
    http://www.telecable.es/personales/gamo/
     
    gamo, May 9, 2014
    #6
  7. # have fun !


    use strict;
    use warnings;


    my $row = '
    <tr>
    <td>UNIQUESTRING</td>
    <td>A</td>
    <td>A</td>
    <td>A</td>
    <td>B</td>
    <td>B</td>
    <td>B</td>
    <td>B</td>
    <td>B</td>
    <td>A</td>
    <td>A</td>
    <td></td>
    <td></td>
    <td>C</td>
    <td></td>
    </tr>
    ';



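    my @data;                                # one { content => count } hash per run of equal cells
    my $regex = qr/<td>(.*?)<\/td>/;         # captures the content of a single cell
    my $i = 0;
    my $out = "<tr>\n";
    my $match_previous = 'UNIQUESTRING';     # must equal the content of the first cell


    while ( $row =~ /$regex/g )
    {
        $i++ if $match_previous ne $^N;      # content changed: start a new run
        $data[$i]->{ $^N }++;                # $^N holds the most recently closed group
        $match_previous = $^N;
    }


    foreach (@data) {
        my ($k, $v) = each %{$_};            # each hash holds exactly one pair
        $out .= $v == 1 ? "<td>$k</td>\n" : "<td colspan=\"$v\">$k</td>\n";
    }

    $out .= '</tr>';


    print $out;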
     
    George Mpouras, May 9, 2014
    #7
  8. Thanks to everyone for the input. Ben's solution is technically brilliant, but the A/B/C are indeed more complex, and the <td>s have attributes as well. George's approach fails at EOL, but appears to be okay with an initial

    $row =~ tr/\015\012//d;

    Then there is still a trick needed to handle the attributes of each <td>. Fortunately I'm in a situation where identical cell values always have identical <td> attributes:

    # at init
    my $row = '
    <tr>
    <td>§ class="c1" §A</td>
    <td>§ class="c1" §A</td>
    <td>§ class="c2" §B</td>
    <td></td>
    </tr>
    ';

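    # final regex (the replacement starts with a space so the attributes
    # stay separated from "<td" or from a preceding colspan)
    $out =~ s/>§ (.*?) §/ $1>/g;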

    And I believe that should do it...
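
    Put together, the whole pipeline might look like the sketch below. It is not George's exact code: the runs are kept as [content, count] pairs instead of hashes, the sub name merge_marked_cells is made up, and the replacement in the last step begins with a space so that the attributes do not fuse with the tag name. The source file is assumed to be saved in one consistent encoding so that the § in the pattern matches the § in the data:

```perl
use strict;
use warnings;

# Hypothetical helper: cells carry their attributes as "§ attrs §" at the
# start of the content, so equal cells compare equal, attributes included.
sub merge_marked_cells {
    my ($row) = @_;

    $row =~ tr/\015\012//d;    # drop CR and LF before matching

    # Group consecutive identical cells as [content, count] pairs.
    my @runs;
    while ( $row =~ m{<td>(.*?)</td>}g ) {
        if ( @runs && $runs[-1][0] eq $1 ) { $runs[-1][1]++ }
        else                               { push @runs, [ $1, 1 ] }
    }

    my $out = "<tr>\n";
    for my $run (@runs) {
        my ( $content, $count ) = @$run;
        $out .= $count == 1
            ? "<td>$content</td>\n"
            : qq{<td colspan="$count">$content</td>\n};
    }
    $out .= '</tr>';

    # Move the marked attributes out of the content and into the tag.
    $out =~ s/>§ (.*?) §/ $1>/g;
    return $out;
}
```

    With the four-cell sample row above, the two c1 cells end up as <td colspan="2" class="c1">A</td>, the c2 cell as <td class="c2">B</td>, and the empty cell stays as it is.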

    --
    Bart
     
    Bart Van der Donck, May 10, 2014
    #8
  9. Kaz Kylheku Guest

    On 2014-05-09, Bart Van der Donck <> wrote:
    > Hello,
    >
    > I'm having difficulties to find a regular expression starting from the following input:
    >
    > my $row = '
    > <tr>
    > <td>UNIQUESTRING</td>
    > <td>A</td>
    > <td>A</td>
    > <td>A</td>
    > <td>B</td>
    > <td>B</td>
    > <td>B</td>
    > <td>B</td>
    > <td>B</td>
    > <td>A</td>
    > <td>A</td>
    > <td></td>
    > <td></td>
    > <td>C</td>
    > <td></td>
    > </tr>
    > ';


    TXR language, version 89:

    @(output :into str)
    <tr>
    <td>UNIQUESTRING</td>
    <td>A</td>
    <td>A</td>
    <td>A</td>
    <td>B</td>
    <td>B</td>
    <td>B</td>
    <td>B</td>
    <td>B</td>
    <td>A</td>
    <td>A</td>
    <td></td>
    <td></td>
    <td>C</td>
    <td></td>
    </tr>
    @(end)
    @(next :list str)
    <tr>
    <td>@header</td>
    @(collect :vars (item count))
    @(bind count 1)
    <td>@item</td>
    @(collect :gap 0)
    <td>@item</td>
    @(do (inc count))
    @(end)
    @(until)
    </tr>
    @(end)
    @(output)
    <tr>
    <td>@header</td>
    @(repeat :vars (count))
    <td@(if (> count 1) ` colspan="@count"` "")>@item</td>
    @(end)
    </tr>
    @(end)


    Output:

    <tr>
    <td>UNIQUESTRING</td>
    <td colspan="3">A</td>
    <td colspan="5">B</td>
    <td colspan="2">A</td>
    <td colspan="2"></td>
    <td>C</td>
    <td></td>
    </tr>
     
    Kaz Kylheku, May 12, 2014
    #9
