perl html parser

Discussion in 'Perl Misc' started by kevin kitenik, Nov 11, 2010.

  1. Hi everybody,

    I have a piece of an HTML file that contains special if-then-else statements,
    like these:
    <if condition="$vboptions['hometitle']"><a href="$vboptions[homeurl]">$vboptions[hometitle]</a> -</if>
    <if condition="$vboptions[privacyurl]"><a href="$vboptions[privacyurl]"><else><tr><td>test here
    </if>

    These if statements can be nested:
    if ... then ...
    else
      if .. then ...
      fi
    fi


    The problem is that I want a way to transform these statements into the
    ((cond1) ? (exec1) : (exec2)) style.

    How can I do this?
    I tried modules from CPAN without any success! Here is my attempt:

    use Parse::RecDescent;
    my @s=( q{<if condition="$vboptions['hometitle']"> <a href="$vboptions[homeurl]">$vboptions[hometitle]</a> - </if>});

    &pars;
    sub pars {
        my $parser = new Parse::RecDescent( q{
            startrule: S
            S: if ifC '">' then S else S fi {$return="(($item[2]) ? (\"$item[5]\") : ($item[7]))";}
             | if ifC '">' then S fi {$return="(($item[2]) ? (\"$item[5]\") : (\"\"))";}
             | html {$return=$item[1];}
            if: '<if condition="'
            fi: '</if>'
            ifC: /[^"]+/
            then: ''
            else: '<else>' | '<else />'
            html: /[\w\d_\$,\[\] ="\/\<\>-]+/ });
        foreach my $s (@s) {
            print $s . ":\n" . $parser->startrule( $s ) . "\n";
        }
    }


    Thank you in advance for any suggestions, because I have a headache ;-)
    --
    Thanks a lot.
     
    kevin kitenik, Nov 11, 2010
    #1

  2. kevin kitenik

    Guest

    On 11 Nov 2010 13:06:53 GMT, kevin kitenik <> wrote:

    >Thank you in advance for any suggestions, because I have a headache ;-)


    I'm not surprised you have a headache.
    You can see what it's doing if you set $::RD_TRACE = 1;

    Let's look at one of your data strings.
    q{<if condition="$vboptions[privacytitle]"><a href="$vboptions[privacyurl]">
    <else><tr><td> test here
    </if>}

    -----------------------------------
    use strict;
    use warnings;
    use Parse::RecDescent;

    # Print a trace of every rule and match attempt while parsing.
    $::RD_TRACE = 1;

    my @s=(
    q{<if condition="$vboptions[privacytitle]"><a href="$vboptions[privacyurl]">
    <else><tr><td> test here
    </if>}
    );

    pars();

    sub pars {
        my $parser = Parse::RecDescent->new( q{

            startrule: S
            S: if ifC '">' then S else S fi {$return="(($item[2]) ? (\"$item[5]\") : ($item[7]))";}
             | if ifC '">' then S fi {$return="(($item[2]) ? (\"$item[5]\") : (\"\"))";}
             | html {$return=$item[1];}

            if: '<if condition="'
            fi: '</if>'
            ifC: /[^"]*/
            then: ''
            else: '<else>' | '<else />'
            html: /[\w\d_\$,\[\] ="\/<>-]+/ });

        # Parse each sample; print the input, then the generated expression.
        foreach my $s (@s) {
            print "\n",'+'x30,"\n",$s,":\n",'-'x30,"\n", ($parser->startrule( $s )),"\n";
        }
    }

    __END__
    -----------------------------------


    The first time through S, it finds
    if ifC '">' then
    which is
    if: $item[1] - '<if condition="' (literal)
    ifC: $item[2] - '$vboptions[privacytitle]' =~ /[^"]*/
    $item[3] - '">' (literal)
    then: $item[4] - '' (literal)

    Then it recurses into S and finds
    html
    which is
    $item[5] - '<a href="$vboptions[privacyurl]">' =~ /[\w\d_\$,\[\] ="\/<>-]+/

    Back from the recursion, it then finds
    else
    which is
    $item[6] - '<else>' (literal)

    Then it recurses into S again and finds
    html
    which is
    $item[7] - '<tr><td> test here' =~ /[\w\d_\$,\[\] ="\/<>-]+/

    Back from the recursion, it then finds
    fi
    which is
    $item[8] - '</if>' (literal)
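
    To make the result of that walk concrete, here is a small standalone sketch of
    what the action of the first production assembles. No parser is run: @item is
    filled in by hand with the values listed above, which is an assumption made
    purely for illustration.

```perl
use strict;
use warnings;

# Hand-filled stand-in for the @item array that the first production of S
# would see for the sample string. Nothing is parsed here.
my @item = (
    'S',                                  # $item[0]: name of the rule
    '<if condition="',                    # $item[1]: if
    '$vboptions[privacytitle]',           # $item[2]: ifC
    '">',                                 # $item[3]: literal
    '',                                   # $item[4]: then
    '<a href="$vboptions[privacyurl]">',  # $item[5]: first nested S
    '<else>',                             # $item[6]: else
    '<tr><td> test here',                 # $item[7]: second nested S
    '</if>',                              # $item[8]: fi
);

# Same string expression as in the grammar's action block.
my $return = "(($item[2]) ? (\"$item[5]\") : ($item[7]))";
print "$return\n";
# prints: (($vboptions[privacytitle]) ? ("<a href="$vboptions[privacyurl]">") : (<tr><td> test here))
```

    Note that the action wraps the then-part in double quotes but not the
    else-part, and that quotes inside the HTML are not escaped.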

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


    This code will produce a proper result if and
    only if there is a separator between

    S (separator) else
    and
    S (separator) fi

    that is NOT in the character class of the html production, /[\w\d_\$,\[\] ="\/<>-]+/.

    This can be a TAB or a newline, because neither is in the character
    class of that regex.

    For example, this:
    q{<if condition="$vboptions[privacytitle]"><a href="$vboptions[privacyurl]"><else>
    <tr><td> test here
    </if>}
    will fail because
    '<a href="$vboptions[privacyurl]"><else>' =~ /[\w\d_\$,\[\] ="\/<>-]+/
    will match, taking else: with it

    And,
    q{<if condition="$vboptions[privacytitle]"><a href="$vboptions[privacyurl]">
    <else><tr><td> test here </if>}
    will fail because
    '<tr><td> test here </if>' =~ /[\w\d_\$,\[\] ="\/<>-]+/
    will match, taking fi: with it
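
    The effect can be checked with the regex alone, outside Parse::RecDescent.
    This is only a sketch: the helper name then_part is made up, and it applies
    the html character class once at the start of the text after the condition,
    which is roughly what the nested S does.

```perl
use strict;
use warnings;

# Character class copied from the html rule in the grammar.
my $html = qr/[\w\d_\$,\[\] ="\/<>-]+/;

# Hypothetical helper: what the html regex grabs from the start of $input.
sub then_part {
    my ($input) = @_;
    my ($match) = $input =~ /\A($html)/;
    return $match;
}

# Newline before <else>: the match has to stop at the newline.
my $separated = q{<a href="$vboptions[privacyurl]">
<else><tr><td> test here
</if>};

# No separator before <else>: the class matches the tag as well.
my $joined = q{<a href="$vboptions[privacyurl]"><else>
<tr><td> test here
</if>};

print "separated: ", then_part($separated), "\n";
print "joined:    ", then_part($joined), "\n";
# separated: <a href="$vboptions[privacyurl]">
# joined:    <a href="$vboptions[privacyurl]"><else>
```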

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    The regular expressions (as you have them there) are independent of each other.
    They are not set up to backtrack: once the html regex has matched, it does not
    give characters back so that else: or fi: can match.
    I think that backtracking is available as a more advanced production concept,
    however this can't be done with something as trivial as
    /[\w\d_\$,\[\] ="\/<>-]+/
    Indeed, the whole realm of discrete, character-level parsing is needed for markup.
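
    One possible way around this in plain Perl is to guard every character with a
    negative lookahead, so the class can never run into a control tag. The sketch
    below only exercises the regex on its own; the variable names are made up,
    and whether the same pattern behaves well as the html rule inside the grammar
    is untested here.

```perl
use strict;
use warnings;

# Same character class as the html rule, but a character is consumed only
# if no <if ...>, <else> or </if> tag starts at that position.
my $html_guarded = qr{(?:(?!<if\s|<else\s*/?>|</if>)[\w\d_\$,\[\] ="/<>-])+};

# The two problem cases from above in one string: no separator before
# <else>, and only a space (which is in the class) before </if>.
my $input = q{<a href="$vboptions[privacyurl]"><else><tr><td> test here </if>};

my ($then, $else) = $input =~ m{\A($html_guarded)<else\s*/?>($html_guarded)</if>\z};

print "then: [$then]\n";   # then: [<a href="$vboptions[privacyurl]">]
print "else: [$else]\n";   # else: [<tr><td> test here ]
```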

    If, however, you are in control of creating the input data, just fashion it so
    that known delimiters are inserted where necessary. Then you can generate the correct
    HTML, or whatever it is you are doing.
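
    If you cannot change the input, another option is to treat the tags themselves
    as the delimiters: split the text into tag and text tokens first, then walk the
    tokens recursively. The following is a minimal sketch of that idea in plain
    Perl, not a Parse::RecDescent grammar. The function names (tokenize, parse_seq,
    to_ternary) and the choice to join adjacent pieces with '.' are assumptions
    made for illustration.

```perl
use strict;
use warnings;

# Split the template into tag tokens and text tokens. The capture group
# makes split() return the tags as well as the text between them.
sub tokenize {
    my ($src) = @_;
    return grep { length }
        split m{(<if\s+condition="[^"]*">|<else\s*/?>|</if>)}, $src;
}

# Consume tokens until <else>, </if> or end of input and return the
# expression string built from them. Recurses for nested <if> blocks.
sub parse_seq {
    my ($tokens) = @_;
    my @parts;
    while (@$tokens) {
        my $t = $tokens->[0];
        last if $t =~ m{\A<else\s*/?>\z} || $t eq '</if>';
        shift @$tokens;
        if ($t =~ m{\A<if\s+condition="([^"]*)">\z}) {
            my $cond = $1;
            my $then = parse_seq($tokens);
            my $else = '""';                      # default when there is no <else>
            if (@$tokens && $tokens->[0] =~ m{\A<else\s*/?>\z}) {
                shift @$tokens;
                $else = parse_seq($tokens);
            }
            die "missing </if>\n"
                unless @$tokens && shift(@$tokens) eq '</if>';
            push @parts, "(($cond) ? ($then) : ($else))";
        }
        else {
            # Plain text: escape quotes and backslashes, then quote it.
            (my $text = $t) =~ s/(["\\])/\\$1/g;
            push @parts, qq{"$text"};
        }
    }
    return @parts ? join(' . ', @parts) : '""';
}

sub to_ternary {
    my ($src) = @_;
    my @tokens = tokenize($src);
    my $expr   = parse_seq(\@tokens);
    die "unexpected $tokens[0]\n" if @tokens;     # stray <else> or </if>
    return $expr;
}

print to_ternary(q{<if condition="$a">x<else><if condition="$b">y</if></if>}), "\n";
# prints: (($a) ? ("x") : ((($b) ? ("y") : (""))))
```

    Because the tags are recognised before any text is looked at, it no longer
    matters whether whitespace separates the HTML from <else> or </if>.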

    -sln
     
    , Nov 12, 2010
    #2
