Can someone please explain this regex?!

Discussion in 'Perl Misc' started by Geoff Cox, Dec 7, 2003.

  1. Geoff Cox

    Geoff Cox Guest

    Hello,

    this comes from my posting re how to match more than 1 line (from
    Gunnar), but I would appreciate anyone just explaining what is matched,
    as the code does not work for me. If I could learn from this I could
    probably sort it out for myself ..

    Thanks

    Geoff

    if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
    .+?
    Address.+?<TD[^>]+>([^<]+)
    /isx ) {
     
    Geoff Cox, Dec 7, 2003
    #1

  2. Matt Garrish

    Matt Garrish Guest

    "Geoff Cox" <> wrote in message
    news:...
    > Hello,
    >
    > this comes from my posting re how to match more than 1 line (from
    > Gunnar) but would appreciate any one just explaining what is matched
    > as the code does not work for me. If I could learn from this I could
    > probably sort it out for myself ..
    >
    >


    To break it down piece by piece:

    /Head\s+Teacher.+?<TD[^>]+>([^<]+).+?Address.+?<TD[^>]+>([^<]+)/is

    matches "head" (you have the /i switch on, so it will match any case)
    followed by one or more whitespace characters, followed by "teacher",
    followed by one or more characters, as few as possible, up to an opening
    <td (with the /s switch on, the dot matches newlines too). You then have a
    negated character class, [^>]+, which matches everything up to the next
    closing >, and then another negated character class, ([^<]+), matches and
    captures everything up to the next opening <.

    I imagine this might be where your problem is. None of your match patterns
    allow for zero occurrences, which means that there has to be at least one
    character between the <td and closing >. In other words, your pattern would
    never match <td>, but only something like <td class="foo">.

    Moving on, you then have two non-greedy matches (.+?). The first will match
    anything up to "address" and the second will match anything up to the next
    <td. The regex then repeats itself with the two negated classes: one looking
    for the end of the <td> and the other capturing everything up to the next
    opening <. And once again, your pattern will fail unless there is at least
    one character between the <td and >.

    (I removed the /x from your original posting because it just allows
    whitespace and comments in your regex, which didn't help the readability of
    it, in my opinion of course.)
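
    For anyone who prefers to see the breakdown in code, here is a minimal
    sketch of the same pattern written with /x comments and run against the
    sample HTML quoted in this thread. The names $html, $re, $name and
    $address are only for illustration; they are not from the original code.

```perl
use strict;
use warnings;

# Sample taken from the HTML quoted in this thread.
my $html = <<'HTML';
<TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
London N88 5XX</TD></TR>
HTML

my $re = qr{
    Head\s+Teacher   # the label, in any case because of /i
    .+?              # as little as possible, newlines included because of /s
    <TD[^>]+>        # the next TD tag with at least one character before its >
    ([^<]+)          # capture 1: the text up to the next <
    .+?
    Address          # the second label
    .+?
    <TD[^>]+>
    ([^<]+)          # capture 2: the text up to the next <
}isx;

# In list context a successful match returns the captured groups.
my ( $name, $address ) = $html =~ $re;

print "Name: $name\nAddress: $address\n" if defined $name;
```

    Note that this only works because the whole sample is in one string; the
    address capture keeps the newline that sits inside the cell.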

    Matt
     
    Matt Garrish, Dec 7, 2003
    #2

  3. Geoff Cox

    Geoff Cox Guest

    On Sun, 07 Dec 2003 18:02:07 GMT, Geoff Cox
    <> wrote:

    I should have made things a bit clearer - so here is the whole code
    and a sample of html which it is to work on .. can anyone see why it
    doesn't get the name and address info?!

    Cheers

    Geoff


    My code is as follows but it does not work!

    ---------------------------
    use strict;

    print ("name of html file?\n");
    my $namehtml = <STDIN>;

    print ("name of email list file?\n");
    my $newhtml = <STDIN>;


    open(IN, "$namehtml");
    open(OUT, ">>$newhtml");

    my $line = <IN>;

    while (defined($line=<IN>)) {
    # if ($line =~ /&nbsp;&nbsp;(.*?)<\/H6>/i) {
    # print OUT ("$1 \n");
    # }

    if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
    .+?
    Address.+?<TD[^>]+>([^<]+)
    /isx ) {
    print OUT ("Name: $1\nAddress: $2\n");
    }

    }

    close (IN);
    close (OUT);

    -----------------------------

    which is working on for example


    <TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
    <TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
    <TR>
    <TD align=left width="20%" colSpan=2><B>Address</B></TD>
    <TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
    London N88 5XX</TD></TR>


    Cheers

    Geoff
     
    Geoff Cox, Dec 7, 2003
    #3
  4. Bob Walton

    Bob Walton Guest

    Geoff Cox wrote:

    > On Sun, 07 Dec 2003 18:02:07 GMT, Geoff Cox
    > <> wrote:
    >
    > I should have made things a bit clearer - so here is the whole code
    > and a sample of html which it is to work on .. can any one see why it
    > doesn't get the name and address info?!
    >
    > Cheers
    >
    > Geoff
    >
    >
    > My code is as follows but it does not work!


    -------------------------------^^^^^^^^^^^^^
    A much more specific description of what your code does/doesn't do is
    called for in a newsgroup posting. Please state exactly what it does
    that it shouldn't do, or what it doesn't do that it should do. "Doesn't
    work" is next to meaningless -- we can't read your mind.


    >
    > ---------------------------
    > use strict;


    use warnings;


    >
    > print ("name of html file?\n");
    > my $namehtml = <STDIN>;
    >
    > print ("name of email list file?\n");
    > my $newhtml = <STDIN>;
    >
    >
    > open(IN, "$namehtml");
    > open(OUT, ">>$newhtml");
    >
    > my $line = <IN>;


    Since you didn't modify $/, this will read only one line. I think
    that's your fundamental problem. Try:

    my $line;
    {local $/;$line=<IN>} #slurp the input

    and see if that works better.


    >
    > while (defined($line=<IN>)) {


    Here you are reading the rest of the lines of filehandle IN, but one at
    a time. You will have skipped the first line (which was read above).
    If you slurp the input, you should get rid of the while loop.


    > # if ($line =~ /&nbsp;&nbsp;(.*?)<\/H6>/i) {
    > # print OUT ("$1 \n");
    > # }
    >
    > if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
    > .+?
    > Address.+?<TD[^>]+>([^<]+)
    > /isx ) {
    > print OUT ("Name: $1\nAddress: $2\n");
    > }
    >
    > }
    >
    > close (IN);
    > close (OUT);
    >
    > -----------------------------
    >
    > which is working on for example
    >
    >
    > <TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
    > <TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
    > <TR>
    > <TD align=left width="20%" colSpan=2><B>Address</B></TD>
    > <TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
    > London N88 5XX</TD></TR>

    ....


    > Geoff


    Yes: you read the first line of your file, and throw it away. That was
    the line with Teacher etc in it. But even if you didn't do that, the
    remainder of the lines are read one at a time, and no one line contains
    enough stuff to match your pattern. Slurp it all, and your pattern
    might match. Here is a slightly modified standalone copy/paste/execute
    style copy of your program that looks like it might "work":

    use strict;
    use warnings;
    #print ("name of html file?\n");
    #my $namehtml = <STDIN>;

    #print ("name of email list file?\n");
    #my $newhtml = <STDIN>;


    #open(IN, "$namehtml");
    #open(OUT, ">>$newhtml");

    my $line;
    {local $/;$line = <DATA>} #slurp the file

    #while (defined($line=<DATA>)) {
    # if ($line =~ /&nbsp;&nbsp;(.*?)<\/H6>/i) {
    # print OUT ("$1 \n");
    # }
    if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
    .+?
    Address.+?<TD[^>]+>([^<]+)
    /isx ) {
    print ("Name: $1\nAddress: $2\n");
    }

    #}

    #close (IN);
    #close (OUT);

    __END__
    <TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
    <TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
    <TR>
    <TD align=left width="20%" colSpan=2><B>Address</B></TD>
    <TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
    London N88 5XX</TD></TR>
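
    The difference between reading line by line and slurping can also be
    seen side by side. This is only a sketch: it uses an in-memory filehandle
    and a shortened, made-up two-line sample, and the names $by_line, $whole,
    $line_hits and $text are illustrative, not part of the program above.

```perl
use strict;
use warnings;

# Shortened sample: the label is on one line, the value on the next.
my $html = qq{<TD><B>Head Teacher</B></TD>\n<TD width="80%">Fred Green</TD>\n};
my $re   = qr/Head\s+Teacher.+?<TD[^>]+>([^<]+)/is;

# Line by line: the label and the value are never in the same string.
open( my $by_line, '<', \$html ) or die "cannot open in-memory file: $!";
my $line_hits = 0;
while ( defined( my $line = <$by_line> ) ) {
    $line_hits++ if $line =~ $re;
}
close $by_line;

# Slurped: with $/ undefined inside the block, <> returns the whole file.
open( my $whole, '<', \$html ) or die "cannot open in-memory file: $!";
my $text = do { local $/; <$whole> };
close $whole;
my ($name) = $text =~ $re;

print "line-by-line matches: $line_hits\n";
print "slurped match: $name\n";
```

    The line-by-line loop finds nothing, while the slurped string yields the
    name, which is the behaviour described above.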

    HTH.
    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
     
    Bob Walton, Dec 7, 2003
    #4
  5. Gunnar Hjalmarsson

    Geoff Cox wrote:
    > here is the whole code and a sample of html which it is to work on


    And, as I suspected, the problem has nothing to do with the regex...
    Read Bob's explanation carefully!

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Dec 7, 2003
    #5
  6. Gunnar Hjalmarsson

    Matt Garrish wrote:
    > Geoff Cox wrote:
    >> this comes from my posting re how to match more than 1 line (from
    >> Gunnar) but would appreciate any one just explaining what is
    >> matched as the code does not work for me. If I could learn from
    >> this I could probably sort it out for myself ..

    >
    > To break it down piece by piece:
    >
    > /Head\s+Teacher.+?<TD[^>]+>([^<]+).+?Address.+?<TD[^>]+>([^<]+)/is


    <snip>

    > I imagine this might be where your problem is. None of your match
    > patterns allow for zero occurrences, which means that there has to
    > be at least one character between the <td and closing >. In other
    > words, your pattern would never match <td>, but only something like
    > <td class="foo">.


    Yeah, you are right, of course. Both the occurrences of

    <TD[^>]+>

    should better be

    <TD[^>]*>

    (But, as explained in other posts, that limitation was not the reason
    why OP's code didn't "work".)
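
    A quick sketch of the difference between the two quantifiers, using two
    made-up cells (the names $bare, $attrs and %result are illustrative):

```perl
use strict;
use warnings;

my $bare  = '<td>Fred Green</td>';
my $attrs = '<td class="foo">Fred Green</td>';

my %result;
for my $cell ( $bare, $attrs ) {
    # [^>]+ needs at least one character between <td and >, [^>]* does not.
    my $plus = $cell =~ /<TD[^>]+>([^<]+)/i ? 'match' : 'no match';
    my $star = $cell =~ /<TD[^>]*>([^<]+)/i ? 'match' : 'no match';
    $result{$cell} = [ $plus, $star ];
    print "$cell  +: $plus  *: $star\n";
}
```

    Only the bare <td> cell tells the two apart: it fails with + and matches
    with *.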

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Dec 7, 2003
    #6
  7. Geoff Cox

    Geoff Cox Guest

    On Sun, 07 Dec 2003 19:53:03 GMT, Bob Walton
    <> wrote:

    Bob,

    many thanks for your thoughts - the following code gets the first set
    of name/address data but stops at that point - 'afraid I haven't used
    your form of slurp before and do not see how to move through the rest
    of the file containing the name/address data?

    Geoff

    use strict;
    use warnings;
    print ("name of html file?\n");
    my $namehtml = <STDIN>;

    print ("name of email list file?\n");
    my $newhtml = <STDIN>;


    open(DATA, "$namehtml");
    open(OUT, ">>$newhtml");

    my $line;
    {local $/;$line = <DATA>} #slurp the file

    #while (defined($line=<DATA>)) {
    # if ($line =~ /&nbsp;&nbsp;(.*?)<\/H6>/i) {
    # print OUT ("$1 \n");
    # }
    if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
    .+?
    Address.+?<TD[^>]+>([^<]+)
    /isx ) {
    print OUT ("Name: $1\nAddress: $2\n");
    }

    #}

    close (DATA);
    close (OUT);




     
    Geoff Cox, Dec 7, 2003
    #7
  8. Geoff Cox

    Geoff Cox Guest

    On Sun, 07 Dec 2003 21:01:48 +0100, Gunnar Hjalmarsson
    <> wrote:

    >Geoff Cox wrote:
    >> here is the whole code and a sample of html which it is to work on

    >
    >And, as I suspected, the problem has nothing to do with the regex...
    >Read Bob's explanation carefully!


    Gunnar

    must be almost there - I have posted my version based on Bob's code
    .... but it only gets the first name/address info - not clear how I
    move through the rest of the file?

    by the way - your code seems to work fine minus my suggestion re the
    additional < ?!

    Cheers

    Geoff
     
    Geoff Cox, Dec 7, 2003
    #8
  9. Geoff Cox

    Geoff Cox Guest

    On Sun, 7 Dec 2003 14:24:23 -0500, "Matt Garrish"
    <> wrote:

    >
    >"Geoff Cox" <> wrote in message
    >news:...
    >> Hello,
    >>
    >> this comes from my posting re how to match more than 1 line (from
    >> Gunnar) but would appreciate any one just explaining what is matched
    >> as the code does not work for me. If I could learn from this I could
    >> probably sort it out for myself ..
    >>
    >>

    >
    >To break it down piece by piece:


    Matt,

    many thanks - will read in a minute - but you might like to look at
    following code - this works OK except that it only gets the first set
    of name/address data - I do not see at the moment how to move along
    the slurped input to get the other sets of name/address info ..? any
    ideas?! Cheers Geoff

    use strict;
    use warnings;
    print ("name of html file?\n");
    my $namehtml = <STDIN>;

    print ("name of email list file?\n");
    my $newhtml = <STDIN>;


    open(DATA, "$namehtml");
    open(OUT, ">>$newhtml");

    my $line;
    {local $/;$line = <DATA>} #slurp the file

    #while (defined($line=<DATA>)) {
    # if ($line =~ /&nbsp;&nbsp;(.*?)<\/H6>/i) {
    # print OUT ("$1 \n");
    # }
    if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
    .+?
    Address.+?<TD[^>]+>([^<]+)
    /isx ) {
    print OUT ("Name: $1\nAddress: $2\n");
    }

    #}

    close (DATA);
    close (OUT);




     
    Geoff Cox, Dec 7, 2003
    #9
  10. Gunnar Hjalmarsson

    Geoff Cox wrote:
    > Bob,
    >
    > many thanks for your thoughts - the following code gets the first
    > set of name/address data but stops at that point - 'afraid I
    > haven't used your form of slurp before and do not see how to move
    > through the rest of the file containing the name/address data?


    Well, you haven't told us before that there is more than one
    name/address pair.

    > if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)


    Try to change that to

    while ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
    ----^^^^^

    > /isx ) {


    and that to

    /gisx ) {
    -------------------^
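
    Put together, the two changes might look like the sketch below. The two
    records are made up for illustration (same layout as the sample in this
    thread, shortened), and @records is a hypothetical name.

```perl
use strict;
use warnings;

# Two made-up records in the same layout as the sample in this thread.
my $html = <<'HTML';
<TD><B>Head Teacher</B></TD>
<TD width="80%">Fred Green</TD></TR>
<TD><B>Address</B></TD>
<TD width="80%">Park Road, London</TD></TR>
<TD><B>Head Teacher</B></TD>
<TD width="80%">Ann Brown</TD></TR>
<TD><B>Address</B></TD>
<TD width="80%">Mill Lane, Leeds</TD></TR>
HTML

my @records;

# In scalar context, /g makes each match resume where the previous one ended,
# so the loop body runs once per name/address pair.
while ( $html =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
                  .+?
                  Address.+?<TD[^>]+>([^<]+)
                 /gisx ) {
    push @records, [ $1, $2 ];
}

print "Name: $_->[0]\nAddress: $_->[1]\n" for @records;
```

    With a plain "if" and no /g, only the first pair would be found, however
    many times the test was repeated.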

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Dec 7, 2003
    #10
  11. Geoff Cox

    Geoff Cox Guest

    On Sun, 07 Dec 2003 20:58:19 GMT, Geoff Cox
    <> wrote:

    >On Sun, 07 Dec 2003 19:53:03 GMT, Bob Walton
    ><> wrote:
    >
    >Bob,
    >
    >many thanks for your thoughts - the following code gets the first set
    >of name/address data but stops at that point - 'afraid I haven't used
    >your form of slurp before and do not see how to move through the rest
    >of the file containing the name/address data?


    Obvious really !! just need to use while instead of if and add the g
    option ..

    Thanks everyone for all the help!

    Cheers

    Geoff



     
    Geoff Cox, Dec 7, 2003
    #11
  12. Geoff Cox

    Geoff Cox Guest

    On Sun, 07 Dec 2003 22:48:43 +0100, Gunnar Hjalmarsson
    <> wrote:

    >Geoff Cox wrote:
    >> Bob,
    >>
    >> many thanks for your thoughts - the following code gets the first
    >> set of name/address data but stops at that point - 'afraid I
    >> haven't used your form of slurp before and do not see how to move
    >> through the rest of the file containing the name/address data?

    >
    >Well, you haven't told us before that there are more than one
    >name/address pair.


    Gunnar,

    sorry - I thought I had made it clear that the text I'd given was just
    a sample of the file ... any way - all's well that ends well !

    Many thanks for all your help. I've learnt quite a bit tonight!

    Cheers

    Geoff

     
    Geoff Cox, Dec 7, 2003
    #12
  13. Tad McClellan

    Geoff Cox <> wrote:

    > open(DATA, "$namehtml");

                 ^         ^

    perldoc -q vars

    What's wrong with always quoting "$vars"?


    The DATA filehandle is special, I leave it for its special uses,
    choose some other name.


    You should always, yes *always*, check the return value from open():

    open(NAME, $namehtml) or die "could not open '$namehtml' $!";
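
    A sketch of the same advice using a lexical filehandle and the
    three-argument form of open, which is a common idiom but not what was
    posted above. As far as I know, the two-argument form strips trailing
    whitespace from the filename, which would explain why the unchomped
    <STDIN> value worked in the earlier posts; the three-argument form uses
    the name exactly as given, hence the chomp. The temporary file stands in
    for the file named at the prompt, and $in, $missing and $opened are
    illustrative names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the name typed at the prompt: a temporary file, plus the
# trailing newline that <STDIN> would leave on the end of the name.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "<TD>sample</TD>\n";
close $tmp or die "could not close '$path': $!";
my $namehtml = "$path\n";

chomp $namehtml;    # three-argument open uses the name exactly as given

open( my $in, '<', $namehtml ) or die "could not open '$namehtml': $!";
my $html = do { local $/; <$in> };    # slurp, as suggested earlier in the thread
close $in;

# A failed open returns false and sets $!, which is what "or die" reports.
my $opened = open( my $missing, '<', "$namehtml.does-not-exist" );
print "second open failed: $!\n" unless $opened;
```

    Using a lexical handle such as $in also avoids borrowing the special
    DATA name.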




    [ snip 150 lines of full-quote. Please stop doing that. Soon. ]

    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Dec 7, 2003
    #13
  14. Geoff Cox

    Geoff Cox Guest

    On Sun, 7 Dec 2003 17:59:08 -0600, (Tad
    McClellan) wrote:

    >Geoff Cox <> wrote:
    >
    >> open(DATA, "$namehtml");

    > ^ ^
    >
    > perldoc -q vars
    >
    > What's wrong with always quoting "$vars"?


    Tad, not sure what you mean above?


    >The DATA filehandle is special, I leave it for its special uses,
    >choose some other name.


    OK will do.

    >You should always, yes *always*, check the return value from open():
    >
    > open(NAME, $namehtml) or die "could not open '$namehtml' $!";


    Thanks for the reminder.

    >[ snip 150 lines of full-quote. Please stop doing that. Soon. ]


    ditto..

    Geoff
     
    Geoff Cox, Dec 8, 2003
    #14
  15. Tintin

    Tintin Guest

    "Geoff Cox" <> wrote in message
    news:...
    > On Sun, 7 Dec 2003 17:59:08 -0600, (Tad
    > McClellan) wrote:
    >
    > >Geoff Cox <> wrote:
    > >
    > >> open(DATA, "$namehtml");

    > > ^ ^
    > >
    > > perldoc -q vars
    > >
    > > What's wrong with always quoting "$vars"?

    >
    > Tad, not sure what you mean above?


    Have you read what it says? Is there something you don't understand in the
    documentation?
     
    Tintin, Dec 8, 2003
    #15
  16. Geoff Cox

    Geoff Cox Guest

    On Mon, 8 Dec 2003 22:28:36 +1300, "Tintin" <> wrote:

    >
    >"Geoff Cox" <> wrote in message
    >news:...
    >> On Sun, 7 Dec 2003 17:59:08 -0600, (Tad
    >> McClellan) wrote:
    >>
    >> >Geoff Cox <> wrote:
    >> >
    >> >> open(DATA, "$namehtml");
    >> > ^ ^
    >> >
    >> > perldoc -q vars
    >> >
    >> > What's wrong with always quoting "$vars"?


    I assume you mean simply that there is no need to have the quotes
    round $vars if it is on its own?

    Geoff



    >>
    >> Tad, not sure what you mean above?

    >
    >Have you read what it says? Is there something you don't understand in the
    >documentation?
    >
     
    Geoff Cox, Dec 8, 2003
    #16
  17. Tad McClellan

    Geoff Cox <> wrote:

    >>> >> open(DATA, "$namehtml");
    >>> > ^ ^
    >>> >
    >>> > perldoc -q vars
    >>> >
    >>> > What's wrong with always quoting "$vars"?

    >
    > I assume you mean simply that there is no need to have the quotes
    > round $vars if it is on its own?



    Yes.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Dec 9, 2003
    #17
  18. Eric Schwartz

    Geoff Cox <> writes:
    >>> On Sun, 7 Dec 2003 17:59:08 -0600, (Tad
    >>> McClellan) wrote:
    >>>
    >>> >Geoff Cox <> wrote:
    >>> >
    >>> >> open(DATA, "$namehtml");
    >>> > ^ ^
    >>> >
    >>> > perldoc -q vars
    >>> >
    >>> > What's wrong with always quoting "$vars"?

    >
    > I assume you mean simply that there is no need to have the quotes
    > round $vars if it is on its own?


    No, he means you should run 'perldoc -q vars' in a shell window, and
    read the answer titled 'What's wrong with always quoting "$vars"?'.
    It sounds like you should probably run 'perldoc perldoc' as well.
    Perl comes with probably the best built-in documentation I've seen for
    a programming language, but it's useless if you don't spend a bit of
    time learning how to read it.
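
    For readers without perldoc to hand, here is a small sketch of one
    pitfall of quoting a lone variable: the quotes force it into a string,
    which matters when the variable holds a reference. This is an
    illustration only, not a substitute for the FAQ answer, and the names
    @schools, $ref, $copy and $quoted are made up.

```perl
use strict;
use warnings;

my @schools = ( 'Northgate', 'Southgate' );
my $ref     = \@schools;

my $copy   = $ref;      # still an array reference
my $quoted = "$ref";    # now only a string that looks like ARRAY(0x...)

print 'copy is a reference:   ', ( ref($copy)   ? 'yes' : 'no' ), "\n";
print 'quoted is a reference: ', ( ref($quoted) ? 'yes' : 'no' ), "\n";
```

    For a plain filename the quotes in open(DATA, "$namehtml") do no harm,
    they are simply unnecessary.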

    -=Eric
    --
    Come to think of it, there are already a million monkeys on a million
    typewriters, and Usenet is NOTHING like Shakespeare.
    -- Blair Houghton.
     
    Eric Schwartz, Dec 9, 2003
    #18