regex to extract color guide from html

Discussion in 'Perl Misc' started by cp, Oct 26, 2004.

  1. cp

    cp Guest

    I copied a webpage that had a color guide that I liked. I wanted to extract
    the color names and codes and make a list of name alternating with code,
    which, of course, could be made into a hash or saved in a file or whatever.
    Below are some random clippings from the html so you can see what I am
    working with. Below that is the foreach loop that goes through and looks
    for the color name and color code. The html file is already loaded into
    @data. I thought that it worked fine until I realized that some colors
    were missed. I then observed that the first color to be picked up on a
    line was picked up but the remaining colors on the same line were skipped.
    I thought that adding the g modifier at the end of the regex would fix it
    but it produced the same exact output. Any suggestions would be greatly
    appreciated.


    class=s><br>&nbsp;<td>mediumseagreen (<a href="colorsvg.html">SVG</a>)
    #3CB371<td bgcolor="#3CB371" class=s><td>gray24 #3D3D3D<td
    bgcolor="#3D3D3D" class=s>^M
    <tr align=right><td>cobalt #3D59AB<td bgcolor="#3D59AB"
    class=s><br>&nbsp;<td>cobaltgreen #3D9140<td bgcolor="#3D9140"
    class=s><td>gray25 #404040<td bgcolor="#404040" class=s>^M

    <tr align=right><td>dodgerblue4 #104E8B<td bgcolor="#104E8B"
    class=s><br>&nbsp;<td>ultramarine #120A8F<td bgcolor="#120A8F"
    class=s><td>gray7 #121212<td bgcolor="#121212" class=s>^M


    foreach(@data)
    {
        next if not /td\>(\S+\s?\S*)\s*(\#[[:xdigit:]]+)\<td/g;
        my $s1 = "$1\n";
        my $s2 = "$2\n";
        push @output,($s1,$s2);
    }



    --
    www.cherryplankton.com
     
    cp, Oct 26, 2004
    #1

  2. Gunnar Hjalmarsson

    cp wrote:
    > I then observed that the first color to be picked up on a line was
    > picked up but the remaining colors on the same line were skipped. I
    > thought that adding the g modifier at the end of the regex would fix
    > it but it produced the same exact output.


    <snip>

    > foreach(@data)
    > {
    > next if not /td\>(\S+\s?\S*)\s*(\#[[:xdigit:]]+)\<td/g;
    > my $s1 = "$1\n";
    > my $s2 = "$2\n";
    > push @output,($s1,$s2);
    > }


    You are matching, and assigning $s1 and $s2, only once per line. In scalar
    context, /g finds just one match per evaluation, so only the first pair
    on each line is added to @output.

    One possible solution is to process each line in a while loop:

    foreach (@data) {
        while (/td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g) {
            my $s1 = "$1\n";
            my $s2 = "$2\n";
            push @output, ($s1, $s2);
        }
    }
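
    To see the difference between testing the match once and looping over it,
    here is a minimal sketch. The sample line is modelled on the HTML in the
    original post, and the variable names ($line, $re, @once, @all) are made up
    for the illustration. Note that the first capture group can end in
    whitespace, because \s? sits inside the parentheses.

```perl
use strict;
use warnings;

# Sample line modelled on the posted HTML: three colors on one line.
my $line = '<tr align=right><td>cobalt #3D59AB<td bgcolor="#3D59AB" class=s>'
         . '<br>&nbsp;<td>cobaltgreen #3D9140<td bgcolor="#3D9140" class=s>'
         . '<td>gray25 #404040<td bgcolor="#404040" class=s>';

my $re = qr/td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/;

# One scalar-context evaluation: /g still finds only a single match.
my @once;
if ($line =~ /$re/g) {
    push @once, $1, $2;
}

# Forget where the single match ended so the loop starts from the beginning.
pos($line) = undef;

# Each evaluation in the while condition resumes after the previous match.
my @all;
while ($line =~ /$re/g) {
    push @all, $1, $2;
}

printf "single test: %d items, while loop: %d items\n",
    scalar @once, scalar @all;
```

    The explicit pos() reset is only needed here because the same string is
    matched twice; in the foreach loop each line is a fresh string.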

    But what happens if the color name and color code are on different
    lines? A better solution is to slurp the whole file as one string into a
    scalar variable, and drop the foreach loop:

    my $data = do { local $/; <FILE> };
    while ($data =~ /td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g) {
    ...
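
    A self-contained sketch of the slurp approach, under some assumptions: the
    HTML below is a hypothetical stand-in for the saved page (with one name and
    its code deliberately split across two lines), it is read through an
    in-memory filehandle instead of a real file, and the %code_for hash and the
    whitespace trimming are additions for illustration, not part of the
    original code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the saved page; "cobalt" and its code are
# on different lines to show that slurping handles that case.
my $html = <<'END';
<tr align=right><td>cobalt
#3D59AB<td bgcolor="#3D59AB"
class=s><br>&nbsp;<td>cobaltgreen #3D9140<td bgcolor="#3D9140"
class=s><td>gray25 #404040<td bgcolor="#404040" class=s>
END

# Read from an in-memory filehandle so the sketch runs without a file on disk.
open my $fh, '<', \$html or die "Cannot open in-memory file: $!";
my $data = do { local $/; <$fh> };   # with $/ undefined, <$fh> returns everything
close $fh;

my %code_for;
while ($data =~ /td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g) {
    my ($name, $code) = ($1, $2);
    $name =~ s/\s+\z//;              # the first capture can end in whitespace
    $code_for{$name} = $code;
}

print "$_ $code_for{$_}\n" for sort keys %code_for;
```

    Because \s matches newlines too, the same regex now finds pairs that span
    a line break, which the line-by-line loop cannot do.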

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Oct 26, 2004
    #2

  3. cp

    cp Guest

    Thanks to all for the helpful advice. I followed it and now have 570 named
    colors instead of the 243 I had before! I did finally go with the
    all-in-one-string solution. I am going to send them up to my website
    now... Thanks again!

    --
    www.cherryplankton.com
     
    cp, Oct 27, 2004
    #3