regex to extract color guide from html

cp

I copied a webpage that had a color guide I liked. I wanted to extract
the color names and codes and make a list of names alternating with codes,
which could then be made into a hash, saved to a file, or whatever.

Below are some random clippings from the HTML so you can see what I am
working with. Below that is the foreach loop that goes through and looks
for the color name and color code. The HTML file is already loaded into
@data.

I thought it worked fine until I realized that some colors were missed.
I then observed that the first color on a line was picked up but the
remaining colors on the same line were skipped. I thought that adding the
g modifier at the end of the regex would fix it, but it produced exactly
the same output. Any suggestions would be greatly appreciated.


class=s><br>&nbsp;<td>mediumseagreen (<a href="colorsvg.html">SVG</a>)
#3CB371<td bgcolor="#3CB371" class=s><td>gray24 #3D3D3D<td
bgcolor="#3D3D3D" class=s>^M
<tr align=right><td>cobalt #3D59AB<td bgcolor="#3D59AB"
class=s><br>&nbsp;<td>cobaltgreen #3D9140<td bgcolor="#3D9140"
class=s><td>gray25 #404040<td bgcolor="#404040" class=s>^M

<tr align=right><td>dodgerblue4 #104E8B<td bgcolor="#104E8B"
class=s><br>&nbsp;<td>ultramarine #120A8F<td bgcolor="#120A8F"
class=s><td>gray7 #121212<td bgcolor="#121212" class=s>^M


foreach(@data)
{
next if not /td\>(\S+\s?\S*)\s*(\#[[:xdigit:]]+)\<td/g;
my $s1 = "$1\n";
my $s2 = "$2\n";
push @output,($s1,$s2);
}
 

Gunnar Hjalmarsson

cp said:
I then observed that the first color to be picked up on a line was
picked up but the remaining colors on the same line were skipped. I
thought that adding the g modifier at the end of the regex would fix
it but it produced the same exact output.

foreach(@data)
{
next if not /td\>(\S+\s?\S*)\s*(\#[[:xdigit:]]+)\<td/g;
my $s1 = "$1\n";
my $s2 = "$2\n";
push @output,($s1,$s2);
}

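Your loop runs the match only once per line, so $s1 and $s2 are assigned
only once per line and only the first pair on each line is added to
@output. The g modifier does not change that when the match is evaluated
a single time in scalar context.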
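
To make the point concrete, here is a minimal sketch that compares a single
match with a looped match on one line. The sample line is hypothetical,
modeled on the clippings in the question, and the variable names ($line,
$re, @once, @all) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical one-line sample modeled on the clippings in the question.
my $line = '<tr align=right><td>cobalt #3D59AB<td bgcolor="#3D59AB" class=s>'
         . '<br>&nbsp;<td>cobaltgreen #3D9140<td bgcolor="#3D9140" class=s>'
         . '<td>gray25 #404040<td bgcolor="#404040" class=s>';

my $re = qr/td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/;

# One evaluation in scalar context finds one match, even with /g.
my @once;
@once = ($1, $2) if $line =~ /$re/g;

pos($line) = undef;   # forget where that match ended before searching again

# Re-evaluating the same match in a loop walks through every pair.
my @all;
while ($line =~ /$re/g) {
    push @all, $1, $2;
}

print scalar(@once) / 2, " pair from a single match\n";
print scalar(@all) / 2, " pairs from the while loop\n";
```

On this sample the single match yields one pair and the loop yields three.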

One possible solution is to process each line in a while loop:

foreach (@data) {
    while (/td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g) {
        my $s1 = "$1\n";
        my $s2 = "$2\n";
        push @output, ($s1, $s2);
    }
}

But what happens if the color name and color code are on different
lines? A better solution is to slurp the whole file as one string into a
scalar variable, and drop the foreach loop:

my $data = do { local $/; <FILE> };
while ($data =~ /td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g) {
...
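
For completeness, a rough sketch of how the slurp approach might look end to
end. It is not the code from this thread: the sample HTML is a hypothetical
stand-in for the saved page, it reads through an in-memory lexical
filehandle ($fh) instead of FILE so it runs without a file on disk, and the
trimming of the name is an addition, since the capture group can end in
whitespace.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the saved page; note that the last name and
# its code sit on different lines.
my $html = <<'HTML';
<tr align=right><td>dodgerblue4 #104E8B<td bgcolor="#104E8B"
class=s><br>&nbsp;<td>ultramarine
#120A8F<td bgcolor="#120A8F" class=s>
HTML

# Read from the string through an in-memory filehandle so the sketch is
# self-contained; a real script would open the saved page instead.
open my $fh, '<', \$html or die "Cannot open in-memory file: $!";
my $data = do { local $/; <$fh> };   # with $/ undefined, <$fh> returns everything
close $fh;

my @output;
while ($data =~ /td>(\S+\s?\S*)\s*(#[[:xdigit:]]+)<td/g) {
    my ($name, $code) = ($1, $2);    # copy before s/// resets the captures
    $name =~ s/\s+\z//;              # drop trailing whitespace from the name
    push @output, "$name\n", "$code\n";
}

print @output;
```

Because \s also matches a newline, the pair that is split across two lines
is still found once the whole file is in one string.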
 

cp

Thanks to all for the helpful advice. I followed it and now have 570 named
colors instead of the 243 I had before! I finally went with the
all-in-one-string solution. I am going to send them up to my website
now... Thanks again!
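
As for turning the alternating name/code list into a hash, as mentioned in
the original question, something along these lines should work. The
contents of @output here are hypothetical sample values, and %code_for is
just an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical contents of @output: name and code alternate, each ending in "\n".
my @output = ("cobalt\n", "#3D59AB\n", "gray25\n", "#404040\n");

chomp(my @flat = @output);   # copy the list and strip the trailing newlines
my %code_for = @flat;        # an even-length list assigns as key => value pairs

for my $name (sort keys %code_for) {
    print "$name $code_for{$name}\n";
}
```

Note that a hash keeps only one code per name, so a name that appears twice
in the list would keep the last code seen.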
 
