regular expressions problem

Shailesh Humbad · Dec 9, 2004

I want to parse the values from the second-to-last row in an html
table.

....
<tr class="odd">
<td style="text-align: right;" nowrap="nowrap">99</td>
<td style="text-align: right;" nowrap="nowrap">111</td>
<td style="text-align: right;" nowrap="nowrap">52255</td>
<td style="text-align: right;" nowrap="nowrap">333</td>
<td style="text-align: right;" nowrap="nowrap">2323</td>
</tr>
<tr class="totals">
....

I can identify the last row by the "totals" class. So I want the regex
to work backward from there and get the values in each of the cells of
the previous row. It should ignore all prior content and whitespace
between tags. Can anyone help? Here is what I have so far:
/([\s\S]*?)<tr class\=\"totals/

Keith Keller · Dec 9, 2004

Shailesh Humbad said:
I want to parse the values from the second-to-last row in an html
table.

Have you looked at the various HTML parsers available on CPAN? Doing
this with a regex is bound to cause problems. (I'm partial to
HTML::TreeBuilder, myself, but I'm sure that others can make additional
suggestions.)

--keith

Shailesh Humbad · Dec 9, 2004

Keith said:
Have you looked at the various HTML parsers available on CPAN? Doing
this with a regex is bound to cause problems. (I'm partial to
HTML::TreeBuilder, myself, but I'm sure that others can make additional
suggestions.)

--keith

Trouble is, I am using regular expressions in a VBScript file, so I
don't have any Perl support... Even then, the page is probably not
valid HTML. I could use multiple regular expressions in steps. At
least, is there a way to match from "<tr class=\"totals" to the
immediately previous "<tr"? From there I could figure it out. Maybe
I'll try searching within a reversed copy of the string.

Jürgen Exner · Dec 9, 2004

Shailesh said:
I want to parse the values from the second-to-last row in an html
table.

...
<tr class="odd">
<td style="text-align: right;" nowrap="nowrap">99</td>
<td style="text-align: right;" nowrap="nowrap">111</td>
<td style="text-align: right;" nowrap="nowrap">52255</td>
<td style="text-align: right;" nowrap="nowrap">333</td>
<td style="text-align: right;" nowrap="nowrap">2323</td>
</tr>
<tr class="totals">

As has been mentioned here _very_ frequently, parsing HTML correctly using
REs is insane. It hasn't even been proven whether the extended REs in Perl
are powerful enough to do it (normal REs are definitely not sufficient!), let
alone that anyone could find a usable RE to do it.

Use an HTML parser to parse HTML. There are several on CPAN.
And please read the FAQ before asking frequently asked questions (perldoc -q
"remove HTML").

jue

Sherm Pendley · Dec 9, 2004

Shailesh said:
Trouble is, I am using regular expressions in a VBScript file, so I
don't have any Perl support...

The VBScript group is down the hall on your left. Don't let the door hit
you on the way out.

sherm--

Tad McClellan · Dec 9, 2004

Shailesh Humbad said:
I want to parse the values from the second-to-last row in an html
table.

use HTML::TableExtract;

Scott Bryce · Dec 9, 2004

Sherm said:
The VBScript group is down the hall on your left. Don't let the door hit
you on the way out.

Which, when translated, means...

Regular expressions in VBScript are different from regular expressions
in Perl. Any help we give you may not carry over into VBScript. Asking
in a Perl newsgroup about programming in VBScript is a waste of our time
and yours.

Bill Karwin · Dec 9, 2004

Shailesh said:
Trouble is, I am using regular expressions in a VBScript file, so I
don't have any Perl support... Even then, the page is probably not
valid HTML.

There are XML & HTML parsers for Microsoft languages. You'll be much
more successful using something like that than trying to create a custom
regular expression. These types of problems tend to mutate, and very
quickly any regular expression(s) you create will not be appropriate for
the task. Better to use the right tool for the job.

Here's an introduction to the Microsoft XML parser, which supports
several languages including VBScript and Perl (see? on topic! ;-)

http://www.w3schools.com/dom/dom_parser.asp

Regards,
Bill K.

Shailesh Humbad · Dec 10, 2004

Ask for regex help in a VBScript forum? C'mon. Besides, my OP didn't
mention VBScript, but sought a regex solution. Anyway, I solved it on
my own, and I present it here in Perl for those pedants who would
rather complain about formalities than help someone.

#!/usr/bin/perl -W

$TestString = qq{
<td style="text-align: right;" nowrap="nowrap">433</td>
</tr>
<tr class="odd">
<td style="text-align: right;" nowrap="nowrap">99</td>
<td style="text-align: right;" nowrap="nowrap">111</td>
<td style="text-align: right;" nowrap="nowrap">52255</td>
<td style="text-align: right;" nowrap="nowrap">333</td>
<td style="text-align: right;" nowrap="nowrap">2323</td>
</tr>
<tr class="totals">
<td style="text-align: right;" nowrap="nowrap">122</td>
};

# get the second-to-last row
$TestString = reverse($TestString);
$TestString =~ m/slatot\"=ssalc rt<\s*>rt\/<([\s\S]*?)>rt\/</gi;
$LastRow = reverse($1);
print $LastRow."\n";

# Get the columns in the second-to-last row
$LastRow =~ m/\s*<tr[\s\S]*?<td[\s\S]*?>([\s\S]*?)<\/td>
\s*<td[\s\S]*?>([\s\S]*?)<\/td>
\s*<td[\s\S]*?>([\s\S]*?)<\/td>/gix;
print $1."\n";
print $2."\n";
print $3."\n";
# etc.

Uri Guttman · Dec 10, 2004

SH> Ask for regex help in a VBScript forum? C'mon. Besides, my OP didn't
SH> mention VBScript, but sought a regex solution. Anyway, I solved it on
SH> my own, and I present it here in Perl for those pedants who would
SH> rather complain about formalities than help someone.

and i bet your regex solution isn't even compatible with vbscript's.

SH> # get the second-to-last row
SH> $TestString = reverse($TestString);
SH> $TestString =~ m/slatot\"=ssalc rt<\s*>rt\/<([\s\S]*?)>rt\/</gi;
^^^^^^

why do that? slow and for sure that is a perlish feature.
why escape the "? it is not special in a regex.

SH> $LastRow = reverse($1);
SH> print $LastRow."\n";

SH> # Get the columns in the second-to-last row
SH> $LastRow =~ m/\s*<tr[\s\S]*?<td[\s\S]*?>([\s\S]*?)<\/td>
SH> \s*<td[\s\S]*?>([\s\S]*?)<\/td>
SH> \s*<td[\s\S]*?>([\s\S]*?)<\/td>/gix;

and that is impossible to read. choose an alternate delimiter. use /x
properly by breaking it up more and adding comments.
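
The advice above (alternate delimiter, /x with comments) might look something like the following sketch. It is only an illustration, not code from the thread: cells_of() is a hypothetical helper name, and the pattern assumes simple, non-nested <td> cells like those in the sample row. Like any regex over HTML, it will break on markup that departs from that shape.

```perl
use strict;
use warnings;

# One <td> cell, written with braces as delimiters so that "/" needs no
# escaping, and with /x so the pattern can be spaced out and commented.
my $cell_re = qr{
    <td\b [^>]* >   # opening tag, skipping any attributes
    ( .*? )         # cell content, captured lazily
    </td>           # closing tag
}xis;

# cells_of() is a hypothetical helper: in list context, m//g returns the
# captured group from every match, so this yields one entry per cell.
sub cells_of {
    my ($row) = @_;
    return $row =~ /$cell_re/g;
}

my $row = qq{<tr class="odd">\n}
        . qq{<td style="text-align: right;" nowrap="nowrap">99</td>\n}
        . qq{<td style="text-align: right;" nowrap="nowrap">111</td>\n}
        . qq{</tr>\n};

print "$_\n" for cells_of($row);
```

Because the match is global, the same pattern handles a row with any number of cells, so there is no need to repeat the <td> clause once per column.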

so as a pedant, i say your solution is poor and not as useful as you
claim it is. /x will almost surely be another perlish thing that other
regexes don't have.

so try again. see if you can keep up the high quality of your work while
answering all the posts that are off topic. why don't you help the
electrician track down his stalker too?

uri

Shailesh Humbad · Dec 10, 2004

That code is a contrived and translated version of my actual
(VBScript--actually windows scripting) code solely to show the solution
here in the ng, so that it might help someone in the future. Last I
checked, there is no newsgroup for regular expressions, so I thought
this would be the closest thing.

My question should really have been this. Is there a way, in Perl
regular expressions, to search backward in a string after searching
forward to a particular anchor point? In words, the algorithm would
be:

1. Search forward until you match 'b'.
2. Then search backward until you match 'a'.
3. Give me the contents between 'a' and 'b'.
('a' and 'b' are some pattern)

So is there a regex way to do this?
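
For reference, one common way to approximate the three steps above without reversing the string is a negative lookahead that forbids another 'a' or 'b' between the two anchors. The sketch below rests on some assumptions: between_nearest() is a hypothetical helper name, both patterns are passed as qr// objects, and neither contains capturing groups of its own. It also differs from the stated algorithm in one case: if no 'a' precedes the first 'b', it moves on to a later a/b pair instead of failing.

```perl
use strict;
use warnings;

# between_nearest() is a hypothetical helper: return the text between the
# nearest preceding match of $start and the following match of $end, or
# undef (in scalar context) when no such pair exists.
sub between_nearest {
    my ($text, $start, $end) = @_;

    # After $start, consume one character at a time, but only while neither
    # $start nor $end begins at that position. The engine therefore cannot
    # skip over a later $start on its way to $end.
    if ($text =~ / $start ( (?: (?! $start | $end ) . )* ) $end /xs) {
        return $1;
    }
    return;
}

my $html = join "\n",
    '<tr class="even">', '<td>1</td>', '</tr>',
    '<tr class="odd">', '<td>99</td>', '<td>111</td>', '</tr>',
    '<tr class="totals">', '<td>210</td>', '</tr>';

# 'a' is any opening <tr ...> tag, 'b' is the start of the totals row.
my $row = between_nearest($html, qr/<tr\b[^>]*>/, qr/<tr class="totals"/);
print defined $row ? $row : "no match\n";
```

The lookahead is checked at every character, so this does more work per character than a plain lazy match, but it avoids the two reverse() calls and keeps the pattern readable in the forward direction.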

Anno Siegel · Dec 10, 2004

Shailesh Humbad said:
That code is a contrived and translated version of my actual
(VBScript--actually windows scripting) code solely to show the solution
here in the ng, so that it might help someone in the future. Last I
checked, there is no newsgroup for regular expressions, so I thought
this would be the closest thing.

My question should really have been this. Is there a way, in Perl
regular expressions, to search backward in a string after searching
forward to a particular anchor point? In words, the algorithm would
be:

1. Search forward until you match 'b'.
2. Then search backward until you match 'a'.
3. Give me the contents between 'a' and 'b'.
('a' and 'b' are some pattern)

So is there a regex way to do this?

Why bother to ask this in a newsgroup full of "pedants who would rather
complain about formalities than help someone"?

Anno
