regular expressions problem

  • Thread starter Shailesh Humbad

Shailesh Humbad

I want to parse the values from the second-to-last row in an html
table.

....
<tr class="odd">
<td style="text-align: right;" nowrap="nowrap">99</td>
<td style="text-align: right;" nowrap="nowrap">111</td>
<td style="text-align: right;" nowrap="nowrap">52255</td>
<td style="text-align: right;" nowrap="nowrap">333</td>
<td style="text-align: right;" nowrap="nowrap">2323</td>
</tr>
<tr class="totals">
....

I can identify the last row by the "totals" class. So I want the regex
to work backward from there and get the values in each of the cells of
the previous row. It should ignore all prior content and whitespace
between tags. Can anyone help? Here is what I have so far:
/([\s\S]*?)<tr class\=\"totals/
 

Keith Keller

I want to parse the values from the second-to-last row in an html
table.

Have you looked at the various HTML parsers available on CPAN? Doing
this with a regex is bound to cause problems. (I'm partial to
HTML::TreeBuilder, myself, but I'm sure that others can make additional
suggestions.)

--keith
 

Shailesh Humbad

Keith said:
Have you looked at the various HTML parsers available on CPAN? Doing
this with a regex is bound to cause problems. (I'm partial to
HTML::TreeBuilder, myself, but I'm sure that others can make additional
suggestions.)

--keith

Trouble is, I am using regular expressions in a VBScript file, so I
don't have any Perl support... Even then, the page is probably not
valid HTML. I could use multiple regular expressions in steps. At
least, is there a way to match from "<tr class=\"totals" to the
immediately previous "<tr"? From there I could figure it out. Maybe
I'll try searching within a reversed copy of the string.
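
For reference, a minimal Perl sketch of matching from the totals row back to the immediately previous "<tr" without reversing the string. It assumes the markup looks like the snippet in the original post; the helper name row_before_totals and the sample input are made up for illustration. The idea is to start at a "<tr" and refuse to step over any later "<tr" unless it is the totals row. VBScript's RegExp object is generally documented as supporting lookahead, but it has no /s or /x flags, so [\s\S] would have to stand in for the dot there; that port is untested here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the row immediately before the totals row,
# or undef if no such row is found.
sub row_before_totals {
    my ($html) = @_;
    my ($row) = $html =~ m{
        ( <tr\b                  # opening tag of a candidate row
          (?: (?! <tr\b ) . )*   # any text that does not begin another row
        )
        <tr \s+ class="totals"   # the anchor: the totals row must come next
    }xis;
    return $row;
}

# Sample input modelled on the snippet in the question (an assumption).
my $html = <<'HTML';
<tr class="even">
<td>1</td>
</tr>
<tr class="odd">
<td>99</td>
<td>111</td>
</tr>
<tr class="totals">
<td>100</td>
</tr>
HTML

my $row = row_before_totals($html);
print defined $row ? $row : "no match\n";
```

Earlier rows fail because the lookahead stops the scan at the next "<tr", which is not the totals row, so the engine moves on to the next candidate.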
 

Jürgen Exner

Shailesh said:
I want to parse the values from the second-to-last row in an html
table.

...
<tr class="odd">
<td style="text-align: right;" nowrap="nowrap">99</td>
<td style="text-align: right;" nowrap="nowrap">111</td>
<td style="text-align: right;" nowrap="nowrap">52255</td>
<td style="text-align: right;" nowrap="nowrap">333</td>
<td style="text-align: right;" nowrap="nowrap">2323</td>
</tr>
<tr class="totals">

As has been mentioned here _very_ frequently, parsing HTML correctly using
REs is insane. It hasn't even been proven whether the extended REs in Perl
would be powerful enough to do it (normal REs are definitely not
sufficient!), let alone whether a usable RE for it could be found.

Use an HTML parser to parse HTML. There are several on CPAN.
And please read the FAQ before asking frequently asked questions (perldoc -q
"remove HTML").

jue
 

Sherm Pendley

Shailesh said:
Trouble is, I am using regular expressions in a VBScript file, so I
don't have any Perl support...

The VBScript group is down the hall on your left. Don't let the door hit
you on the way out.

sherm--
 

Scott Bryce

Sherm said:
The VBScript group is down the hall on your left. Don't let the door hit
you on the way out.

Which, when translated, means...

Regular expressions in VBScript are different than regular expressions
in Perl. Any help we give you may not carry over into VBScript. Asking
in a Perl newsgroup about programming in VBScript is a waste of our time
and yours.
 

Bill Karwin

Shailesh said:
Trouble is, I am using regular expressions in a VBScript file, so I
don't have any Perl support... Even then, the page is probably not
valid HTML.

There are XML & HTML parsers for Microsoft languages. You'll be much
more successful using something like that than trying to create a custom
regular expression. These types of problems tend to mutate, and very
quickly any regular expression(s) you create will not be appropriate for
the task. Better to use the right tool for the job.

Here's an introduction to the Microsoft XML parser, which supports
several languages including VBScript and Perl (see? on topic! ;-)

http://www.w3schools.com/dom/dom_parser.asp

Regards,
Bill K.
 

Shailesh Humbad

Ask for regex help in a VBScript forum? C'mon. Besides, my OP didn't
mention VBScript, but sought a regex solution. Anyway, I solved it on
my own, and I present it here in Perl for those pedants who would
rather complain about formalities than help someone.

#!/usr/bin/perl -W

$TestString = qq{
<td style="text-align: right;" nowrap="nowrap">433</td>
</tr>
<tr class="odd">
<td style="text-align: right;" nowrap="nowrap">99</td>
<td style="text-align: right;" nowrap="nowrap">111</td>
<td style="text-align: right;" nowrap="nowrap">52255</td>
<td style="text-align: right;" nowrap="nowrap">333</td>
<td style="text-align: right;" nowrap="nowrap">2323</td>
</tr>
<tr class="totals">
<td style="text-align: right;" nowrap="nowrap">122</td>
};

# get the second-to-last row
$TestString = reverse($TestString);
$TestString =~ m/slatot\"=ssalc rt<\s*>rt\/<([\s\S]*?)>rt\/</gi;
$LastRow = reverse($1);
print $LastRow."\n";

# Get the columns in the second-to-last row
$LastRow =~ m/\s*<tr[\s\S]*?<td[\s\S]*?>([\s\S]*?)<\/td>
\s*<td[\s\S]*?>([\s\S]*?)<\/td>
\s*<td[\s\S]*?>([\s\S]*?)<\/td>/gix;
print $1."\n";
print $2."\n";
print $3."\n";
# etc.
 

Uri Guttman

SH> Ask for regex help in a VBScript forum? C'mon. Besides, my OP didn't
SH> mention VBScript, but sought a regex solution. Anyway, I solved it on
SH> my own, and I present it here in Perl for those pedants who would
SH> rather complain about formalities than help someone.

and i bet your regex solution isn't even compatible with vbscript's.

SH> # get the second-to-last row
SH> $TestString = reverse($TestString);
SH> $TestString =~ m/slatot\"=ssalc rt<\s*>rt\/<([\s\S]*?)>rt\/</gi;
^^^^^^

why do that? slow and for sure that is a perlish feature.
why escape the "? it is not special in a regex.

SH> $LastRow = reverse($1);
SH> print $LastRow."\n";

SH> # Get the columns in the second-to-last row
SH> $LastRow =~ m/\s*<tr[\s\S]*?<td[\s\S]*?>([\s\S]*?)<\/td>
SH> \s*<td[\s\S]*?>([\s\S]*?)<\/td>
SH> \s*<td[\s\S]*?>([\s\S]*?)<\/td>/gix;

and that is impossible to read. choose an alternate delimiter. use /x
properly by breaking it up more and adding comments.
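
A rough sketch of what that advice could look like, assuming a row shaped like the one in the test string above. The helper name cell_values is hypothetical, and [^>]* is swapped in for [\s\S]*? inside the opening tag. With m{...} as the delimiter the slash in </td> needs no backslash, and /x leaves room for one comment per piece.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of every <td> cell in a row.
# In list context, a /g match returns all captures, one per cell, so the
# pattern does not have to be repeated once per column.
sub cell_values {
    my ($row_html) = @_;
    return $row_html =~ m{
        <td \b [^>]* >   # opening tag with any attributes
        \s* (.*?) \s*    # cell contents, surrounding whitespace trimmed
        </td>            # closing tag, no backslash needed before the slash
    }gisx;
}

# Sample row modelled on the test string in the quoted script.
my $row = <<'HTML';
<tr class="odd">
<td style="text-align: right;" nowrap="nowrap">99</td>
<td style="text-align: right;" nowrap="nowrap">111</td>
<td style="text-align: right;" nowrap="nowrap">52255</td>
</tr>
HTML

print "$_\n" for cell_values($row);
```

Call it in list context; in scalar context the match only reports success or failure.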

so as a pedant, i say your solution is poor and not as useful as you
claim it is. /x will almost surely be another perlish thing that other
regexes don't have.

so try again. see if you can keep up the high quality of your work while
answering all the posts that are off topic. why don't you help the
electrician track down his stalker too?

uri
 

Shailesh Humbad

That code is a contrived and translated version of my actual
(VBScript--actually windows scripting) code solely to show the solution
here in the ng, so that it might help someone in the future. Last I
checked, there is no newsgroup for regular expressions, so I thought
this would be the closest thing.

My question should really have been this. Is there a way, in Perl
regular expressions, to search backward in a string after searching
forward to a particular anchor point? In words, the algorithm would
be:

1. Search forward until you match 'b'.
2. Then search backward until you match 'a'.
3. Give me the contents between 'a' and 'b'.
('a' and 'b' are some pattern)

So is there a regex way to do this?
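
The three steps can be sketched in Perl roughly as follows. This is a minimal illustration, not a definitive answer: the helper name and the sample string are made up, and it assumes the 'a' pattern contains no capture groups of its own. There is no real backward search; a greedy .* in front of 'a' makes the engine backtrack from the right, which has the same effect.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the last match of
# $start_re ('a') that precedes the first match of $end_re ('b'),
# or undef if either pattern is missing.
sub between_last_start_and_first_end {
    my ($text, $start_re, $end_re) = @_;

    # Step 1: search forward to the first match of 'b'.
    return unless $text =~ $end_re;
    my $prefix = substr($text, 0, $-[0]);   # everything before 'b'

    # Step 2: the greedy .* grabs as much as it can, so the engine
    # backtracks from the end of $prefix and settles on the last 'a'.
    return unless $prefix =~ /\A.*${start_re}(.*)\z/s;

    # Step 3: the capture group holds the text between 'a' and 'b'.
    return $1;
}

my $text = "a1 x a2 y b1 a3 z b2";
my $got  = between_last_start_and_first_end($text, qr/a\d/, qr/b\d/);
print defined $got ? "[$got]\n" : "no match\n";
```

With the sample string this picks the text between a2 and b1, since a2 is the last 'a' before the first 'b'.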
 

Anno Siegel

Shailesh Humbad said:
That code is a contrived and translated version of my actual
(VBScript--actually windows scripting) code solely to show the solution
here in the ng, so that it might help someone in the future. Last I
checked, there is no newsgroup for regular expressions, so I thought
this would be the closest thing.

My question should really have been this. Is there a way, in Perl
regular expressions, to search backward in a string after searching
forward to a particular anchor point? In words, the algorithm would
be:

1. Search forward until you match 'b'.
2. Then search backward until you match 'a'.
3. Give me the contents between 'a' and 'b'.
('a' and 'b' are some pattern)

So is there a regex way to do this?

Why bother to ask this in a newsgroup full of "pedants who would rather
complain about formalities than help someone"?

Anno
 
