regular expression

Luciano Tolomei

I have to match some content in an HTML file.

The file is a single line.

I have to get some cell contents from it.

I have built an expression, but it matches a lot of <tr></tr> pairs together instead of matching one at a time.

The real line is more complex (with rowspan, etc.), and I have to retrieve a lot of cells from a lot of rows.

Here is a simplified example:

<tr><td><font color=\"#EFAD00\">xxxx</font></td></tr><tr><td><font color=\"#EFAD00\">yyyy</font></td></tr>

<tr>(.*)<font color=\"#EFAD00\">([^<]*)(.*)</tr>

It matches only yyyy; I need to match both xxxx and yyyy.

I think I have to change (.*) so that it matches everything except the </tr>,
but I do not know how to do it.
 
John Bokma

Luciano said:
> i have to match some content in an html file.

So put this in your subject. A regexp might not be the answer.

Look at File::Slurp to read the file in one go.

[ snip ]
> i think that i have to change (.*) to make it match everything but not
> the </tr> but i do not know how to do it.

I think you have to have a peek at HTML::TreeBuilder

see:
http://johnbokma.com/perl/froogle-script.html
http://johnbokma.com/perl/phpbb-remote-backup.html

for examples.
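
For the simplified sample in the question, a tree-based approach might look like the sketch below. It assumes the real file uses plain double quotes (the backslashes in the post look like string-literal escaping) and that the rows sit inside a <table> element, which is added here only to make the fragment well-formed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample rows from the question; the <table> wrapper is an assumption.
my $html = '<table>'
         . '<tr><td><font color="#EFAD00">xxxx</font></td></tr>'
         . '<tr><td><font color="#EFAD00">yyyy</font></td></tr>'
         . '</table>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# Collect the text of every <font> element with the wanted color.
my @cells = map { $_->as_text }
            $tree->look_down(_tag => 'font', color => '#EFAD00');

$tree->delete;    # release the parse tree

print "$_\n" for @cells;
```

Because the parser tracks element boundaries itself, no pattern has to decide where one row ends and the next begins.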
 
Fabian Pilkowski

* Luciano Tolomei said:
> i have to match some content in an html file. the file is a single
> line. i have to get some cell content from it. i have build an
> expression but it match a lot of <tr></tr> together instead of
> matching on a time...
> the line is more complex (with rowspan... ) and i have to retrieve
> a lot of cell's and there are a lot of rows. but here a simplified
> example:
>
> <tr><td><font color=\"#EFAD00\">xxxx</font></td></tr><tr><td><font
> color=\"#EFAD00\">yyyy</font></td></tr>
>
> <tr>(.*)<font color=\"#EFAD00\">([^<]*)(.*)</tr>
>
> it match yyyy, i need to match xxxx and yyyy. i think that i have to
> change (.*) to make it match everything but not the </tr> but i do
> not know how to do it.

It seems you want to strip all the HTML tags from your data. Once you
have done that with your example above, the string "xxxx yyyy" remains.

Have a look at HTML::Strip on CPAN. This module strips out all the
HTML markup. Afterwards you can split() your data as usual.

regards,
fabian
 
Tad McClellan

[ Please limit your line lengths to the conventional 70-72 characters. ]


Luciano Tolomei said:
> Subject: regular expression

That is not the Right Tool for your job.

> i have to match some content in an html file.

You should use a module that understands HTML for processing HTML data.

> i have to get some cell content from it.
> the line is more complex (with rowspan... ) and i have to retrieve
> a lot of cell's and there are a lot of rows.

There is a module that can handle that for you, no need to
reinvent that wheel.

> i think that i have to change (.*) to make it match everything

I think you have to change to:

use HTML::TableExtract;

:)
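
A minimal sketch of how that module could be applied to the sample from the question. It assumes the rows sit inside a <table> element (added here, since the posted fragment has none) and that the file uses plain double quotes. With no headers, depth or count given to new(), every table in the document is extracted.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Sample rows from the question; the <table> wrapper is an assumption.
my $html = '<table>'
         . '<tr><td><font color="#EFAD00">xxxx</font></td></tr>'
         . '<tr><td><font color="#EFAD00">yyyy</font></td></tr>'
         . '</table>';

my $te = HTML::TableExtract->new;
$te->parse($html);

my @cells;
for my $table ($te->tables) {
    for my $row ($table->rows) {
        # Each row is an array ref of cell texts; skip empty grid slots.
        push @cells, grep { defined } @$row;
    }
}

print "$_\n" for @cells;
```

Each row comes back as its own array, so cells from different rows are never merged the way the greedy pattern merged them.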
 
Martijn Lievaart

> i think that i have to change (.*) to make it match everything but not the </tr>
> but i do not know how to do it.

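Have a look at (.*?) instead of (.*). The ? makes the quantifier
non-greedy, so it takes as few characters as possible and stops at the
first </tr> instead of the last one. But as others already noted, regexps
are not the best way to tackle this.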
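
A small sketch of the non-greedy version applied to the sample line. It assumes the file uses plain double quotes (the backslashes in the post look like string-literal escaping), and the unused capture groups are dropped so that only the cell text is returned.

```perl
use strict;
use warnings;

# Sample input from the question, still a single line.
my $html = '<tr><td><font color="#EFAD00">xxxx</font></td></tr>'
         . '<tr><td><font color="#EFAD00">yyyy</font></td></tr>';

# .*? takes as little as it can, so each match stays within one row.
# With /g in list context, the match returns every captured cell text.
my @cells = $html =~ m{<tr>.*?<font color="#EFAD00">([^<]*).*?</tr>}g;

print join(',', @cells), "\n";    # prints "xxxx,yyyy"
```

Note that this can still run across a row boundary when a row contains no matching font tag, which is one reason an HTML parser is the safer choice.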

M4
 
