regular expression

Luciano Tolomei · Apr 28, 2005

I have to match some content in an HTML file.

The file is a single line.

I have to get some cell content from it.

I have built an expression, but it matches many <tr></tr> pairs together instead of one at a time...

The real line is more complex (with rowspan...), and I have to retrieve a lot of cells from a lot of rows.

But here is a simplified example:

<tr><td>xxxx</td></tr><tr><td>yyyy</td></tr>

<tr>(.*)([^<]*)(.*)</tr>

It matches yyyy; I need to match both xxxx and yyyy.

I think that I have to change (.*) to make it match everything except the </tr>,
but I do not know how to do it.

John Bokma · Apr 28, 2005

Luciano said:
I have to match some content in an HTML file.

So put this in your subject. A regexp might not be the answer.

Look at File::Slurp to read the file in one go.
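A minimal sketch of reading the whole file in one go. It deliberately uses only core Perl instead of File::Slurp: with `$/` (the input record separator) undefined, a single `<$fh>` read returns the entire file. If File::Slurp is installed, its `read_file($file)` does the same job. The sub name `slurp` is only an illustrative name.

```perl
use strict;
use warnings;

# Core-Perl stand-in for File::Slurp's read_file (hypothetical sub name).
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/;                 # undefine $/ for the rest of this sub only
    my $content = <$fh>;      # one read now returns the whole file
    close $fh;
    return $content;
}
```

For example, `my $html = slurp('page.html');`, where `page.html` is a placeholder for the real file name.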

[ snip ]

I think that I have to change (.*) to make it match everything except
the </tr>, but I do not know how to do it.

I think you have to have a peek at HTML::TreeBuilder

see:
http://johnbokma.com/perl/froogle-script.html
http://johnbokma.com/perl/phpbb-remote-backup.html

for examples.
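A minimal sketch of the HTML::TreeBuilder approach, assuming the HTML-Tree distribution is installed from CPAN. It finds every `<tr>` element with `look_down` and collects the text of each `<td>` with `as_text`. The sample string wraps the simplified rows from the question in a `<table>` element; that wrapper is an assumption about the real file.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN (HTML-Tree distribution), not core Perl

my $html = '<table><tr><td>xxxx</td></tr><tr><td>yyyy</td></tr></table>';

# new_from_content parses the string and finishes the parse in one call.
my $tree = HTML::TreeBuilder->new_from_content($html);

my @cells;
for my $row ($tree->look_down(_tag => 'tr')) {
    # as_text returns the text content of the element and its children.
    push @cells, map { $_->as_text } $row->look_down(_tag => 'td');
}
$tree->delete;   # release the tree once the text has been copied out

print "@cells\n";
```

Because the parser understands nesting, markup inside a cell (for example `<td><b>x</b></td>`) does not confuse it the way a hand-written regex would.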

Fabian Pilkowski · Apr 28, 2005

* Luciano Tolomei said:
I have to match some content in an HTML file. The file is a single
line. I have to get some cell content from it. I have built an
expression, but it matches many <tr></tr> pairs together instead of
one at a time...
The real line is more complex (with rowspan...), and I have to retrieve
a lot of cells from a lot of rows. But here is a simplified
example:

<tr><td>xxxx</td></tr><tr><td>yyyy</td></tr>

<tr>(.*)([^<]*)(.*)</tr>

It matches yyyy; I need to match both xxxx and yyyy. I think that I have to
change (.*) to make it match everything except the </tr>, but I do
not know how to do it.

It seems you want to strip all the HTML tags from your data. Once you
have done that with your example above, the string "xxxx yyyy" remains.

Have a look at HTML::Strip on CPAN. This module strips all the HTML
markup. Afterwards you could split() your data as usual.
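A minimal sketch of the strip-then-split idea, assuming HTML::Strip (an XS module) is installed from CPAN. The `emit_spaces` option asks HTML::Strip to leave a space where each tag was, so neighbouring cells do not run together; `split ' '` then splits on runs of whitespace.

```perl
use strict;
use warnings;
use HTML::Strip;   # CPAN module, not core Perl

my $html = '<tr><td>xxxx</td></tr><tr><td>yyyy</td></tr>';

# emit_spaces => 1: put a space in place of each stripped tag.
my $hs   = HTML::Strip->new( emit_spaces => 1 );
my $text = $hs->parse($html);
$hs->eof;   # reset the parser state before parsing another document

# split ' ' ignores leading whitespace and splits on whitespace runs.
my @fields = split ' ', $text;
print join('|', @fields), "\n";
```

This throws the table structure away, so it only suits cases where the cell text alone, in document order, is enough.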

regards,
fabian

Tad McClellan · Apr 28, 2005

[ Please limit your line lengths to the conventional 70-72 characters. ]

Luciano Tolomei said:
Subject: regular expression

That is not the Right Tool for your job.

I have to match some content in an HTML file.

You should use a module that understands HTML for processing HTML data.

I have to get some cell content from it.

The real line is more complex (with rowspan...), and I have to retrieve
a lot of cells from a lot of rows.

There is a module that can handle that for you, no need to
reinvent that wheel.

I think that I have to change (.*) to make it match everything

I think you have to change to:

use HTML::TableExtract;
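A minimal sketch of the HTML::TableExtract route, assuming the module is installed from CPAN. With no `headers` argument it extracts every table; `tables` returns one object per table and `rows` returns each row as an array reference of cell texts. The `<table>` wrapper around the sample rows is an assumption, since the question only showed a fragment.

```perl
use strict;
use warnings;
use HTML::TableExtract;   # CPAN module, not core Perl

my $html = '<table><tr><td>xxxx</td></tr><tr><td>yyyy</td></tr></table>';

my $te = HTML::TableExtract->new;   # no headers given: extract every table
$te->parse($html);
$te->eof;                           # finish parsing (inherited from HTML::Parser)

my @cells;
for my $ts ($te->tables) {
    for my $row ($ts->rows) {
        # Each row is an array ref; skip undefined entries as a precaution.
        push @cells, grep { defined } @$row;
    }
}
print "@cells\n";
```

Passing `headers => [...]` to `new` narrows the extraction to tables whose header cells match, which helps when the page contains several tables.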

Martijn Lievaart · Apr 28, 2005

I think that I have to change (.*) to make it match everything except the </tr>,
but I do not know how to do it.

Have a look at (.*?) instead of (.*). But as others have already noted,
regexps are not the best way to tackle this.
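A minimal sketch of the greedy versus non-greedy difference on the simplified line from the question, using only core Perl. The greedy `(.*)` runs to the last `</tr>`, while `(.*?)` stops at the first one, and `/g` in list context returns one capture per row. The cell pattern `<td[^>]*>([^<]*)</td>` is an illustrative choice that tolerates attributes such as rowspan.

```perl
use strict;
use warnings;

my $html = '<tr><td>xxxx</td></tr><tr><td>yyyy</td></tr>';

# Greedy: (.*) grabs as much as it can, so it runs to the LAST </tr>
# and both rows end up in a single capture.
my @greedy = $html =~ m{<tr>(.*)</tr>}g;

# Non-greedy: (.*?) stops at the FIRST </tr> after each <tr>.
my @rows = $html =~ m{<tr>(.*?)</tr>}g;

# Inside each row, [^<]* captures the cell text up to the next tag.
my @cells;
for my $row (@rows) {
    push @cells, $row =~ m{<td[^>]*>([^<]*)</td>}g;
}

print scalar(@greedy), " greedy capture: $greedy[0]\n";
print scalar(@rows), " non-greedy captures: @rows\n";
print "cells: @cells\n";
```

The `[^<]*` cell pattern fails as soon as a cell contains markup of its own (for example `<td><b>x</b></td>`), and nested tables break the row pattern too, which is why the HTML modules suggested in this thread are the more robust choice.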

M4

Luciano Tolomei · Apr 28, 2005

Tad McClellan wrote:

I think you have to change to:

use HTML::TableExtract;

I do.
Thanks a lot.

Can someone tell me if this a real tracker? Or is it one designed to show you a different message at certain times, ie. acting like one?	0	Jan 10, 2021
Can anyone please help? HTML - two tables applying different styles	4	Dec 1, 2020
Sort by number of characters	1	Nov 2, 2023
Javascript DOM	1	Mar 29, 2023
Getting extra blank rows from appending HTML..?	2	Oct 24, 2023
Filter table rows based on multiple checkboxes value	2	Jan 13, 2023
When I send email as HTML, why do erroneous whitespaces getintroduced to the HTML source and a few <	2	Nov 8, 2013
SendGrid email issue in responsive Gmail	1	Nov 4, 2021