regexp

Jayme Assuncao Casimiro · Jan 30, 2004

I have this piece of HTML text from Amazon.com:

<dt><a
href="/exec/obidos/ASIN/0965761762/qid=917872216/sr=1-1/002-1496444-0064804">1
Business, 2 Approaches : How to Succeed in Internet Business by Employing
Real-World Strategies</a>
~ Usually ships in 2-3 days<dd>
Ron Gielgun / Hardcover / Published 1998
 
Our Price: $13.97 ~ You Save: $5.98
(30%)
 
<a
href="/exec/obidos/ASIN/0965761762/qid=917872216/sr=1-1/002-1496444-0064804">Read
more about this title...</a>


And I would like to use only one regexp to extract the title, the price,
and the discount in percent.

For the example above, that would be:
title = 1 Business, 2 Approaches : How to Succeed in Internet Business by Employing
Real-World Strategies
Price = $13.97
Discount = 30%

I have used:
($title) = $_ =~ m{<a.*?>(.*?)</a>};
($price) = $_ =~ m{.*Our Price:\s(\$?[\d\,.]+)};
($descount) = $_ =~ m{.*You Save:.*?[\d\,.]+.*?([\d\,.]+)};

But I would like to use only one regexp.

Thanks
+---------------------------------------------+
| Jayme Assuncao Casimiro                     |
| Graduate in Computer Science                |
| Master's student in Computer Science        |
| Universidade Federal de Minas Gerais - UFMG |
+---------------------------------------------+

Gunnar Hjalmarsson · Jan 30, 2004

Jayme said:
I have used:
($title) = $_ =~ m{<a.*?>(.*?)</a>};
($price) = $_ =~ m{.*Our Price:\s(\$?[\d\,.]+)};
($descount) = $_ =~ m{.*You Save:.*?[\d\,.]+.*?([\d\,.]+)};

But I would like to use only one regexp.

So, what stops you?

($title, $price, $discount) = m{...};
--------------------------------^^^
(to be filled in with the regex)
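
A minimal sketch of what that could look like, not necessarily what Gunnar had in mind: a match in list context returns its captured groups in order, so the three patterns from the original post can be chained into one. The sample text in $html is copied from the post, and the patterns are adapted slightly so that the discount capture includes the percent sign; treat both as assumptions about how the real page is laid out.

```perl
use strict;
use warnings;

# Sample text copied from the original post (first entry only).
my $html = <<'END';
<dt><a
href="/exec/obidos/ASIN/0965761762/qid=917872216/sr=1-1/002-1496444-0064804">1
Business, 2 Approaches : How to Succeed in Internet Business by Employing
Real-World Strategies</a>
~ Usually ships in 2-3 days<dd>
Ron Gielgun / Hardcover / Published 1998

Our Price: $13.97 ~ You Save: $5.98
(30%)
END

# In list context a successful match returns the captured groups in order.
# A failed match returns an empty list, so all three variables stay undef.
my ($title, $price, $discount) = $html =~ m{
    <a\b[^>]*> (.*?) </a>                  # title: text of the first link
    .*? Our\s+Price: \s* (\$?[\d,.]+)      # price
    .*? You\s+Save: .*? \( ([\d,.]+%) \)   # discount: percentage in parentheses
}xs;

if (defined $title) {
    $title =~ s/\s+/ /g;    # collapse the line breaks inside the title
    print "title: $title\n";
    print "price: $price\n";
    print "discount: $discount\n";
}
```

The /s modifier lets . match newlines, which matters because the title and the discount sit on different lines; /x only allows the whitespace and comments inside the pattern.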

David K. Wall · Jan 30, 2004

Jayme Assuncao Casimiro said:
I have this piece of html text from Amazon.com
[snip HTML]

And I would like to use only one regexp to extract the title, the price,
and the discount in percent.

Don't do that. Use one of the modules designed for parsing HTML. Using REs
to parse HTML is painful and produces easily-broken code.

Gunnar Hjalmarsson · Jan 30, 2004

David said:
Jayme Assuncao Casimiro said:

I have this piece of html text from Amazon.com

[snip HTML]

And I would like to use only one regexp to extract the title, the
price, and the discount in percent.

Don't do that. Use one of the modules designed for parsing HTML.
Using REs to parse HTML is painful and produces easily-broken code.

For extracting the first link and two other parts that are not
identified by help of HTML markup? Please, David, there are more
colours in this world than black and white. ;-)

perlfaq9 is less rigid:

http://www.perldoc.com/perl5.8.0/pod/perlfaq9.html#How-do-I-remove-HTML-from-a-string-

http://www.perldoc.com/perl5.8.0/pod/perlfaq9.html#How-do-I-extract-URLs-

David K. Wall · Jan 30, 2004

Gunnar Hjalmarsson said:
David said:

Jayme Assuncao Casimiro said:

I have this piece of html text from Amazon.com

[snip HTML]

And I would like to use only one regexp to extract the title, the
price, and the discount in percent.

Don't do that. Use one of the modules designed for parsing HTML.
Using REs to parse HTML is painful and produces easily-broken code.

For extracting the first link and two other parts that are not
identified by help of HTML markup? Please, David, there are more
colours in this world than black and white. ;-)

Yeah, you're right. <insert standard excuses>. Thanks for the reality
check.

David K. Wall · Jan 30, 2004

Jayme Assuncao Casimiro said:
I have this piece of html text from Amazon.com

<dt><a
href="/exec/obidos/ASIN/0965761762/qid=917872216/sr=1-1/002-1496444-00648
04">1 Business, 2 Approaches : How to Succeed in Internet Business by
Employing Real-World Strategies</a>
~ Usually ships in 2-3 days<dd>
Ron Gielgun / Hardcover / Published 1998
 
Our Price: $13.97 ~ You Save: $5.98
(30%)
 
<a
href="/exec/obidos/ASIN/0965761762/qid=917872216/sr=1-1/002-1496444-00648
04">Read more about this title...</a>


And I would like to use only one regexp to extract the title, the price,
and the discount in percent.

I still think you should use one of the HTML parsing modules to pull data
out of this otherwise unremarkable piece of HTML, but below is a single
regex that captures all three things. Ugly and fragile.

my ($title, $price, $discount);
if ($html =~ m{
        <dt>\s*
        <a\s+href\s*=\s*"[^"]+">
        ([^<]+)                 # title
        </a>
        .*?
        Our\s+Price:\s+
        (\S+)                   # price
        .*?
        You\s+Save:\s+\S+\s+
        \((\S+)\)               # discount percentage inside literal parentheses
    }xs )
{
    ($title, $price, $discount) = ($1, $2, $3);
    $title =~ s/\s+/ /g;        # collapse line breaks inside the title

    print "title: $title\n\n";
    print "price: $price\n\n";
    print "discount: $discount\n";
}
