Regexp Ruby selection

touffik · Jul 25, 2008

Hi folks,
I'm trying to write a Ruby script that selects the content of an HTML
table in an HTML page.
I used Rubular to test my regexp, which is
/ <td class="TabIntCenContenuto"[^>]*>(.*)  /
With Rubular, the result of my expression is:
Result 1
1. 12345678
Result 2
1. SAN FRANCESCO DA PAOLA
Result 3
1. Via San Francesco Da Paola, 10
Result 4
1. 10123
Result 5
1. TORINO
etc....
But with my script:

File.open('D:/testt/1.txt', 'r') do |filein|
  while line = filein.gets
    p line if line =~ /<td class="TabIntCenContenuto"[^>]*>/ .. line =~ /\/A /
  end
  fileout.puts p
end
end

I got this result:
"</td><td class=\"TabIntCenContenuto\">12345678 \n"
"</td><td class=\"TabIntCenContenuto\">SAN FRANCESCO DA PAOLA </td>\n"
"<td class=\"TabIntCenContenuto\">Via San Francesco Da Paola, 10 </td>\n"
"<td class=\"TabIntCenContenuto\">10123 </td>\n"
"<td class=\"TabIntCenContenuto\" align=\"left\">TORINO </td>\n"

I thought the .. between the two "line =~" expressions was like (...) in
Rubular, which captures the content?
Moreover, I would like to transform this HTML code into XML, but I can't
figure out how to transform these HTML lines into XML.
<root>
<number>12345678</number>
But there is no 'name' attribute or anything like it in the <td>, so doing a
match/replace would be difficult?
...

So, if someone can help me, I would be very grateful.
Have a nice day

Srijayanth Sridhar · Jul 25, 2008

Any specific reason you can't use hpricot or other HTML parsers?

Jayanth

Shadowfirebird · Jul 25, 2008

I'll second that. Hpricot is really quite remarkable. It'll almost
certainly save you days and days of pain. Unless you are doing this
for fun / learning, of course.

Sebastian Hungerecker · Jul 25, 2008

I thought the .. between 2 "line =~" was like (...) in rubular which
let catch the content ??

Generally in ruby .. denotes a range. Like starting_value .. end_value.
In this case though it denotes a flip flop, which is evil and should never
ever be used because it makes my head hurt. Here's what it does though:
some_loop {
do_something if foo .. bar
}
This will do nothing until foo is true. When foo is true it will do_something.
It will then keep calling do_something in every iteration of the loop until bar
becomes true. Once bar has become true, it will stop calling do_something until
foo is true again.
So as a summary: It doesn't do what you thought it did. As a matter of fact it
doesn't do anything sane. So just keep as far away from it as possible.
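
To make the behaviour described above concrete, here is a minimal sketch. The sample array and the BEGIN/END markers are made up for illustration and are not from the original script. It shows that the flip-flop selects whole elements between two conditions (both ends included) and captures nothing from inside a match, which is why the original script printed entire lines.

```ruby
# Hypothetical sample data, not taken from the thread.
lines = ["a", "BEGIN", "b", "c", "END", "d", "BEGIN", "e", "END", "f"]

selected = []
lines.each do |line|
  # Flip-flop: switches on when the left condition is true and stays on
  # until the right condition is true (that iteration still counts).
  selected << line if (line == "BEGIN")..(line == "END")
end

p selected
```

Each selected element is an untouched input line. To pull out part of a line you need a capture group in the regexp, not a flip-flop.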

HTH,
Sebastian

touffik · Jul 25, 2008

Any specific reason you can't use hpricot or other HTML parsers?

I didn't know about this tool for Ruby. I once used a parser named tidy, but
that's all. I'll try it now and let you know.

So I was wrong. Thank you for your explanation of this wrong use of ..
in the loop.

Thank you.

Peña, Botp · Jul 25, 2008

# I'm trying to code a ruby script that select the content of a HTML
# table in a HTML page.
# I used rubular to test my regexp syntax which is
# / <td class="TabIntCenContenuto"[^>]*>(.*)  /

the re is fine, you can use that

# with rubular the result of my expression is :
# Result 1
# 1. 12345678
# Result 2
# 1. SAN FRANCESCO DA PAOLA
# Result 3
# 1. Via San Francesco Da Paola, 10
# Result 4
# 1. 10123
# Result 5
# 1. TORINO
# etc....
# But with my script :
#
# File.open('D:/testt/1.txt', 'r') do |filein|
# while line = filein.gets
# p line if line =~ /<td class="TabIntCenContenuto"[^>]*>/ .. line =~ /\/A /
# end
# fileout.puts p
# end
# end
# I got this result
# "</td><td class=\"TabIntCenContenuto\">12345678 \n"
# "</td><td class=\"TabIntCenContenuto\">SAN FRANCESCO DA PAOLA </td>\n"
# "<td class=\"TabIntCenContenuto\">Via San Francesco Da Paola, 10 </td>\n"
# "<td class=\"TabIntCenContenuto\">10123 </td>\n"
# "<td class=\"TabIntCenContenuto\" align=\"left\">TORINO </td>\n"

you already got it, but you did not capture

sample code & run,

botp@botp-desktop:~$ cat test.rb
File.open('test.txt') do |f|
  while line = f.gets
    if line =~ /<td class="TabIntCenContenuto"[^>]*>(.*) /
      p $1
    end
  end
end

botp@botp-desktop:~$ ruby test.rb
"12345678"
"SAN FRANCESCO DA PAOLA"
"Via San Francesco Da Paola,10"
"10123"
"TORINO"

# I thought the .. between 2 "line =~" was like (...) in rubular which
# let catch the content ??

you are making it harder. keep it simple.

# Moreover I would like to transform this html code in XML. But I can"t
# find an idea how to transform these HTML line in XML.
# <root>
# <number>12345678</number>
# But there is no attribut 'name' or wathever in the <td> so making and
# match/replace would be difficult ?

if the html is nicely formatted, you can loop through the table.
if you want to be sure, try outputting all the data you can capture
first. Then output that again with xml tags inserted.

do not worry. xml, like html, is just text w tags. Manipulating text is
a good learning exercise for ruby.
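
As a rough sketch of the "capture first, then output with xml tags" idea: the sample lines below are modeled on the output quoted in the thread. The element names other than <root> and <number> are assumptions, assigned purely by cell position because the <td> tags carry no identifying attribute. The regexp is a slight variant that stops at the next "<" instead of relying on a trailing space.

```ruby
# Hypothetical sample input, modeled on the lines quoted in the thread.
html_lines = [
  %(</td><td class="TabIntCenContenuto">12345678 \n),
  %(</td><td class="TabIntCenContenuto">SAN FRANCESCO DA PAOLA </td>\n),
  %(<td class="TabIntCenContenuto">Via San Francesco Da Paola, 10 </td>\n),
  %(<td class="TabIntCenContenuto">10123 </td>\n),
  %(<td class="TabIntCenContenuto" align="left">TORINO </td>\n)
]

# Assumed element names, one per cell position (only "number" is from the thread).
fields = %w[number name address zip city]

# Variant of the thread's regexp: capture everything up to the next "<".
cell = /<td class="TabIntCenContenuto"[^>]*>([^<]*)/

# Escape the characters that are special in XML text.
xml_escape = lambda do |text|
  text.gsub(/[&<>]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;')
end

# Step 1: capture the cell contents and trim surrounding whitespace.
values = html_lines.map { |line| line[cell, 1] }.compact.map(&:strip)

# Step 2: output the captured data again with XML tags inserted.
xml = ["<root>"]
values.each_slice(fields.size) do |row|
  fields.zip(row).each do |tag, value|
    xml << "  <#{tag}>#{xml_escape.call(value)}</#{tag}>" if value
  end
end
xml << "</root>"

puts xml
```

This only works if every record has the same number of cells in the same order. If the real page is less regular than that, an HTML parser as suggested earlier in the thread is probably the safer route.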

kind regards -botp
