[Newbie] Getting data from HTML-ish crap

Guest

Hi,
I wanted to learn something new and chose Ruby,
since it looked awesome, and I can't say it isn't.
I am not an experienced programmer (I tried some
Pascal, then PHP), so I decided to do something
small but useful. Let's get straight to the
problem.

At the URL:
https://www.knightonlineworld.com/index.php?pg=rankings&sub=2&radServer=1&clanid=12199
I've got something similar to HTML (at least
it's not table based, but it's still damn ugly code
there) with online statistics.

I don't want to parse the whole thing, I just want to
'crop that crap' and retrieve information from
the data inside one element, a div with
id="bleet". There are tables in there, but it seems
impossible to navigate them.

I really suck at strings :(

<tr bgcolor="#FFFFFF">
<td align="center" >Quatrina</td> # I want this
<td align="center" >52</td> #
<td align="center" >Shaman</td> #
<td align="center" >563</td> # and this one
</tr> # preferably everything :p

I would like to put that data into a hash, so it stays
useful later (I'm aiming at a Rails app in the
future), but selecting two columns in each row,
located in the last table in the id'ed element,
placed in a badly made document, looks
impossible to me. I don't even know where to
start, and it's FAR away from the things I
wanted to do (counting numbers*, assigning
additional data).


All I've done so far is fetch the document:
require 'uri'
require 'net/http'

trg = "https://www.knightonlineworld.com/index.php?pg=rankings&sub=2&radServer=1&clanid=12199"
puts 'processing ' + trg + " :\n"

r = Net::HTTP.get_response(URI.parse(trg).host, URI.parse(trg).path)

puts r.body


Thanks for reading.

* Note that it counts "Loyality" wrong.
 
William James

spam_monkey said:
All i've done already is getting the document:

r.body does not contain "Quatrina".
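
One likely reason, offered as an assumption since the page itself can't be checked here: URI#path returns only '/index.php', so the query string that selects the clan ranking never reaches the server. The two-argument form of get_response also talks plain HTTP on port 80 rather than https. A small sketch of the difference, using only the URL from the original post:

```ruby
require 'uri'

trg = "https://www.knightonlineworld.com/index.php?pg=rankings&sub=2&radServer=1&clanid=12199"
uri = URI.parse(trg)

# path stops at the '?', so the ranking parameters are lost
puts uri.path         # /index.php
# query holds the part that was dropped
puts uri.query        # pg=rankings&sub=2&radServer=1&clanid=12199
# request_uri is path plus query, which is what the request needs
puts uri.request_uri  # /index.php?pg=rankings&sub=2&radServer=1&clanid=12199
```

Passing uri.request_uri instead of uri.path to the request should at least ask the server for the right page.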
 
Justin Bailey

First, don't use Net::HTTP. Require 'open-uri' at the top and you can
simplify your code a lot:

require 'open-uri'

# On Ruby 3.0 and later, call URI.open instead of open.
html = open("https://www.knightonlineworld.com/index.php?pg=rankings&sub=2&radServer=1&clanid=12199") do |page|
  page.read
end

This reads the whole document into the 'html' variable.

Next, look at StringScanner and regular expressions. They will allow you
to iterate through your document quickly and pick up the columns you want.
Some pseudo-code might look like:

require 'strscan'

scanner = StringScanner.new(html)
# [^>]* allows attributes such as bgcolor and align inside the tags
row = /<tr[^>]*>\s*<td[^>]*>(.*?)<\/td>\s*<td[^>]*>.*?<\/td>\s*<td[^>]*>.*?<\/td>\s*<td[^>]*>(.*?)<\/td>\s*<\/tr>/m
# scan_until moves past each match; check alone would never advance
while scanner.scan_until(row)
  name = scanner[1]
  points = scanner[2]
end

That will extract the name and points (the first and fourth columns) from
each row. The parentheses in the regular expression are "capture groups",
and they correspond to the assignments in the loop (name = scanner[1],
points = scanner[2]). The 'm' following the regular expression makes '.'
match newlines as well, so a match can span several lines; the table cells
are on different lines, which the \s* between the tags also allows for.
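
To make the pattern concrete, here is a small self-contained sketch that runs against the sample row from the original post instead of the live page, and collects the rows into a hash as the original poster wanted. The second row, the SAMPLE_HTML constant and the rows_to_hash helper are made up for illustration, and the real page's markup may differ, so treat the regex as a starting point rather than a finished parser.

```ruby
require 'strscan'

# First row is from the original post; the second row is invented test data.
SAMPLE_HTML = <<HTML
<div id="bleet">
<table>
<tr bgcolor="#FFFFFF">
<td align="center" >Quatrina</td>
<td align="center" >52</td>
<td align="center" >Shaman</td>
<td align="center" >563</td>
</tr>
<tr bgcolor="#EEEEEE">
<td align="center" >Example</td>
<td align="center" >48</td>
<td align="center" >Warrior</td>
<td align="center" >120</td>
</tr>
</table>
</div>
HTML

# One cell: any attributes on the tag, text captured without padding.
CELL = /<td[^>]*>\s*(.*?)\s*<\/td>\s*/m
# One row made of exactly four cells.
ROW = /<tr[^>]*>\s*#{CELL}#{CELL}#{CELL}#{CELL}<\/tr>/m

# Hypothetical helper: returns { name => { :level, :class, :points } }.
def rows_to_hash(html)
  scanner = StringScanner.new(html)
  result = {}
  # scan_until advances past each matched row, so the loop terminates.
  while scanner.scan_until(ROW)
    name, level, klass, points = scanner[1], scanner[2], scanner[3], scanner[4]
    result[name] = { :level => level.to_i, :class => klass, :points => points.to_i }
  end
  result
end

stats = rows_to_hash(SAMPLE_HTML)
puts stats["Quatrina"][:points]  # 563
```

Keying the hash by name keeps lookups simple later on; if two players can share a name, an array of hashes would be the safer shape.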

For further syntax and library help check http://www.ruby-doc.org.
In particular, check out the 'Programming Ruby' book and read up on Ruby's
regular expressions if you aren't familiar with them.

Hope that helps!


 
Charlie Bowman

Here's some sample code you might enjoy. It's a random Chuck Norris
joke generator that pulls the jokes off of a website.

require 'net/http'

page = 'http://www.4q.cc/index.php?pid=fact&person=chuck'
res = Net::HTTP.get(URI.parse(page))
# Grab whatever sits between </h1> and <hr />
fact = res[/<\/h1>(.*)<hr \/>/, 1]
puts(fact || 'No fact was found!')


Charlie Bowman
http://www.recentrambles.com

 
greg.rb

I tried:

require "open-uri"
trg = open("https://www.knightonlineworld.com/index.php?pg=rankings&sub=2&radServer=1&clanid=12199")

trg.each do |line|
  puts line
end

Result:
c:/ruby/lib/ruby/1.8/open-uri.rb:583:in `proxy_open': open-uri doesn't support https. (ArgumentError)
  from c:/ruby/lib/ruby/1.8/open-uri.rb:525:in `direct_open'
  from c:/ruby/lib/ruby/1.8/open-uri.rb:169:in `open_loop'
  from c:/ruby/lib/ruby/1.8/open-uri.rb:164:in `catch'
  from c:/ruby/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
  from c:/ruby/lib/ruby/1.8/open-uri.rb:134:in `open_uri'
  from c:/ruby/lib/ruby/1.8/open-uri.rb:424:in `open'
  from c:/ruby/lib/ruby/1.8/open-uri.rb:85:in `open'
  from Knight4.rb:2

It doesn't look like it liked https.
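
As the trace shows, the open-uri shipped with that Ruby 1.8 install rejects https URLs. One possible workaround is to go back to Net::HTTP and switch SSL on explicitly. The following is a minimal sketch for current Ruby versions (on 1.8 you would also need require 'net/https'). The names https_client and fetch_body are hypothetical helpers, and nothing here has been run against the site.

```ruby
require 'net/http'
require 'uri'

# Build a Net::HTTP object for the URL's host, with SSL on for https URLs.
# No connection is opened until a request is actually made.
def https_client(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http
end

# Fetch the page body; request_uri keeps the query string.
def fetch_body(url)
  uri = URI.parse(url)
  https_client(uri).get(uri.request_uri).body
end

# Usage (performs a real request, so it is left commented out):
# html = fetch_body("https://www.knightonlineworld.com/index.php?pg=rankings&sub=2&radServer=1&clanid=12199")
```

Keeping the client setup in its own method makes it easy to check the SSL and port settings without touching the network.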
 
