[QUIZ] Gathering Ruby Quiz 2 Data (#189)

Daniel Moore · Jan 23, 2009

Greetings!

Welcome to the inaugural Ruby Quiz 3!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have elapsed from the time this message was
sent.

2. Support Ruby Quiz by submitting ideas and responses
as often as you can! Visit: <http://rubyquiz.strd6.com>

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. Please reply to
the original quiz message, if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

## Gathering Ruby Quiz 2 Data

I'm building the new Ruby Quiz website and I need your help...

This week's quiz involves gathering the existing Ruby Quiz 2 data from
the Ruby Quiz website: <http://splatbang.com/rubyquiz/>

Each quiz entry contains the following information:

* id
* title
* description
* summary

There are also many quiz solutions that belong to each quiz. The quiz
solutions have the following:

* quiz_id
* author
* ruby_talk_reference
* text

Matthew has some advice for getting at the data:

If you start at <http://splatbang.com/rubyquiz/>, you'll see that
the entries in the quiz list on the left all link to the same
quiz.rhtml file (embedded Ruby), but with different id parameters.
Each parameter is the name of a subdirectory. So, for example,
take quiz #184, which has a link like this:

<http://splatbang.com/rubyquiz/quiz.rhtml?id=184_Befunge>

So there is a subdirectory called "184_Befunge". There
are basically three files in every directory:

* quiz.txt -- the quiz description
* sols.txt -- a list of author names and the ruby-talk message # of the submission
* summ.txt -- the quiz summary

Examples:
* <http://splatbang.com/rubyquiz/184_Befunge/quiz.txt>
* <http://splatbang.com/rubyquiz/184_Befunge/sols.txt>
* <http://splatbang.com/rubyquiz/184_Befunge/summ.txt>
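
The layout described above can be sketched in a few lines of Ruby. This assumes the three file names hold for every quiz directory; the helper name `quiz_file_urls` and the hash keys are made up for illustration, and nothing is actually fetched.

```ruby
# Base URL of the Ruby Quiz 2 site, taken from the quiz text.
BASE_URL = 'http://splatbang.com/rubyquiz'

# Hypothetical helper: maps a quiz directory name (the value of the
# "id" parameter, e.g. "184_Befunge") to the URLs of its three files.
def quiz_file_urls(quiz_dir)
  {
    'description' => "#{BASE_URL}/#{quiz_dir}/quiz.txt",
    'solutions'   => "#{BASE_URL}/#{quiz_dir}/sols.txt",
    'summary'     => "#{BASE_URL}/#{quiz_dir}/summ.txt"
  }
end

urls = quiz_file_urls('184_Befunge')
puts urls['description']
```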

Your program will collect and output this data as yaml (or your favorite data
serialization standard; xml, json, etc.).
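
As a rough idea of what the YAML output could look like, here is a minimal sketch using Ruby's bundled `yaml` library. The field names come from the lists above; the top-level keys `quizzes` and `solutions` and all the values are placeholders invented for illustration, not real archive data.

```ruby
require 'yaml'

# One quiz record with the four fields named in the quiz text.
quiz = {
  'id'          => 184,
  'title'       => 'Befunge',
  'description' => 'Text of quiz.txt goes here.',
  'summary'     => 'Text of summ.txt goes here.'
}

# One solution record; every value here is a placeholder.
solutions = [
  {
    'quiz_id'             => 184,
    'author'              => 'Example Author',
    'ruby_talk_reference' => 0, # placeholder message number
    'text'                => 'Solution text goes here.'
  }
]

# The top-level layout is an assumption; the quiz does not prescribe one.
yaml = { 'quizzes' => [quiz], 'solutions' => solutions }.to_yaml
puts yaml
```

Keeping solutions in their own list with a quiz_id, rather than nesting them under each quiz, mirrors the two field lists above, but either shape would serialize fine.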

Robert Dober · Jan 23, 2009

Daniel in which time zone are you? What do you and the others think if
we give our friends in GMT-x some more time? My suggestion would be to
extend the spoiler period to something like Sunday 13h or 14h GMT.
Actually I do not care about the Americans I just sleep that long on WEs.
Just 0.02€.
Robert

Matthew Moss · Jan 23, 2009

Daniel in which time zone are you? What do you and the others think if
we give our friends in GMT-x some more time? My suggestion would be to
extend the spoiler period to something like Sunday 13h or 14h GMT.
Actually I do not care about the Americans I just sleep that long
on WEs.

Are you suggesting that a duration of 48 hours varies in duration from
time zone to time zone?

*wink wink*

Gregory Brown · Jan 23, 2009

American dollars are not worth as much as the Euro, so I would guess
that is exactly what he is saying. I mean time IS money after all.

Damn you Daniel! First day on the job and you've got your hand in my pocket!

-greg

Daniel Moore · Jan 24, 2009

I'm not opposed to extending the no spoiler period to give everyone
more of the weekend to contemplate. **So everyone, please no spoilers
until Sun 14:00 GMT**. As always feel free to ask questions and post
non-spoiler discussion any time.

My local time is UTC-8 so I posted the quiz Thursday night right
before going to bed, which works out well for my schedule.

Open question to everyone: What day and time would you prefer to have
the new quizzes posted and how long of a no-spoiler period do you
prefer?

--
-Daniel
http://strd6.com

peter · Jan 27, 2009

What's the deadline btw? I am almost ready with the solution since the
weekend, but have too much on my plate to finish it right now

Cheers,
Peter
__
http://www.rubyrailways.com

Gregory Brown · Jan 27, 2009

What's the deadline btw? I am almost ready with the solution since the
weekend, but have too much on my plate to finish it right now

Historically there have been no deadlines that I know of, just that if
you aren't reasonably timely, you won't have a shot at being mentioned
in the summary. But at least when James ran it, you could certainly
submit late solutions for the archives. I hope this tradition is
continued, but you can always of course post here at any rate.

-greg

Daniel Moore · Jan 27, 2009

Gregory is correct, there aren't any hard deadlines. However, if you
post your solution by early Thursday then it stands a better chance to
get into the quiz summary.

peter · Jan 29, 2009

Here is my scRUBYt! and Nokogiri based solution:

http://pastie.org/374542

As far as I can tell (the script is generating a several MB single XML
file, so it's not trivial to determine) it is working well and it's also
complete.
If you need the XML file, drop me a msg.

A writeup will follow on my blog soon, will post a message here.

Cheers,
Peter
___
http://www.rubyrailways.com

Daniel Moore · Jan 31, 2009

This quiz was an exercise in Web Scraping
[http://en.wikipedia.org/wiki/Web_scraping]. As more and more
information becomes available on the internet, it is useful to have a
programmatic way to access it. This can be done through web APIs, but
not all websites have such APIs available, and not all information is
available via the APIs. Scraping may be against the terms of use for
some sites and smaller sites may suffer if large amounts of data are
being pulled, so be sure to ask permission and be prudent!

The one solution to this week's quiz comes from Peter Szinek using
scRUBYt [http://scrubyt.org/]. Despite being just over fifty lines
long, there is a lot packed in here, so let's dive in.

Here we begin by setting up a scRUBYt Extractor and pointing it at the
main Ruby Quiz 2 page.

    # scrape the stuff with scRUBYt!
    data = Scrubyt::Extractor.define do
      fetch 'http://splatbang.com/rubyquiz/'

The 'quiz' sets up a node in the XML document, retrieving elements
that match the XPath. This yields all the links in the side area, that
is, links to all the quizzes.

      quiz "//div[@id='side']/ol/li/a[1]" do
        link_url do
          quiz_id /id=(\d+)/
          quiz_link /id=(.+)/ do

These next two sections download the description and summary for each
quiz. They are saved into temporary files to be loaded into the XML
document at the end. Notice the use of lambda: it takes in the match
from /id=(.+)/ in the quiz_link. So, for example, when the link is
'quiz.rhtml?id=157_The_Smallest_Circle' it matches
'157_The_Smallest_Circle' and passes it into the lambda, which returns
"http://splatbang.com/rubyquiz/157_The_Smallest_Circle/quiz.txt",
the URL of the quiz text. The summary is gathered in the same way.

            quiz_desc_url(lambda {|quiz_dir|
                "http://splatbang.com/rubyquiz/#{quiz_dir}/quiz.txt"},
                :type => :script) do
              quiz_dl 'descriptions', :type => :download
            end
            quiz_summary_url(lambda {|quiz_dir|
                "http://splatbang.com/rubyquiz/#{quiz_dir}/summ.txt"},
                :type => :script) do
              quiz_dl 'summaries', :type => :download
            end
          end
        end
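
Outside scRUBYt, the two captures and the URL-building lambda can be tried in plain Ruby. This is only a sketch: the `href` value is the example link from the paragraph above, and the variable names are illustrative rather than part of Peter's script.

```ruby
href = 'quiz.rhtml?id=157_The_Smallest_Circle'

# Same patterns as the quiz_id and quiz_link lines in the extractor.
quiz_id  = href[/id=(\d+)/, 1].to_i # leading digits only
quiz_dir = href[/id=(.+)/, 1]       # everything after "id="

# Same shape as the lambda passed to quiz_desc_url.
desc_url = lambda { |dir| "http://splatbang.com/rubyquiz/#{dir}/quiz.txt" }

puts quiz_id
puts desc_url.call(quiz_dir)
```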

This next part gets all the solutions for each quiz. It follows the
link_url from the side area. Once on the new page it creates a node
for each solution, again by using XPath to get all the links in the
list on the side. It populates each solution with an author: the text
from the html anchor tag. It populates the ruby_talk_reference with
the href attribute of the tag. In order to get the solution text it
follows (resolves) the link and returns the text within the "//pre[1]"
element, again using XPath to specify. The text node is added as a
child node to the solution.

        quiz_detail :resolve => "http://splatbang.com/rubyquiz" do
          solution "/html/body/div/div[2]/ol/li/a" do
            author lambda {|solution_link_text| solution_link_text},
                   :type => :script
            ruby_talk_reference "href", :type => :attribute
            solution_detail :resolve => :full do
              text "//pre[1]"
            end
          end
        end
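
To see roughly what this block collects without installing scRUBYt, here is a stand-in using REXML, which ships with Ruby. The HTML snippet, the author names and the links are all invented, and the XPath is simplified to `//ol/li/a`, so treat it as an illustration of the author/href extraction rather than the real page structure.

```ruby
require 'rexml/document'

# Invented, well-formed stand-in for a quiz page's solution list.
html = <<~HTML
  <html><body><div>
    <div>navigation</div>
    <div><ol>
      <li><a href="http://example.invalid/msg/1">Alice Example</a></li>
      <li><a href="http://example.invalid/msg/2">Bob Example</a></li>
    </ol></div>
  </div></body></html>
HTML

doc = REXML::Document.new(html)

# One record per link: the anchor text becomes the author and the
# href attribute becomes the ruby_talk_reference.
solutions = REXML::XPath.match(doc, '//ol/li/a').map do |a|
  { 'author' => a.text, 'ruby_talk_reference' => a.attributes['href'] }
end

solutions.each { |s| puts "#{s['author']}: #{s['ruby_talk_reference']}" }
```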

This select_indices limits the scope of the quiz gathering to just the
first three, useful for testing since we don't want to have to
traverse the entire site to see if the code works. I removed it when
gathering the full dataset.

      end.select_indices(0..2)
    end

This next part, using Nokogiri, loads the files that were saved
temporarily and inserts them into the XML document. It also removes
the link_url nodes to clean up the final output to match the output
specified in the quiz.

    result = Nokogiri::XML(data.to_xml)

    (result/"//quiz").each do |quiz|
      quiz_id = quiz.text[/\s(\d+)\s/,1].to_i
      file_index = quiz_id > 157 ? "_#{(quiz_id - 157)}" : ""
      (quiz/"//link_url").first.unlink

      desc = Nokogiri::XML::Element.new("description", quiz.document)
      desc.content = open("descriptions/quiz#{file_index}.txt").read
      quiz.add_child(desc)

      summary = Nokogiri::XML::Element.new("summary", quiz.document)
      summary.content = open("summaries/summ#{file_index}.txt").read
      quiz.add_child(summary)
    end
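
The file_index line appears to depend on how the downloaded files were named. Judging only from the code, the first file saved in each directory keeps its plain name and later ones get a numeric suffix, with quiz 157 being the first one downloaded. A sketch of that mapping under that assumption (the helper `saved_file` is hypothetical):

```ruby
FIRST_QUIZ_ID = 157 # first quiz downloaded, per the code above

# Returns the path where the file for a given quiz is assumed to be saved.
def saved_file(dir, base, quiz_id)
  suffix = quiz_id > FIRST_QUIZ_ID ? "_#{quiz_id - FIRST_QUIZ_ID}" : ''
  File.join(dir, "#{base}#{suffix}.txt")
end

puts saved_file('descriptions', 'quiz', 157)
puts saved_file('summaries', 'summ', 159)
```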

And finally save the result to an xml file on the filesystem:

    open("ruby_quiz_archive.xml", "w") {|f| f.write result}

This was my first experience with scRUBYt and it took me a little
while to "get it". It packs a lot of power into a concise syntax and
is definitely worth considering for your next web scraping needs.

James Gray · Jan 31, 2009

This quiz was an exercise in Web Scraping
[http://en.wikipedia.org/wiki/Web_scraping].

Great summary Daniel. You've got the new quiz off to a great start.

James Edward Gray II
