Stedwick
This is just a whimsical question, really. I've been working on a
website where people can vote on episodes of TV shows (and I happen to
be a big Star Trek fan, so I'm starting there ha ha). By the way, the
website is, literally, 40 lines of code. I'm loving Ruby on Rails so
far.
http://brocoum.com/voter/startrekvoyager/episodes
Anyway, I need to extract the episode descriptions for the tool tips,
and the descriptions come from TV.com. Unfortunately, this has turned
out to be rather harder than it looks!
http://www.tv.com/star-trek-deep-sp....html?season=0&tag=season_dropdown;dropdown;7
If any of you feel up to the challenge, see if you can streamline my
code below, or write better code yourself. I can't help but think that
there's an easier way to do this!
# open html file
f = File.read("episode_guide.html")
# keep track of the number of descriptions found
count = 0
# each description is enclosed in a multiline <p> </p> tag
f.scan(/<p>.*?<\/p>/m) do |match|
  # start with a blank description
  desc = ''
  # I want to condense each desc into a single line and remove the stardate info
  match.each_line do |m|
    # remove stardate...<br /> because the stardate is not always on its own line
    m.sub!(/^.*<br \/>/, '')
    # remove unnecessary whitespace from the beginning
    m.sub!(/^\s*/, '')
    # add non-stardate, non-blank lines to the desc and remove the trailing \n
    desc += m.chomp unless m =~ /stardate:/i or m !~ /\w/
  end
  # remove html tags
  desc.gsub!(/<.*?>/, '')
  # fix periods, e.g. "Hi there.I love you." => "Hi there. I love you."
  # these period problems were caused by concatenating the lines above into one line
  desc.gsub!(/(\w\.)(\w)/, '\1 \2')
  # decode leftover HTML entities (entity spellings assumed: &nbsp; and &#39;)
  desc.gsub!(/&nbsp;/, " ")
  desc.gsub!(/&#39;/, "'")
  # make all spaces single
  desc.gsub!(/ {2,}/, ' ')
  # output the finished description followed by a blank line, and increment the counter
  puts desc + "\n\n"
  count += 1
end
# make sure I got all 176 episode descriptions
puts count
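For comparison, here is a more condensed sketch of the same steps. It has not been run against the real TV.com page: `extract_descriptions` is a hypothetical name, it makes the same assumptions as the script above (each description is the body of a `<p>...</p>` element, and anything up to a `<br />` on a line is stardate material), and the entity spellings `&nbsp;` and `&#39;` are assumed. The main change is joining lines with a space instead of an empty string, since a line break in HTML source renders as a space anyway; that removes the need for the period fix.

```ruby
# Hypothetical helper: returns the cleaned descriptions as an array instead of
# printing them. Assumes each description is the body of a <p>...</p> element.
def extract_descriptions(html)
  html.scan(%r{<p>(.*?)</p>}m).flatten.map do |body|
    # Drop anything up to the last <br /> on each line (the stardate prefix),
    # then trim surrounding whitespace.
    lines = body.lines.map { |line| line.sub(%r{\A.*<br\s*/?>}, "").strip }
    # Keep only lines that contain a word character and no "stardate:".
    lines = lines.reject { |line| line !~ /\w/ || line =~ /stardate:/i }

    # A line break in HTML source renders as a space, so join with " ",
    # then strip any remaining tags.
    text = lines.join(" ").gsub(/<[^>]*>/, "")

    # Entity spellings are an assumption about the page markup.
    text.gsub("&nbsp;", " ").gsub("&#39;", "'").squeeze(" ").strip
  end
end

sample = <<~HTML
  <p>
    Stardate: 12345.6<br />
    The crew finds a derelict ship.
    It&#39;s   empty.
  </p>
  <p>A quiet&nbsp;episode.</p>
HTML

puts extract_descriptions(sample)
# prints:
# The crew finds a derelict ship. It's empty.
# A quiet episode.
```

Against the saved page, `extract_descriptions(File.read("episode_guide.html")).size` should come out at 176 if every description was picked up.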
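The entity fixes in the script only cover `&nbsp;` and one apostrophe spelling. If the page uses others (which entities it actually contains is an assumption here), a single-pass decoder is safer than a chain of `gsub!` calls, because decoding `&amp;` first and `&nbsp;` second would turn a literal `&amp;nbsp;` into a space. Below is a minimal sketch; `decode_entities` and the `NAMED_ENTITIES` table are hypothetical and deliberately small. Ruby's standard `CGI.unescapeHTML` covers `&amp;`, `&quot;`, `&lt;`, `&gt;`, `&apos;` and numeric entities, but not `&nbsp;`, which is why everything is handled by hand here.

```ruby
# Hypothetical, deliberately small table of named entities.
NAMED_ENTITIES = {
  "nbsp" => " ",   # mirrors the original script: &nbsp; becomes a plain space
  "amp"  => "&",
  "quot" => '"',
  "apos" => "'",
  "lt"   => "<",
  "gt"   => ">"
}.freeze

# Decodes decimal (&#39;), hex (&#x27;) and known named entities in one pass,
# so text produced by one replacement is never decoded a second time.
def decode_entities(text)
  text.gsub(/&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z]+));/) do
    if $1
      $1.to_i.chr(Encoding::UTF_8)
    elsif $2
      $2.to_i(16).chr(Encoding::UTF_8)
    else
      # unknown names are left exactly as they were
      NAMED_ENTITIES.fetch($3, $&)
    end
  end
end

puts decode_entities("It&#39;s a bar &amp; grill,&nbsp;open late &hellip;")
# prints: It's a bar & grill, open late &hellip;
```

In the loop above, the two entity `gsub!` lines could then become a single `desc = decode_entities(desc)`.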
Philip