a quick program to download tv listings

Erik Lechak

Hello all,

Is there anyone out there who has written anything in Python to
download tv listings (no XML)?

All the tv listing stuff that I can find is way too complex for my
taste (it either goes overboard on XML or is not in Python). I wrote the
test program below and it works. I am just curious if anyone has a more
robust Python implementation before I take the time to add all the
bells and whistles.

Chron is just a class like datetime, but I wrote it to accept just
about every English way of expressing time. The important thing is to
put the epoch time in the URL.
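
For readers without Chron, here is a minimal Python 3 sketch of the same idea using only the standard library: take a time in epoch seconds and place it in the `starttime` query parameter. The helper name `grid_url` is hypothetical; the base URL and parameter names are copied from the program in this post, and the site may no longer answer at that address.

```python
import time
from urllib.parse import urlencode

# URL taken from the post; it may no longer exist.
BASE_URL = 'http://tv.yahoo.com/grid'

def grid_url(epoch, lineup='us_DMA560'):
    """Return the listings URL for the grid starting at `epoch` seconds."""
    query = urlencode([('lineup', lineup), ('genres', ''), ('dur', ''),
                       ('starttime', int(epoch)), ('.intl', 'us')])
    return BASE_URL + '?' + query

# Example: the grid that starts right now.
print(grid_url(time.time()))
```

Building the query with `urlencode` instead of string concatenation keeps the parameter order from the post while escaping any unusual characters in the lineup code.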

If anyone is interested in Chron, I can post it or send it to them.

Hope you enjoy the code,
Erik Lechak

# Python 2 code (urllib.urlopen, print statement).
import urllib
import re
from pygra.Chron import Chron

class Show:
    '''
    Just a structure to hold program information
    '''
    def __init__(self):
        # Assign to self; bare names here would only create locals.
        self.name = ""
        self.channel = ""
        self.station = ""
        self.start = ""
        self.end = ""


def getGrid(timestring):
    shows = []
    t = Chron(timestring)
    f = urllib.urlopen('http://tv.yahoo.com/grid?lineup=us_DMA560&genres=&dur=&starttime=' + str(int(t.epoch)) + '&.intl=us')
    data = f.read()
    # Drop newlines so links broken across lines still match.
    data = data.replace('\n', '')
    data = data.split("</A>")
    for d in data:
        progs = re.findall(r'<A HRef="\/tvpdb\?d=tvp&id=(.*)', d)
        for s in progs:
            m = re.search(r'(\d*)&cf.*channels=us_([^&]*).*&chname=([^\+]*)\+(\d+)&progutn=(\d*)&.*>(.*)', s)
            if m is None:
                # Skip links that lack the expected show fields.
                continue
            show = Show()
            show.name = m.group(6)
            show.start = m.group(5)
            show.channel = int(m.group(4))
            show.station = m.group(3)
            shows.append(show)
    return shows


shows = getGrid('now')

for s in [s for s in shows if s.channel == 17]:
    print s.name
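
To see what the second regular expression pulls out of a link, here is a small Python 3 sketch run against a made-up string. The sample text is hypothetical (shaped only to fit the pattern above, not copied from a real Yahoo page), and the field meanings follow how the program assigns the groups.

```python
import re

# Hypothetical link remainder, shaped like what the regex above expects.
sample = ('12345&cf=0&channels=us_WABC&chname=ABC+7'
          '&progutn=1063305000&.intl=us">College Football')

SHOW_RE = re.compile(r'(\d*)&cf.*channels=us_([^&]*).*'
                     r'&chname=([^\+]*)\+(\d+)&progutn=(\d*)&.*>(.*)')

m = SHOW_RE.search(sample)
fields = {
    'name': m.group(6),          # text after the closing '>'
    'start': int(m.group(5)),    # progutn value, used as the start time
    'channel': int(m.group(4)),  # digits after the '+' in chname
    'station': m.group(3),       # text before the '+' in chname
}
print(fields)
```

Groups 1 and 2 (the program id and the channel code) are captured but never used by the program.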
 
Sean 'Shaleh' Perry

> Hello all,
>
> Is there anyone out there who has written anything in Python to
> download tv listings (no XML)?
>
> All the tv listing stuff that I can find is way too complex for my
> taste (it either goes overboard on XML or is not in Python). I wrote the
> test program below and it works. I am just curious if anyone has a more
> robust Python implementation before I take the time to add all the
> bells and whistles.

Web services, SOAP and the like are making XML a part of life, whether we like
it or not (-:

> Chron is just a class like datetime, but I wrote it to accept just
> about every English way of expressing time. The important thing is to
> put the epoch time in the URL.
>
> If anyone is interested in Chron, I can post it or send it to them.

You should look at mx.DateTime from the eGenix folks. Great set of classes
with awesome docs (and it is free too).
 
pythonhda

On 10 Sep 2003 23:37:47 -0700, Erik Lechak wrote:
> Hello all,
>
> Is there anyone out there who has written anything in Python to
> download tv listings (no XML)?
>
> All the tv listing stuff that I can find is way too complex for my
> taste (it either goes overboard on XML or is not in Python). I wrote the
> test program below and it works. I am just curious if anyone has a more
> robust Python implementation before I take the time to add all the
> bells and whistles.
>
> ...

I felt the same way about the XML stuff, so for one project I subclassed sgmllib.SGMLParser and used it to parse channel listings from tvlistings.zap2it.com. Their HTML was pretty bad, though, so I had to write "clean up scripts" to strip out all the crap before running the parser.
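
No code was posted for this approach, so the following is only a sketch of the general technique in Python 3, where `sgmllib` no longer exists and `html.parser.HTMLParser` plays the same role. The class name `LinkTextParser`, the `clean_up` helper, and the sample page are all made up for illustration; a real clean-up step would depend on what is wrong with the site's HTML.

```python
import re
from html.parser import HTMLParser

class LinkTextParser(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None

def clean_up(html):
    """Hypothetical clean-up step: drop comments and <script> blocks."""
    html = re.sub(r'(?s)<!--.*?-->', '', html)
    return re.sub(r'(?is)<script.*?</script>', '', html)

page = '<!-- ad --><A HREF="/show?id=1">College Football</A><script>x()</script>'
parser = LinkTextParser()
parser.feed(clean_up(page))
parser.close()
print(parser.links)
```

The parser lowercases tag and attribute names, so `<A HREF=...>` and `<a href=...>` are handled the same way.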

Take a look at ClientCookie [ http://wwwsearch.sourceforge.net/ClientCookie/ ] if you want to be able to log in and personalize your listings before you download the HTML.
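
In Python 3 the standard library's `http.cookiejar` covers the same ground, so a cookie-aware download can be sketched without a third-party package. This is an illustration, not ClientCookie's API: the helper name `make_cookie_opener` is made up, and no login URL is shown because none is given in this thread.

```python
import urllib.request
from http.cookiejar import CookieJar

def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

opener, jar = make_cookie_opener()
# A real session would first request the site's login or preferences page
# (site-specific, not given in this thread), then fetch the listings with
# the same opener so the stored cookies are sent along:
#     html = opener.open(listings_url).read()
print(len(jar))  # no cookies yet, since nothing has been requested
```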
 
Erik Lechak

Hello,

Thanks for the info. That will come in handy. There are a lot of
other sources that use cookies. I have a stock options program that
needed cookies.

I redid my code so that you don't need any of my libraries to run it.
It's easy to follow. You may want to look at Yahoo's tv listings first
to get your location id (see the variable location). Then just
call the function findShow('showname'). 'showname' is just a regular
expression, so you can search for parts of show titles. This is the
smallest piece of code I have seen that actually gets you your tv
listings. It downloads data for 24 hours (Yahoo shows 3-hour grids;
10800 secs is 3 hours, so do that 8 times and you get 24 hours).
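
The arithmetic in the parentheses, written out as a small Python 3 sketch (the helper name `grid_start_times` is made up for illustration):

```python
import time

GRID_SECONDS = 3 * 60 * 60   # one grid page covers 3 hours = 10800 s

def grid_start_times(start, hours=24):
    """Return the epoch start time of each 3-hour grid covering `hours`."""
    count = hours * 3600 // GRID_SECONDS
    return [start + i * GRID_SECONDS for i in range(count)]

starts = grid_start_times(int(time.time()))
print(len(starts))               # 8 requests for 24 hours
print(starts[-1] - starts[0])    # 75600 s between first and last start
```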

I took some pickle stuff out of the code to keep it small. If you are
going to use the code, write to me (or pickle the shows list yourself)
to be kind to Yahoo's bandwidth.
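
The pickle code itself was not posted, so the following Python 3 sketch is only a guess at the idea: keep the downloaded list in a file and reuse it while it is still fresh. The function name `cached_shows`, the file name, and the 3-hour maximum age are all assumptions, and the demo uses a stand-in fetch function instead of a real download.

```python
import os
import pickle
import tempfile
import time

def cached_shows(fetch, path, max_age=3 * 60 * 60):
    """Return shows from `path` if it is fresh, else call `fetch` and save."""
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, 'rb') as f:
            return pickle.load(f)
    shows = fetch()
    with open(path, 'wb') as f:
        pickle.dump(shows, f)
    return shows

# Demo with a stand-in fetch function that counts how often it is called.
calls = []

def fake_fetch():
    calls.append(1)
    return [{'name': 'College Football', 'channel': 7}]

path = os.path.join(tempfile.mkdtemp(), 'shows.pickle')
first = cached_shows(fake_fetch, path)
second = cached_shows(fake_fetch, path)   # served from the pickle file
print(first == second, len(calls))
```

Only unpickle files you wrote yourself; loading a pickle from an untrusted source can run arbitrary code.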

Thanks again,
Erik Lechak

Here it is:

# Python 2 code (urllib.urlopen, print statement).
import urllib
import re
import time

class Show:
    '''
    Just a structure to hold program information
    '''
    def __init__(self):
        # Assign to self; bare names here would only create locals.
        self.name = ""
        self.channel = ""
        self.station = ""
        self.start = ""
        self.end = ""


def getGrid(epoch, location='us_NC61376'):
    shows = []
    f = urllib.urlopen('http://tv.yahoo.com/grid?lineup=' + location + '&genres=&dur=&starttime=' + str(epoch) + '&.intl=us')
    data = f.read()
    # Drop newlines so links broken across lines still match.
    data = data.replace('\n', '')
    data = data.split("</A>")
    for d in data:
        progs = re.findall(r'<A HRef="\/tvpdb\?d=tvp&id=(.*)', d)
        for s in progs:
            m = re.search(r'(\d*)&cf.*channels=us_([^&]*).*&chname=([^\+]*)\+(\d+)&progutn=(\d*)&.*>(.*)', s)
            if m is None:
                # Skip links that lack the expected show fields.
                continue
            show = Show()
            show.name = m.group(6)
            show.start = float(m.group(5))
            show.channel = int(m.group(4))
            show.station = m.group(3)
            shows.append(show)
    return shows


def findShow(name):
    t = int(time.time())
    shows = []

    # Yahoo serves 3-hour grids (10800 s); 8 of them cover 24 hours.
    for a in range(8):
        shows.extend(getGrid(t))
        t = t + 10800

    for s in [s for s in shows if re.search(name, s.name, re.I)]:
        print s.name
        print s.channel
        print s.station
        print time.ctime(s.start)
        print


findShow('judy')
 
Rain Dog

> Is there anyone out there who has written anything in Python to
> download tv listings (no XML)? ... I wrote the test
> program below and it works. I am just curious if anyone has a more
> robust Python implementation before I take the time to add all the
> bells and whistles.

Here's a version that uses an HTMLParser and does its own filtering
by channel, rather than setting a cookie.

Some sample output:

% tvsearch "college football" news

College Football                         ABC 7    12:30 PM Sat Sep 13
College Football                         CBS 2    12:30 PM Sat Sep 13
Eyewitness News                          ABC 7     4:00 PM Sat Sep 13
CBS 2 News at 5:00                       CBS 2     5:00 PM Sat Sep 13
College Football                         ABC 7     5:00 PM Sat Sep 13
Channel 4 News                           NBC 4     5:00 PM Sat Sep 13
CBS Evening News                         CBS 2     5:30 PM Sat Sep 13


---------- 8< ---------- 8< ---------- 8< ---------- 8< ----------

#!/usr/bin/env python

import formatter, re, time, urllib
from htmllib import HTMLParser


# Channel lineup (leave empty to search all channels)
CHANNELS = [2, 3, 4, 5, 7, 9, 11, 13, 18, 22, 30, 32, 34, 35, 36, 39,
            40, 41, 42, 43, 44, 46, 50, 57, 62]

# Yahoo location code
LOCATION = 'us_CA57315'

# Yahoo TV listing URL
YAHOO_TV_URL = ('http://tv.yahoo.com/grid?lineup=' + LOCATION
                + '&starttime=%(epoch)d&.intl=us')


class Show(object):

    '''Just a structure to hold program information'''

    __slots__ = ('name', 'channel', 'station', 'start', 'end')

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __str__(self):
        showTime = time.strftime('%I:%M %p %a %b %d', time.localtime(self.start))
        # Replace a leading zero in the hour with a space.
        if showTime[0] == '0':
            showTime = ' ' + showTime[1:]
        return '%-35s %8s %-4d %-s' % (self.name, self.station,
                                       self.channel, showTime)


class YahooTVParser(HTMLParser):

    '''Minimal HTML parser for Yahoo TV listings'''

    showRE = re.compile(r'\/tvpdb\?d=tvp&id=(.*)')
    showInfoRE = re.compile(r'(\d*)&cf.*channels=us_([^&]*).*'
                            r'&chname=([^\+]*)\+(\d+)&progutn=(\d*)')

    def __init__(self):
        HTMLParser.__init__(self, formatter.NullFormatter())
        self.shows = []
        self.inShow = 0

    def start_a(self, attrs):    # <A> handler
        '''If the tag's HREF matches showRE, record the show info.'''
        self.newShow = None
        self.showName = ''

        # Check if the HREF matches a show.
        for k, v in attrs:
            if k == 'href':
                url = ''.join(v.split('\n'))
                if self.showRE.search(url):
                    m = self.showInfoRE.search(url)
                    if m:
                        # Create a new Show--its name isn't known yet.
                        self.newShow = Show(start=float(m.group(5)),
                            channel=int(m.group(4)), station=m.group(3))
                        self.inShow = 1
                break

    def end_a(self):    # </A> handler
        '''If done with a show, record its name and add it to the list.'''
        if self.inShow and self.showName:
            self.newShow.name = self.showName
            self.shows.append(self.newShow)
        self.inShow = 0

    def handle_data(self, text):
        '''Handle the data between, e.g., <A> and </A> tags.'''
        if self.inShow:
            self.showName += text


def getGrid(epoch):
    # vars() supplies the local name 'epoch' for the %(epoch)d placeholder.
    url = YAHOO_TV_URL % vars()
    parser = YahooTVParser()
    parser.feed(urllib.urlopen(url).read())
    parser.close()
    return parser.shows


def findShows(patterns):
    isMatchingShow = None
    if patterns:
        nameRE = re.compile('|'.join(['(%s)' % n for n in patterns]), re.I)
        if CHANNELS:
            def isMatchingShow(show):
                return (show.channel in CHANNELS) and nameRE.search(show.name)
        else:
            def isMatchingShow(show):
                return (nameRE.search(show.name) is not None)
    elif CHANNELS:
        def isMatchingShow(show):
            return (show.channel in CHANNELS)

    THREE_HOURS = 3 * 60 * 60
    ONE_WEEK = THREE_HOURS * 8 * 7
    startTime = int(time.time())
    endTime = startTime + ONE_WEEK

    for h in range(startTime, endTime, THREE_HOURS):
        allShows = getGrid(h)
        # Print matching shows sorted by starting time.
        if isMatchingShow is not None:
            shows = [(s.start, s) for s in allShows if isMatchingShow(s)]
        else:
            shows = [(s.start, s) for s in allShows]
        shows.sort()
        for t, s in shows:
            print s


def main():
    import os.path, sys

    args = sys.argv[1:]
    if '-h' in args:
        sys.stderr.write("Usage: %s [PATTERN]...\n"
                         % os.path.basename(sys.argv[0]))
        sys.exit(1)

    try: findShows(args)
    except KeyboardInterrupt: pass
    sys.exit(0)


if __name__ == '__main__':
    main()
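
The filtering step in findShows can be tried on its own without any download. Below is a Python 3 restatement of that logic as a sketch: `make_filter`, the namedtuple, and the sample shows (including their epoch values) are made up for illustration and are not part of the posted script.

```python
import re
from collections import namedtuple

Show = namedtuple('Show', 'name station channel start')

def make_filter(patterns, channels):
    """Build a predicate; an empty `patterns` or `channels` matches everything."""
    name_re = None
    if patterns:
        # One case-insensitive regex that matches any of the patterns.
        name_re = re.compile('|'.join('(?:%s)' % p for p in patterns), re.I)

    def matches(show):
        if channels and show.channel not in channels:
            return False
        return name_re is None or name_re.search(show.name) is not None

    return matches

shows = [Show('College Football', 'ABC', 7, 1063481400),
         Show('Eyewitness News', 'ABC', 7, 1063494000),
         Show('Infomercial', 'SHOP', 99, 1063481400)]
keep = make_filter(['college football', 'news'], channels={2, 4, 7})
for s in sorted(filter(keep, shows), key=lambda s: s.start):
    print(s.name)
```

Sorting with `key=lambda s: s.start` avoids comparing Show objects directly when two shows start at the same time.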
 
