a quick program to download tv listings

Discussion in 'Python' started by Erik Lechak, Sep 11, 2003.

  1. Erik Lechak

    Erik Lechak Guest

    Hello all,

    Has anyone out there written anything in Python to
    download TV listings (no XML)?

    All the TV listing stuff that I can find is way too complex for my
    taste (it either includes XML overkill or is not in Python). I wrote the
    test program below and it works. I am just curious whether anyone has a
    more robust Python implementation before I take the time to add all the
    bells and whistles.

    Chron is just a class like datetime, but I wrote it to accept just
    about every English way of expressing time. The important thing is to
    put the epoch time (seconds since 1970) in the URL.
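
A minimal sketch of putting the epoch time in the URL, written for Python 3's `urllib.parse` rather than the Python 2 `urllib` used in this thread. The helper name `build_grid_url` is hypothetical; the query parameters are the ones that appear in the test program in this post, and the 2003 Yahoo endpoint may no longer respond.

```python
import time
from urllib.parse import urlencode

GRID_URL = "http://tv.yahoo.com/grid"  # endpoint as quoted in this thread (2003)


def build_grid_url(epoch, lineup="us_DMA560"):
    """Return the listings-grid URL for a start time given in epoch seconds."""
    params = [
        ("lineup", lineup),
        ("genres", ""),
        ("dur", ""),
        ("starttime", int(epoch)),  # whole seconds since the Unix epoch
        (".intl", "us"),
    ]
    return GRID_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Build (but do not fetch) the URL for the grid starting now.
    print(build_grid_url(time.time()))
```

Passing the parameters through `urlencode` keeps the same order as the hand-built string while escaping any unusual characters in the lineup code.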

    If anyone is interested in Chron, I can post it or send it to them.

    Hope you enjoy the code,
    Erik Lechak

    import urllib
    import re
    from pygra.Chron import Chron

    class Show:
        '''
        Just a structure to hold program information
        '''
        def __init__(self):
            # Set the fields on the instance (plain names would be locals).
            self.name=""
            self.channel=""
            self.station=""
            self.start=""
            self.end=""


    def getGrid(timestring):
        shows=[]
        t=Chron(timestring)
        f = urllib.urlopen('http://tv.yahoo.com/grid?lineup=us_DMA560&genres=&dur=&starttime='+str(int(t.epoch))+'&.intl=us')
        data = f.read()
        # Drop newlines, then split the page into one chunk per link.
        data=[d.strip('\n') for d in data]
        data="".join(data)
        data=data.split("</A>")
        for d in data:
            progs = re.findall('<A HRef="\/tvpdb\?d=tvp&id=(.*)',d)
            for s in progs:
                m= re.search('(\d*)&cf.*channels=us_([^&]*).*&chname=([^\+]*)\+(\d+)&progutn=(\d*)&.*>(.*)',s)
                if m is None:
                    continue    # link does not look like a program entry
                show=Show()
                show.name=m.group(6)
                show.start=m.group(5)
                show.channel=int(m.group(4))
                show.station=m.group(3)
                shows.append(show)
        return shows


    shows =getGrid('now')

    for s in [s for s in shows if s.channel==17]:
        print s.name
     
    Erik Lechak, Sep 11, 2003
    #1

  2. other implementation of time Re: a quick program to download tv listings

    On Wednesday 10 September 2003 23:37, Erik Lechak wrote:
    > Hello all,
    >
    > Is there anyone out there that has written anything in python to
    > download tv listings (no XML)?
    >
    > All the tv listing stuff that I can find is way too complex for my
    > taste (includes XML overkill or is not in python). I wrote the test
    > program below and it works. I am just curious if anyone has a more
    > robust python implementation before I take the time to add all the
    > bells and whistles.
    >


    Web services, SOAP, and the like are making XML a part of life, whether we
    like it or not (-:

    > Chron is just a class like datetime, but I wrote it to accept just
    > about every english way of expressing time. The important thing is to
    > put the epoch time in the URL.
    >
    > If anyone is interested in Chron, I can post it or send it to them.
    >


    You should look at mx.DateTime from the eGenix folks. It is a great set of
    classes with awesome docs (and it is free too).
     
    Sean 'Shaleh' Perry, Sep 11, 2003
    #2

  3. Erik Lechak

    pythonhda Guest

    On 10 Sep 2003 23:37:47 -0700
    (Erik Lechak) wrote:

    > Hello all,
    >
    > Is there anyone out there that has written anything in python to
    > download tv listings (no XML)?
    >
    > All the tv listing stuff that I can find is way too complex for my
    > taste (includes XML overkill or is not in python). I wrote the test
    > program below and it works. I am just curious if anyone has a more
    > robust python implementation before I take the time to add all the
    > bells and whistles.
    >
    > ...


    I felt the same way about the XML stuff, so for one project I subclassed sgmllib.SGMLParser and used it to parse channel listings from tvlistings.zap2it.com. Their HTML was pretty bad, though, so I had to write "clean up scripts" to strip out all the crap before running the parser.
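
A rough sketch of the parser-subclass approach described here. It is not pythonhda's code: `sgmllib` no longer exists in Python 3, so this uses `html.parser.HTMLParser` as a stand-in, and the class name `LinkTextParser`, the sample HTML, and the link pattern are all made up for illustration.

```python
import re
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect (href, text) pairs for <a> tags whose href matches a pattern."""

    def __init__(self, href_pattern):
        super().__init__()
        self.href_re = re.compile(href_pattern)
        self.links = []
        self._href = None   # href of the matching <a> we are inside, if any
        self._text = []     # text chunks seen since that <a> opened

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, so <A HREF=...> works too.
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.href_re.search(href):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical snippet standing in for a downloaded listings page.
sample = '<td><A HREF="/tvpdb?id=1">Evening News</A></td><a href="/help">Help</a>'
parser = LinkTextParser(r"/tvpdb\?")
parser.feed(sample)
parser.close()
print(parser.links)
```

Because the parser only reacts to anchors whose `href` matches, most of the surrounding markup can be ignored rather than cleaned up first, although badly broken HTML may still need a pre-processing pass as described above.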

    Take a look at ClientCookie [ http://wwwsearch.sourceforge.net/ClientCookie/ ] if you want to be able to login and personalize your listings before you download the HTML.
     
    pythonhda, Sep 12, 2003
    #3
  4. Erik Lechak

    Erik Lechak Guest

    Hello,

    >>Take a look at ClientCookie


    Thanks for the info. That will come in handy. There are a lot of
    other sources that use cookies. I have a stock options program that
    needed cookies.

    I redid my code so that you don't need any of my libraries to run it.
    It's easy to follow. You may want to look at Yahoo's TV listings first
    to get your location id (see the variable location). Then just
    call the function findShow('showname'). 'showname' is a regular
    expression, so you can search for parts of show titles. This is the
    smallest piece of code I have seen that actually gets you your TV
    listings. It downloads data for 24 hours: Yahoo shows 3-hour grids, and
    10800 seconds is 3 hours, so 8 requests cover 24 hours.
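
The window arithmetic can be sketched on its own. This is a Python 3 illustration with a hypothetical helper name (`grid_start_times`); it only computes the start times and does not contact Yahoo.

```python
import time

GRID_SECONDS = 3 * 60 * 60   # one grid page covers 3 hours = 10800 seconds


def grid_start_times(start, hours=24):
    """Return the epoch start time of each 3-hour grid needed to cover `hours`."""
    count = -(-hours * 3600 // GRID_SECONDS)  # ceiling division
    return [int(start) + i * GRID_SECONDS for i in range(count)]


if __name__ == "__main__":
    # Eight start times, three hours apart, beginning now.
    for t in grid_start_times(time.time()):
        print(time.ctime(t))
```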

    I took some pickle stuff out of the code to keep it small. If you are
    going to use the code, write to me (or pickle the shows list yourself)
    so that you are kind to Yahoo's bandwidth.
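
One way the pickling could look, as a Python 3 sketch. The pickle code Erik removed is not shown in the thread, so the function name `load_or_fetch`, the cache-age rule, and the file layout here are assumptions.

```python
import os
import pickle
import time


def load_or_fetch(cache_path, fetch, max_age=3 * 60 * 60):
    """Return cached data if the cache file is younger than max_age seconds;
    otherwise call fetch(), store its result with pickle, and return it."""
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age:
            with open(cache_path, "rb") as f:
                return pickle.load(f)   # only unpickle files you wrote yourself
    data = fetch()                      # the expensive download happens here
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data
```

Here `fetch` would be any zero-argument callable that downloads and returns the shows list, so repeated searches within `max_age` reuse the saved copy instead of hitting the site again.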

    Thanks again,
    Erik Lechak

    Here it is:

    import urllib
    import re
    import time

    class Show:
        '''
        Just a structure to hold program information
        '''
        def __init__(self):
            # Set the fields on the instance (plain names would be locals).
            self.name=""
            self.channel=""
            self.station=""
            self.start=""
            self.end=""


    def getGrid(epoch,location='us_NC61376'):
        shows=[]
        f = urllib.urlopen('http://tv.yahoo.com/grid?lineup='+location+'&genres=&dur=&starttime='+str(epoch)+'&.intl=us')
        data = f.read()
        # Drop newlines, then split the page into one chunk per link.
        data=[d.strip('\n') for d in data]
        data="".join(data)
        data=data.split("</A>")
        for d in data:
            progs = re.findall('<A HRef="\/tvpdb\?d=tvp&id=(.*)',d)
            for s in progs:
                m= re.search('(\d*)&cf.*channels=us_([^&]*).*&chname=([^\+]*)\+(\d+)&progutn=(\d*)&.*>(.*)',s)
                if m is None:
                    continue    # link does not look like a program entry
                show=Show()
                show.name=m.group(6)
                show.start=float(m.group(5))
                show.channel=int(m.group(4))
                show.station=m.group(3)
                shows.append(show)
        return shows


    def findShow(name):

        t=int(time.time())
        shows=[]

        # 8 grids of 3 hours (10800 seconds) each cover 24 hours.
        for a in range(8):
            shows.extend( getGrid(t) )
            t = t+10800

        for s in [s for s in shows if re.search(name,s.name,re.I)]:
            print s.name
            print s.channel
            print s.station
            print time.ctime(s.start)
            print


    findShow('judy')
     
    Erik Lechak, Sep 12, 2003
    #4
  5. Erik Lechak

    Rain Dog Guest

    In article <>,
    (Erik Lechak) wrote:

    > Is there anyone out there that has written anything in python to
    > download tv listings (no XML)? ... I wrote the test
    > program below and it works. I am just curious if anyone has a more
    > robust python implementation before I take the time to add all the
    > bells and whistles.


    Here's a version that uses an HTMLParser and does its own filtering
    by channel, rather than setting a cookie.
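
The filtering idea (several name patterns joined into one case-insensitive regular expression, plus an optional channel list) can be sketched separately from the download code. This is a Python 3 illustration; `make_matcher` is a hypothetical name and is not part of the script posted here.

```python
import re


def make_matcher(patterns, channels=()):
    """Build a predicate over (name, channel) pairs.

    An empty `patterns` matches every name; an empty `channels` matches
    every channel.
    """
    name_re = None
    if patterns:
        # Group each pattern so a '|' inside one cannot leak into another.
        name_re = re.compile("|".join("(?:%s)" % p for p in patterns), re.I)
    allowed = set(channels)

    def matches(name, channel):
        if allowed and channel not in allowed:
            return False
        return name_re is None or name_re.search(name) is not None

    return matches


if __name__ == "__main__":
    is_match = make_matcher(["college football", "news"], channels=[2, 7])
    print(is_match("College Football", 7))   # name and channel both match
    print(is_match("Channel 4 News", 4))     # channel 4 is not in the list
```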

    Some sample output:

    % tvsearch "college football" news

    College Football                         ABC 7    12:30 PM Sat Sep 13
    College Football                         CBS 2    12:30 PM Sat Sep 13
    Eyewitness News                          ABC 7     4:00 PM Sat Sep 13
    CBS 2 News at 5:00                       CBS 2     5:00 PM Sat Sep 13
    College Football                         ABC 7     5:00 PM Sat Sep 13
    Channel 4 News                           NBC 4     5:00 PM Sat Sep 13
    CBS Evening News                         CBS 2     5:30 PM Sat Sep 13


    ---------- 8< ---------- 8< ---------- 8< ---------- 8< ----------

    #!/usr/bin/env python

    import formatter, re, time, urllib
    from htmllib import HTMLParser


    # Channel lineup (leave empty to search all channels)
    CHANNELS = [2, 3, 4, 5, 7, 9, 11, 13, 18, 22, 30, 32, 34, 35, 36, 39,
                40, 41, 42, 43, 44, 46, 50, 57, 62]

    # Yahoo location code
    LOCATION = 'us_CA57315'

    # Yahoo TV listing URL
    YAHOO_TV_URL = ('http://tv.yahoo.com/grid?lineup=' + LOCATION
                    + '&starttime=%(epoch)d&.intl=us')


    class Show(object):

        '''Just a structure to hold program information'''

        __slots__ = ('name', 'channel', 'station', 'start', 'end')

        def __init__(self, **kwargs):
            for k, v in kwargs.items():
                setattr(self, k, v)

        def __str__(self):
            showTime=time.strftime('%I:%M %p %a %b %d', time.localtime(self.start))
            if showTime[0] == '0':
                showTime = ' ' + showTime[1:]
            return '%-35s %8s %-4d %-s' % (self.name, self.station,
                                           self.channel, showTime)


    class YahooTVParser(HTMLParser):

        '''Minimal HTML parser for Yahoo TV listings'''

        showRE = re.compile('\/tvpdb\?d=tvp&id=(.*)')
        showInfoRE = re.compile('(\d*)&cf.*channels=us_([^&]*).*'
                                '&chname=([^\+]*)\+(\d+)&progutn=(\d*)')

        def __init__(self):
            HTMLParser.__init__(self, formatter.NullFormatter())
            self.shows = []
            self.inShow = 0

        def start_a(self, attrs): # <A> handler
            '''If the tag's HREF matches showRE, record the show info.'''
            self.newShow = None
            self.showName = ''

            # Check if the HREF matches a show.
            for k, v in attrs:
                if k == 'href':
                    url = ''.join(v.split('\n'))
                    if self.showRE.search(url):
                        m = self.showInfoRE.search(url)
                        if m:
                            # Create a new Show--its name isn't known yet.
                            self.newShow = Show(start=float(m.group(5)),
                                channel=int(m.group(4)), station=m.group(3))
                            self.inShow = 1
                    break

        def end_a(self): # </A> handler
            '''If done with a show, record its name and add it to the list.'''
            if self.inShow and self.showName:
                self.newShow.name = self.showName
                self.shows.append(self.newShow)
            self.inShow = 0

        def handle_data(self, text):
            '''Handle the data between, e.g., <A> and </A> tags.'''
            if self.inShow:
                self.showName += text


    def getGrid(epoch):
        url = YAHOO_TV_URL % vars()
        parser = YahooTVParser()
        parser.feed(urllib.urlopen(url).read())
        parser.close()
        return parser.shows


    def findShows(patterns):
        isMatchingShow = None
        if patterns:
            nameRE = re.compile('|'.join(['(%s)' % n for n in patterns]), re.I)
            if CHANNELS:
                def isMatchingShow(show):
                    return (show.channel in CHANNELS) and nameRE.search(show.name)
            else:
                def isMatchingShow(show):
                    return (nameRE.search(show.name) is not None)
        elif CHANNELS:
            def isMatchingShow(show):
                return (show.channel in CHANNELS)

        THREE_HOURS = 3 * 60 * 60
        ONE_WEEK = THREE_HOURS * 8 * 7
        startTime = int(time.time())
        endTime = startTime + ONE_WEEK

        for h in range(startTime, endTime, THREE_HOURS):
            allShows = getGrid(h)
            # Print matching shows sorted by starting time.
            if isMatchingShow is not None:
                shows = [(s.start, s) for s in allShows if isMatchingShow(s)]
            else:
                shows = [(s.start, s) for s in allShows]
            shows.sort()
            for t, s in shows:
                print s


    def main():
        import os.path, sys

        args = sys.argv[1:]
        if '-h' in args:
            sys.stderr.write("Usage: %s [PATTERN]...\n"
                             % os.path.basename(sys.argv[0]))
            sys.exit(1)

        try: findShows(args)
        except KeyboardInterrupt: pass
        sys.exit(0)


    if __name__ == '__main__':
        main()
     
    Rain Dog, Sep 16, 2003
    #5