Getting Newsgroup Headers

Discussion in 'Python' started by R. David Murray, Apr 17, 2009.

  1. aslkoi fdsda wrote:
    > I would like to read just the headers out of a newsgroup.
    > Being a Python newbie, I was wondering if this is possible and how difficult
    > it would be for a novice Python programmer.
    > Thanks for any reply!


    It's not hard at all. I've pulled some bits and pieces out of the
    self-written minimalist newsreader I'm responding to your post with,
    and added some example usage code. It should head you in the right
    direction, and there's no advanced Python involved here:

    --------------------------------------------------------------
    # Python 2 code: relies on the tuple-based nntplib xover() results.
    from email.parser import FeedParser
    from email.utils import mktime_tz, parsedate_tz  # instead of deprecated rfc822
    from nntplib import NNTP

    class Article:

        def __init__(self):
            self.num = None
            self.subject = None
            self.poster = None
            self.date = None
            self.id = None
            self.references = []
            self.size = 0
            self.lines = 0
            self.newsgroups = []
            self.message = None

        def loadFromOverview(self, overview):
            # An overview tuple is (article number, subject, poster, date,
            # id, references, size, lines).
            self.num = overview[0]
            (self.subject, self.poster, self.date, self.id,
             self.references, self.size, self.lines) = overview[1:]
            try:
                self.date = mktime_tz(parsedate_tz(self.date))
            except (TypeError, ValueError):
                # parsedate_tz() returns None for an unparseable date,
                # which makes mktime_tz() raise TypeError.
                print("ERROR in date parsing (%s)" % self.date)
                self.date = None
            return self.num

        def loadMessage(self, server):
            # Feed the headers, a blank separator line, then the body.
            msgparser = FeedParser()
            resp, num, id, lines = server.head(self.id)
            msgparser.feed('\n'.join(lines) + '\n\n')
            resp, num, id, lines = server.body(self.id)
            msgparser.feed('\n'.join(lines) + '\n')
            self.message = msgparser.close()


    server = NNTP('news.gmane.org')
    resp, count, first, last, name = server.group('gmane.comp.python.ideas')
    # Fetch overview data for roughly the last 100 articles in the group.
    resp, headersets = server.xover(str(int(last) - 100), last)
    articles = []
    for h in headersets:
        a = Article()
        artnum = a.loadFromOverview(h)
        articles.append(a)

    # Load one full article and print all of its headers.
    anarticle = articles[0]
    anarticle.loadMessage(server)
    print(dir(anarticle.message))
    for header in anarticle.message.keys():
        print("%s: %s" % (header, anarticle.message[header]))

    --------------------------------------------------------------
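
If only the headers are wanted, the body fetch can be skipped and the header lines parsed on their own. Here is a minimal sketch in Python 3, assuming the header lines have already been fetched and decoded to str (Python 3's nntplib hands back bytes). The function name `parse_header_lines` and the sample data are made up for illustration; they are not part of the newsreader code above.

```python
from email.parser import Parser


def parse_header_lines(lines):
    """Parse a list of raw header lines into a Message (headers only)."""
    # A blank line marks the end of the header block.
    text = "\n".join(lines) + "\n\n"
    return Parser().parsestr(text, headersonly=True)


# Made-up sample standing in for the lines a HEAD command would return.
sample = [
    "From: Example Poster <poster@example.com>",
    "Subject: Getting Newsgroup Headers",
    "Newsgroups: comp.lang.python",
]
msg = parse_header_lines(sample)
for name in msg.keys():
    print("%s: %s" % (name, msg[name]))
```

The returned object supports the same `keys()` and `msg[name]` lookups used in the example usage code, so the printing loop carries over unchanged.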

    Heh, looking at this I remember it is several-years-old code and really
    needs to be revisited and updated...so I'm not going to claim
    that this is the best code that could be written for this task :)

    Oh, and there's more involved in actually printing the headers if you
    need to deal with non-ASCII characters ("encoded words") in the headers.
    (That's in the docs for the email module, though it took me a bit to
    figure out how to do it right.)
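
For the encoded-word part, one common approach (not necessarily what the newsreader above does) is to combine `decode_header` and `make_header` from the standard `email.header` module. This is a minimal Python 3 sketch; the helper name `decode_mime_header` and the sample value are assumptions made for illustration.

```python
from email.header import decode_header, make_header


def decode_mime_header(raw):
    """Return a header value with RFC 2047 encoded words decoded to text."""
    # decode_header() splits the value into (bytes_or_str, charset) chunks;
    # make_header() reassembles them into a Header, and str() yields text.
    return str(make_header(decode_header(raw)))


# Made-up sample: a UTF-8, quoted-printable encoded word.
print(decode_mime_header("=?utf-8?q?Caf=C3=A9?="))
```

Plain ASCII header values pass through unchanged, so the helper can be applied to every header before printing.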

    --
    R. David Murray http://www.bitdance.com
    R. David Murray, Apr 17, 2009