R. David Murray
aslkoi fdsda said: I would like to read just the headers out of a newsgroup.
Being a Python newbie, I was wondering if this is possible and how difficult
it would be for a novice Python programmer.
Thanks for any reply!
It's not hard at all. I've pulled some bits and pieces out of the
self-written minimalist newsreader I'm responding to your post with,
and added some example usage code. It should head you in the right
direction, and there's no advanced Python involved here:
--------------------------------------------------------------
# Python 2 code: it relies on print statements, the rfc822 module, and
# the Python 2 nntplib API (xover, and head/body returning 4-tuples).
from email.parser import FeedParser
from nntplib import NNTP
from rfc822 import mktime_tz, parsedate_tz


class Article:

    def __init__(self):
        self.num = None
        self.subject = None
        self.poster = None
        self.date = None
        self.id = None
        self.references = []
        self.size = 0
        self.lines = 0
        self.newsgroups = []
        self.message = None

    def loadFromOverview(self, overview):
        # overview is one tuple from NNTP.xover(): the article number,
        # then subject, poster, date, id, references, size, lines.
        self.num = overview[0]
        (self.subject, self.poster, self.date, self.id,
         self.references, self.size, self.lines) = overview[1:]
        try:
            self.date = mktime_tz(parsedate_tz(self.date))
        except (TypeError, ValueError):
            # parsedate_tz returns None for an unparseable date, which
            # makes mktime_tz raise TypeError rather than ValueError.
            print "ERROR in date parsing (%s)" % self.date
            self.date = None
        return self.num

    def loadMessage(self, server):
        # Feed the headers, a blank separator line, then the body.
        msgparser = FeedParser()
        resp, num, id, lines = server.head(self.id)
        msgparser.feed('\n'.join(lines) + '\n\n')
        resp, num, id, lines = server.body(self.id)
        msgparser.feed('\n'.join(lines) + '\n')
        self.message = msgparser.close()


# Example usage: fetch overview data for the last 100 or so articles.
server = NNTP('news.gmane.org')
resp, count, first, last, name = server.group('gmane.comp.python.ideas')
resp, headersets = server.xover(str(int(last) - 100), last)
articles = []
for h in headersets:
    a = Article()
    artnum = a.loadFromOverview(h)
    articles.append(a)

# Load one full article and print all of its headers.
anarticle = articles[0]
anarticle.loadMessage(server)
print dir(anarticle.message)
for header in anarticle.message.keys():
    print "%s: %s" % (header, anarticle.message[header])
server.quit()
--------------------------------------------------------------
Heh, looking at this I remember it is several-years-old code that really
needs to be revisited and updated, so I'm not going to claim
that this is the best code that could be written for this task.
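As one example of what an update might look like, here is a minimal sketch of the date handling on its own, assuming Python 3, where the rfc822 module no longer exists and email.utils provides functions with the same names. The helper name overview_date_to_timestamp is made up for illustration; it is not part of the newsreader.

```python
from email.utils import mktime_tz, parsedate_tz


def overview_date_to_timestamp(raw_date):
    """Return a UTC timestamp for a Date header string, or None.

    Hypothetical helper, not taken from the original newsreader.
    """
    parsed = parsedate_tz(raw_date)
    if parsed is None:
        # parsedate_tz signals an unparseable date by returning None,
        # so check for that instead of catching an exception.
        return None
    return mktime_tz(parsed)


print(overview_date_to_timestamp("Thu, 01 Jan 1970 00:00:00 +0000"))
print(overview_date_to_timestamp("not a date"))
```

Checking for None up front avoids having to guess which exception a bad date will raise.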
Oh, and there's more involved in actually printing the headers if you
need to deal with non-ASCII characters ("encoded words") in the headers.
(That's in the docs for the email module, though it took me a bit to
figure out how to do it right.)
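For the encoded-words case, a minimal sketch using decode_header and make_header from email.header might look like the following. It assumes Python 3 (on Python 2 you would call unicode() instead of str()), the function name decode_header_value is made up for illustration, and this is not necessarily how the newsreader itself does it.

```python
from email.header import decode_header, make_header


def decode_header_value(raw_value):
    """Decode RFC 2047 encoded words in a raw header value to text.

    Hypothetical helper for illustration only.
    """
    # decode_header splits the value into (bytes_or_str, charset) pairs;
    # make_header reassembles them into a Header that converts to text.
    return str(make_header(decode_header(raw_value)))


print(decode_header_value("=?utf-8?q?Caf=C3=A9?="))
print(decode_header_value("Plain ASCII subject"))
```

With the Article class above, this would be applied to a value such as anarticle.message['Subject'] before printing it.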