Getting Newsgroup Headers

R. David Murray

aslkoi fdsda said:
I would like to read just the headers out of a newsgroup.
Being a Python newbie, I was wondering if this is possible and how difficult
it would be for a novice Python programmer.
Thanks for any reply!

It's not hard at all. I've pulled some bits and pieces out of the
self-written minimalist newsreader I'm responding to your post with,
and added some example usage code. It should head you in the right
direction, and there's no advanced Python involved here:

--------------------------------------------------------------
from email.parser import FeedParser
from nntplib import NNTP
from rfc822 import mktime_tz, parsedate_tz

class Article:

    def __init__(self):
        self.num = None
        self.subject = None
        self.poster = None
        self.date = None
        self.id = None
        self.references = []
        self.size = 0
        self.lines = 0
        self.newsgroups = []

    def loadFromOverview(self, overview):
        # overview is one XOVER tuple: (number, subject, poster, date,
        # id, references, size, lines)
        (self.subject, self.poster, self.date, self.id,
            self.references, self.size, self.lines) = overview[1:]
        try:
            self.date = mktime_tz(parsedate_tz(self.date))
        except (TypeError, ValueError):
            # parsedate_tz returns None for a date it can't parse, and
            # mktime_tz(None) raises TypeError
            print "ERROR in date parsing (%s)" % self.date
            self.date = None
        return overview[0]

    def loadMessage(self, server):
        # Fetch the headers and the body separately and feed both to
        # the parser to build a single message object
        msgparser = FeedParser()
        resp, num, id, lines = server.head(self.id)
        msgparser.feed('\n'.join(lines)+'\n\n')
        resp, num, id, lines = server.body(self.id)
        msgparser.feed('\n'.join(lines)+'\n')
        self.message = msgparser.close()


# Example usage: get the overview data for the last 100 or so articles
server = NNTP('news.gmane.org')
resp, count, first, last, name = server.group('gmane.comp.python.ideas')
resp, headersets = server.xover(str(int(last)-100), last)
articles = []
for h in headersets:
    a = Article()
    artnum = a.loadFromOverview(h)
    articles.append(a)

# Load one complete article and print all of its headers
anarticle = articles[0]
anarticle.loadMessage(server)
print dir(anarticle.message)
for header in anarticle.message.keys():
    print "%s: %s" % (header, anarticle.message[header])


--------------------------------------------------------------

Heh, looking at this I remember it is several-years-old code and really
needs to be revisited and updated...so I'm not going to claim
that this is the best code that could be written for this task :)
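
One thing that would need updating: the rfc822 module no longer exists in Python 3, but email.utils provides parsedate_tz and mktime_tz under the same names. Here is a minimal sketch of the date handling, assuming you are on Python 3. The helper name parse_date is made up for this example and is not part of the newsreader code above:

```python
from email.utils import mktime_tz, parsedate_tz


def parse_date(raw):
    """Return a UTC timestamp for an RFC 2822 date string, or None.

    parse_date is a hypothetical helper name used for illustration.
    """
    parsed = parsedate_tz(raw)
    if parsed is None:
        # parsedate_tz returns None when it cannot parse the string
        return None
    return mktime_tz(parsed)


print(parse_date("Thu, 01 Jan 1970 00:00:00 +0000"))
```

Checking for None up front avoids relying on an exception to detect a bad date.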

Oh, and there's more involved in actually printing the headers if you
need to deal with non-ASCII characters ("encoded words") in the headers.
(That's in the docs for the email module, though it took me a bit to
figure out how to do it right.)
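
For what it's worth, here is a rough sketch of one way to decode those encoded words, using decode_header and make_header from email.header. It assumes Python 3, and the function name decode_header_value and the sample subject are made up for this example. It may not be the only or the best way to do it:

```python
from email.header import decode_header, make_header


def decode_header_value(raw):
    """Turn a raw header value, possibly containing RFC 2047 encoded
    words, into a readable string.

    decode_header_value is a hypothetical helper name.
    """
    # decode_header splits the value into (data, charset) pairs;
    # make_header joins them back into a Header that str() can render
    return str(make_header(decode_header(raw)))


# Sample value invented for illustration
print(decode_header_value("=?utf-8?q?Caf=C3=A9?="))
```

You would call this on each value you get back from the message object before printing it, for example decode_header_value(anarticle.message['Subject']).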
 
