How do I Extract Attachment from Newsgroup Message

snewman18 · May 31, 2007

I'm parsing NNTP messages that have XML file attachments. How can I
extract the encoded text back into a file? I looked for a solution
with mimetools (the way I'd approach it for email) but found nothing.

Here's a long snippet of the message:
('220 116431 <[email protected]> article', '116431',
'<[email protected]>', ['MIME-Version: 1.0', 'Message-ID:
<[email protected]>', 'Content-Type: Multipart/Mixed;', '
boundary="------------Boundary-00=_A5NJCP3FX6Y5BI3BH890"', 'Date: Thu,
24 May 2007 07:41:34 -0400 (EDT)', 'From: Newsclip <[email protected]>',
'Path: newsclip.ap.org!flounder.ap.org!flounder', 'Newsgroups:
ap.spanish.online,ap.spanish.online.business', 'Keywords: MUN ECO
PETROLEO PRECIOS', 'Subject: MUN ECO PETROLEO PRECIOS', 'Summary: ',
'Lines: 108', 'Xref: newsclip.ap.org ap.spanish.online:938298
ap.spanish.online.business:116431', '', '', '--------------
Boundary-00=_A5NJCP3FX6Y5BI3BH890', 'Content-Type: Text/Plain',
'Content-Transfer-Encoding: 8bit', 'Content-Description: text,
unencoded', '', '(AP) Precios del crudo se mueven sin rumbo claro',
'Por GEORGE JAHN', 'VIENA', 'Los precios

.... (truncated for length) ...

'', '___', '', 'Editores: Derrick Ho, periodista de la AP en Singapur,
contribuy\xf3 con esta informaci\xf3n.', '', '', '--------------
Boundary-00=_A5NJCP3FX6Y5BI3BH890', 'Content-Type: Text/Xml', 'Content-
Transfer-Encoding: base64', 'Content-Description: text, base64
encoded', '',
'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIG5pdGYgU1lT',
'VEVNICJuaXRmLmR0ZCI+CjxuaXRmPgogPGhlYWQ
+CiAgPG1ldGEgbmFtZT0iYXAtdHJhbnNyZWYi',
'IGNvbnRlbnQ9IlNQMTQ3MiIvPgogIDxtZXRhIG5hbWU9ImFwLW9yaWdpbiIgY29udGVudD0ic3Bh',
'bm9sIi8+CiAgPG1ldGEgbmFtZT0iYXAtc2VsZWN0b3IiIGNvbn

kyosohma · May 31, 2007

I'm parsing NNTP messages that have XML file attachments. How can I
extract the encoded text back into a file? I looked for a solution
with mimetools (the way I'd approach it for email) but found nothing.

Here's a long snippet of the message:

('220 116431 <[email protected]> article', '116431',
'<[email protected]>', ['MIME-Version: 1.0', 'Message-ID:
<[email protected]>', 'Content-Type: Multipart/Mixed;', '
boundary="------------Boundary-00=_A5NJCP3FX6Y5BI3BH890"', 'Date: Thu,
24 May 2007 07:41:34 -0400 (EDT)', 'From: Newsclip <[email protected]>',
'Path: newsclip.ap.org!flounder.ap.org!flounder', 'Newsgroups:
ap.spanish.online,ap.spanish.online.business', 'Keywords: MUN ECO
PETROLEO PRECIOS', 'Subject: MUN ECO PETROLEO PRECIOS', 'Summary: ',
'Lines: 108', 'Xref: newsclip.ap.org ap.spanish.online:938298
ap.spanish.online.business:116431', '', '', '--------------
Boundary-00=_A5NJCP3FX6Y5BI3BH890', 'Content-Type: Text/Plain',
'Content-Transfer-Encoding: 8bit', 'Content-Description: text,
unencoded', '', '(AP) Precios del crudo se mueven sin rumbo claro',
'Por GEORGE JAHN', 'VIENA', 'Los precios

... (truncated for length) ...

'', '___', '', 'Editores: Derrick Ho, periodista de la AP en Singapur,
contribuy\xf3 con esta informaci\xf3n.', '', '', '--------------
Boundary-00=_A5NJCP3FX6Y5BI3BH890', 'Content-Type: Text/Xml', 'Content-
Transfer-Encoding: base64', 'Content-Description: text, base64
encoded', '',
'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIG5pdGYgU1lT',
'VEVNICJuaXRmLmR0ZCI+CjxuaXRmPgogPGhlYWQ
+CiAgPG1ldGEgbmFtZT0iYXAtdHJhbnNyZWYi',
'IGNvbnRlbnQ9IlNQMTQ3MiIvPgogIDxtZXRhIG5hbWU9ImFwLW9yaWdpbiIgY29udGVudD0ic3Bh',
'bm9sIi8+CiAgPG1ldGEgbmFtZT0iYXAtc2VsZWN0b3IiIGNvbn

This looks like what you might be looking for:
http://mail.python.org/pipermail/python-list/2004-June/265018.html

Not sure if you'll need this or not, but here's some info on encoding/
decoding files:
http://www.jorendorff.com/articles/unicode/python.html

There are lots of ways to parse xml. I use the minidom module myself.

Mike

Dave Borne · May 31, 2007

I looked for a solution

with mimetools (the way I'd approach it for email) but found nothing. ....
<[email protected]>', 'Content-Type: Multipart/Mixed;', '
boundary="------------Boundary-00=_A5NJCP3FX6Y5BI3BH890"', 'Date: Thu,

....

Playing with

data = n.article('116431')[3]

and email.message_from_string, there seems to be a problem with the
content type being split up. I was able to get a multipart message by
using

msg = email.message_from_string('\n'.join(data).replace(';\n', ';'))

(and adding an ending boundary to your sample data).
This is a bit hackish and could cause problems if there are
semicolons inside the message body (no warranties expressed or
implied, etc.)

Hope this helps,
-Dave

big trouble getting data from attachment in email	0	Feb 25, 2011
Client AXIS2 Extract Attachment MTOM Base64Binary response	0	May 21, 2013
How Do I get my Python script to attach multiple files and send as asingle email	3	Aug 8, 2013
Remove some images from a mail message	0	Apr 21, 2013
way to extract only the message from pop3	9	Apr 3, 2007
Problem with base64 mimeparts and email.*	0	Apr 22, 2009
Problem with Email::MIME and Email::MIME::Attachment::Stripper	0	May 28, 2006
Extract parts from a Multipart/Related email using Java Mail API	9	Sep 19, 2007

How do I Extract Attachment from Newsgroup Message

snewman18

kyosohma

Dave Borne

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads