Parsing Email Headers

Discussion in 'Python' started by T, Mar 11, 2010.

  1. T

    T Guest

    All I'm looking to do is to download messages from a POP account and
    retrieve the sender and subject from their headers. Right now I'm 95%
    of the way there, except I can't seem to figure out how to *just* get
    the headers. Problem is, certain email clients also include headers
    in the message body (i.e. if you're replying to a message), and these
    are all picked up as additional senders/subjects. So, I want to avoid
    processing anything from the message body.

    Here's a sample of what I have:

    # For each line in message
    for j in M.retr(i+1)[1]:
        # Create email message object from returned string
        emailMessage = email.message_from_string(j)
        # Get fields
        fields = emailMessage.keys()
        # If email contains "From" field
        if emailMessage.has_key("From"):
            # Get contents of From field
            from_field = emailMessage.__getitem__("From")

    I also tried using the following, but got the same results:
    emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)

    Any help would be appreciated!
    T, Mar 11, 2010
    #1

  2. MRAB

    MRAB Guest

    If you're using poplib then use ".top" instead of ".retr".
    MRAB, Mar 11, 2010
    #2
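
MRAB's suggestion above can be sketched as follows. This is a minimal sketch, assuming Python 3 (where `poplib` returns each line as `bytes`) and a server that implements TOP correctly. The helper name `sender_and_subject` and its `conn` and `which` parameters are illustrative and do not come from the thread. The point it shows is that the lines returned by `top` are joined and parsed once as a whole, not one line at a time.

```python
from email.parser import BytesHeaderParser


def sender_and_subject(conn, which):
    """Return (From, Subject) for message number `which` (1-based).

    `conn` is assumed to be a logged-in poplib.POP3 or poplib.POP3_SSL
    object. top(which, 0) asks the server for the headers plus zero
    lines of the body.
    """
    response, lines, octets = conn.top(which, 0)
    # poplib strips the line endings, so put them back before parsing.
    raw_headers = b"\r\n".join(lines)
    msg = BytesHeaderParser().parsebytes(raw_headers)
    # msg[...] returns None when the header is missing.
    return msg["From"], msg["Subject"]
```

With a real connection this would be called as `sender_and_subject(M, i + 1)` inside the loop over message numbers.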

  3. Grant Edwards

    On 2010-03-11, T <> wrote:
    > All I'm looking to do is to download messages from a POP account and
    > retrieve the sender and subject from their headers. Right now I'm 95%
    > of the way there, except I can't seem to figure out how to *just* get
    > the headers.


    The headers are separated from the body by a blank line.

    > Problem is, certain email clients also include headers in the message
    > body (i.e. if you're replying to a message), and these are all picked
    > up as additional senders/subjects. So, I want to avoid processing
    > anything from the message body.


    Then stop when you see a blank line.

    Or retrieve just the headers.

    --
    Grant Edwards, grant.b.edwards at gmail.com
    Yow! My life is a patio of fun!
    Grant Edwards, Mar 11, 2010
    #3
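
The "stop when you see a blank line" advice above can be sketched as a small helper. This is only an illustration: the name `header_lines` is made up here, and the input is assumed to be the list of lines for one message, such as the second item of the tuple returned by `retr`.

```python
def header_lines(lines):
    """Return the lines that come before the first blank line.

    `lines` is assumed to hold the lines of a single message.
    Works for both str and bytes lines.
    """
    headers = []
    for line in lines:
        if not line.strip():
            break  # blank line: end of headers, start of body
        headers.append(line)
    return headers
```

The returned lines still need to be joined and parsed together, since a single header can be folded across several lines.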
  4. T

    T Guest

    On Mar 11, 3:13 pm, MRAB <> wrote:
    > If you're using poplib then use ".top" instead of ".retr".


    I'm still having the same issue, even with .top. Am I missing
    something?

    for j in M.top(i+1, 0)[1]:
        emailMessage = email.message_from_string(j)
        #emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)
        # Get fields
        fields = emailMessage.keys()
        # If email contains "From" field
        if emailMessage.has_key("From"):
            # Get contents of From field
            from_field = emailMessage.__getitem__("From")

    Is there another way I should be using to retrieve only the headers
    (not those in the body)?
    T, Mar 11, 2010
    #4
  5. MRAB

    MRAB Guest

    T wrote:
    > On Mar 11, 3:13 pm, MRAB <> wrote:
    >> If you're using poplib then use ".top" instead of ".retr".

    >
    > I'm still having the same issue, even with .top. Am I missing
    > something?
    >
    > for j in M.top(i+1, 0)[1]:
    >     emailMessage = email.message_from_string(j)
    >     #emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)
    >     # Get fields
    >     fields = emailMessage.keys()
    >     # If email contains "From" field
    >     if emailMessage.has_key("From"):
    >         # Get contents of From field
    >         from_field = emailMessage.__getitem__("From")
    >
    > Is there another way I should be using to retrieve only the headers
    > (not those in the body)?


    The documentation does say:

    """unfortunately, TOP is poorly specified in the RFCs and is
    frequently broken in off-brand servers."""

    All I can say is that it works for me with my ISP! :)
    MRAB, Mar 11, 2010
    #5
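
Given the caveat quoted above about TOP, one defensive option is to try `top` first and fall back to `retr`. The sketch below is a hedged illustration, assuming Python 3 and assuming that a server without working TOP replies with an error that `poplib` raises as `poplib.error_proto`. The function name `fetch_header_lines` is hypothetical. It also cuts at the first blank line in both cases, which covers a server that returns body lines from TOP anyway.

```python
import poplib


def fetch_header_lines(conn, which):
    """Return the header lines of message `which` as a list of bytes.

    Tries TOP first; if the server rejects it, falls back to RETR.
    Either way, only the lines before the first blank line are kept.
    """
    try:
        lines = conn.top(which, 0)[1]
    except poplib.error_proto:
        # Assumption: an unsupported TOP surfaces as error_proto.
        lines = conn.retr(which)[1]
    headers = []
    for line in lines:
        if not line.strip():
            break  # end of headers
        headers.append(line)
    return headers
```

The fallback downloads the whole message, so it only saves parsing work, not bandwidth.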
  6. T

    T Guest

    Thanks for your suggestions! Here's what seems to be working - it's
    basically the same thing I originally had, but first checks to see if
    the line is blank:

    response, lines, bytes = M.retr(i+1)
    # For each line in message
    for line in lines:
        if not line.strip():
            M.dele(i+1)
            break

        emailMessage = email.message_from_string(line)
        # Get fields
        fields = emailMessage.keys()
        # If email contains "From" field
        if emailMessage.has_key("From"):
            # Get contents of From field
            from_field = emailMessage.__getitem__("From")
    T, Mar 11, 2010
    #6
  7. Thomas Guettler

    T wrote:
    > Thanks for your suggestions! Here's what seems to be working - it's
    > basically the same thing I originally had, but first checks to see if
    > the line is blank
    >
    > response, lines, bytes = M.retr(i+1)
    > # For each line in message
    > for line in lines:
    >     if not line.strip():
    >         M.dele(i+1)
    >         break
    >
    >     emailMessage = email.message_from_string(line)
    >     # Get fields
    >     fields = emailMessage.keys()
    >     # If email contains "From" field
    >     if emailMessage.has_key("From"):
    >         # Get contents of From field
    >         from_field = emailMessage.__getitem__("From")


    Hi T,

    Wait, this code looks strange.

    You delete the email if it contains an empty line? I use something like this:

    message='\n'.join(connection.retr(msg_num)[1])

    Your code:
    emailMessage = email.message_from_string(line)
    creates an email object from only *one* line!

    You retrieve the whole message (so you don't save bandwidth), but maybe
    that's what you want.


    Thomas

    --
    Thomas Guettler, http://www.thomas-guettler.de/
    E-Mail: guettli (*) thomas-guettler + de
    Thomas Guettler, Mar 12, 2010
    #7
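
The difference Thomas points out, parsing each line on its own versus parsing the joined message, can be seen in a small self-contained example. This is a sketch using made-up addresses and `str` lines; under Python 3, lines from `poplib` arrive as `bytes` and would need decoding first, or `email.parser.BytesHeaderParser` in place of `HeaderParser`.

```python
from email.parser import HeaderParser

# Example lines as a POP3 client might return them, already decoded.
# The addresses are invented for illustration.
lines = [
    "From: bob@example.com",
    "Subject: Re: lunch",
    "",
    "On Mar 10, Alice wrote:",
    "From: alice@example.com",
    "Subject: lunch",
]

# Parsed one line at a time, quoted lines in the body look like headers.
per_line = [HeaderParser().parsestr(line)["From"] for line in lines]

# Parsed as one message, the parser stops treating lines as headers
# at the blank line, so only the real headers are seen.
msg = HeaderParser().parsestr("\n".join(lines))
sender, subject = msg["From"], msg["Subject"]
```

Here `per_line` picks up both addresses, while `sender` and `subject` come only from the real header block.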