JavaMail and base64 problem

Discussion in 'Java' started by Martin Gregorie, Oct 21, 2008.

  1. When I create and store a mail message by constructing a MimeMessage from
    an InputStream and writing it to an mbox file using the mstor provider
    some or all of its content will be encoded as base64 if the message
    contains the "Content-Transfer-Encoding: base64" header.

    However, when the messages are read back into Message objects, using
    JavaMail and the mstor provider, some base64 encoded parts of the content
    cause an exception. The exception message says that the base64 encoding
    length isn't a multiple of 4 bytes.

    I've never seen this error when reading messages from mbox files created
    by Postfix - only when the message has been written by JavaMail/mstor and
    then read back by it.

    I'm using JavaMail 1.4.1 and mstor 0.9.11 on a Linux (Fedora 8) system.

    Has anybody else seen this problem and, if so, how did you get round it?


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Oct 21, 2008
    #1

  2. On Tue, 21 Oct 2008 15:58:23 +0100, rossum wrote:

    > On Tue, 21 Oct 2008 11:12:05 +0000 (UTC), Martin Gregorie
    > <> wrote:
    >
    >>When I create and store a mail message by constructing a MimeMessage
    >>from an InputStream and writing it to an mbox file using the mstor
    >>provider some or all of its content will be encoded as base64 if the
    >>message contains the "Content-Transfer-Encoding: base64" header.
    >>
    >>However, when the messages are read back into Message objects, using
    >>JavaMail and the mstor provider, some base64 encoded parts of the
    >>content cause an exception. The exception message says that the base64
    >>encoding length isn't a multiple of 4 bytes.

    >
    > An example of the before and after versions of the Base64 text might be
    > useful here. If the length is not a multiple of 4 then either it is
    > being truncated or something is being added to the end.
    >

    Difficult, as they are buried in very large files, but I'll see what I
    can do.
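
    For locating a suspect region in a large file, a small check along
    these lines may help. It is only a sketch: the class and method names
    are hypothetical, it uses nothing beyond the standard library, and it
    does not reproduce whatever check JavaMail or mstor actually performs.
    It counts the characters that carry Base64 data or padding, ignoring
    line breaks, and reports the first character that is neither Base64
    nor whitespace.

```java
final class Base64RegionCheck {

    /** True for the 64 alphabet characters of RFC 2045 Base64 plus the '=' pad. */
    static boolean isBase64Char(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
    }

    /** Counts Base64 data and pad characters, skipping everything else. */
    static int significantLength(String region) {
        int count = 0;
        for (int i = 0; i < region.length(); i++) {
            if (isBase64Char(region.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    /** True when the region holds a whole number of 4-character quanta. */
    static boolean isWholeQuanta(String region) {
        return significantLength(region) % 4 == 0;
    }

    /** Index of the first character that is neither Base64 nor whitespace, or -1. */
    static int firstForeignChar(String region) {
        for (int i = 0; i < region.length(); i++) {
            char c = region.charAt(i);
            if (!isBase64Char(c) && !Character.isWhitespace(c)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String intact = "SGVsbG8s\r\nIHdvcmxk\r\n";   // 16 significant characters
        String damaged = "SGVsbG8s\r\nIHdvcmx\r\n";   // 15: one character lost
        System.out.println(isWholeQuanta(intact));    // true
        System.out.println(isWholeQuanta(damaged));   // false
        System.out.println(firstForeignChar("SGVs\r\n--boundary")); // 6
    }
}
```

    Running the part bodies of a failing message through a check like this
    should show whether the encoded text itself is short, or whether
    something that is not Base64 has been treated as part of it.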

    If the decode fails the message gets spat into a single message mbox file
    where I can look at it. The failing region in this file is plain text in
    the examples I've looked at, not base64 encoded, and doesn't seem to be
    truncated. The exception can occur at the end of an attachment
    (attachment has "Content-Transfer-Encoding: base64" and the rest of the
    message is present) or at the end of the message body (message header has
    "Content-Transfer-Encoding: base64").

    The main point is that the files are not modified between being written
    by program A and being read into program B, so I wasn't expecting any
    problem of this type because both programs use the same set of supporting
    class libraries and are run in the same environment.

    I assume that the base64 encode/decode is in the depths of JavaMail
    rather than in the mstor provider code and that, if the input bytestring
    wasn't long enough it would be padded with '=' so its length is a
    multiple of 4 bytes. Do these sound like reasonable assumptions?
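
    As an illustration of the padding assumption, here is a short sketch
    using the JDK's own java.util.Base64 (Java 8 or later). That class is
    an assumption of this example; it is not the encoder inside
    JavaMail 1.4.1 or mstor, so it only shows what a conforming encoder is
    expected to produce. The class name PaddingDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class PaddingDemo {

    /** Encodes ASCII text with the basic (unwrapped) Base64 encoder. */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // 1, 2, 3 and 4 input bytes: the output is always a multiple of 4 characters.
        for (String s : new String[] {"A", "AB", "ABC", "ABCD"}) {
            String e = encode(s);
            System.out.println(s + " -> " + e + " (length " + e.length() + ")");
        }
        // A    -> QQ==      (length 4)
        // AB   -> QUI=      (length 4)
        // ABC  -> QUJD      (length 4)
        // ABCD -> QUJDRA==  (length 8)
    }
}
```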


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Oct 21, 2008
    #2

  3. On Tue, 21 Oct 2008 11:12:05 +0000 (UTC), Martin Gregorie
    <> wrote, quoted or indirectly
    quoted someone who said :

    >
    >Has anybody else seen this problem and, if so, how did you get round it?


    Base64 expects you to pad to a multiple of 4 with =
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    The Canadian national animal should be changed from the beaver to the ostrich.
    Canadians elected a party that denies global warming so they too could pretend it presents no danger.
     
    Roedy Green, Oct 22, 2008
    #3
  4. On Tue, 21 Oct 2008 17:32:14 -0700, Roedy Green wrote:

    > On Tue, 21 Oct 2008 11:12:05 +0000 (UTC), Martin Gregorie
    > <> wrote, quoted or indirectly quoted
    > someone who said :
    >
    >
    >>Has anybody else seen this problem and, if so, how did you get round it?

    >
    > Base64 expects you to pad to a multiple of 4 with =
    >

    I know - I've read the RFC.

    The problem is that I'm building the MimeMessage object for output with
    the MimeMessage(Folder, InputStream) constructor which accepts the entire
    message (headers and all) from the InputStream and parses them. To insert
    the padding myself I'd have to parse the stream to decide if there is a
    Content-Transfer-Encoding header that says base64 encoding is required
    and, if so, find the right place(s) to add the padding into the stream.
    They could be in the middle, for base64 attachments, or at the end, for a
    single part base64 message. This is the job that I'd expect the
    MimeMessage constructor to handle, since it's already parsing the input
    stream - at least that's what the Javadocs say it does.
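
    To give an idea of what even the simplest part of that hand parsing
    would involve, here is a sketch that looks only at the top-level
    headers for a Content-Transfer-Encoding value. Everything in it is
    hypothetical (class name, method names, the ISO-8859-1 choice), it uses
    only the standard library, and it ignores folded header lines and
    nested multipart parts, which is exactly where the real work would be.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class TopLevelEncodingSniffer {

    /**
     * Returns the top-level Content-Transfer-Encoding value in lower case,
     * or null if the header block has none. Consumes the stream.
     */
    static String topLevelEncoding(InputStream message) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(message, StandardCharsets.ISO_8859_1));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                break; // blank line: end of the header block
            }
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim()
                    .equalsIgnoreCase("Content-Transfer-Encoding")) {
                return line.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
            }
        }
        return null;
    }

    /** Convenience overload for a message already held as text. */
    static String topLevelEncoding(String rawMessage) {
        try {
            return topLevelEncoding(new ByteArrayInputStream(
                    rawMessage.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String msg = "Subject: test\r\nContent-Transfer-Encoding: BASE64\r\n\r\nSGVsbG8=\r\n";
        System.out.println(topLevelEncoding(msg)); // base64
    }
}
```

    Finding the encoded regions inside attachments would need boundary
    parsing on top of this, which duplicates what the message parser
    already does.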

    Neither MimeMessage nor MimeBodyPart give me any control over content
    transfer encoding apart from manipulating the list of headers. The only
    way I can get that control involves operating at a very low level and
    directly using the methods in MimeUtility but I really want to avoid
    doing that. I'm currently storing the headers and content for each
    message in a single database row as two CLOB fields, which makes for a
    nice, simple schema. I really don't want to split the parts of the
    message content out because that would require me to use a recursive
    structure of arbitrary depth (message parts can be multi-parts and so on
    ad infinitum). That is do-able, of course, but makes database handling
    considerably more complex so it's to be avoided if possible. Apart from
    this one extract/reload task I have nothing else that would be helped by
    using a more complex schema.


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Oct 22, 2008
    #4
  5. Martin Gregorie wrote:
    > On Tue, 21 Oct 2008 17:32:14 -0700, Roedy Green wrote:
    >> On Tue, 21 Oct 2008 11:12:05 +0000 (UTC), Martin Gregorie
    >> <> wrote, quoted or indirectly quoted
    >> someone who said :
    >>> Has anybody else seen this problem and, if so, how did you get round it?

    >> Base64 expects you to pad to a multiple of 4 with =
    >>

    > I know - I've read the RFC.


    As far as I know, the MIME standard does not mandate that the length
    be a multiple of 4, and the MimeUtility class does not implement it either.

    It is standard base64, and implemented in MimeUtility, that no
    lines are longer than 76 bytes.

    When \r\n is inserted, the length is no longer a multiple
    of 4 (until another is inserted).

    That does not explain your real problem though.
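
    The line-length point can be seen in a small sketch. It uses the JDK's
    java.util.Base64 MIME encoder (Java 8 or later) as a stand-in, which is
    an assumption; it is not MimeUtility. The class name is hypothetical.
    RFC 2045 tells decoders to ignore line breaks, so the count that
    matters is the one taken after the CRLFs are removed.

```java
import java.util.Base64;

final class MimeLineLengthDemo {

    /** Encodes with 76-character lines separated by CRLF. */
    static String mimeEncode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Removes the CRLF line separators inserted by the MIME encoder. */
    static String stripLineBreaks(String encoded) {
        return encoded.replace("\r\n", "");
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];              // 100 zero bytes
        String wrapped = mimeEncode(data);
        String unwrapped = stripLineBreaks(wrapped);

        System.out.println(wrapped.length());     // 138: 136 characters plus one CRLF
        System.out.println(wrapped.length() % 4); // 2
        System.out.println(unwrapped.length());   // 136
        System.out.println(unwrapped.length() % 4); // 0
    }
}
```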

    Arne
     
    Arne Vajhøj, Oct 26, 2008
    #5
  6. Arne Vajhøj wrote:
    > Martin Gregorie wrote:
    >> I know - I've read the RFC.

    >
    > As far as I know, then the MIME standard does not mandate length to
    > be multiple of 4 and MimeUtility class does not implement it either.


    <http://tools.ietf.org/html/rfc4648#section-3.2> says:
    In some circumstances, the use of padding ("=") in base-encoded data is
    not required or used. In the general case, when assumptions about the
    size of transported data cannot be made, padding is required to yield
    correct decoded data.

    Implementations MUST include appropriate pad characters at the end of
    encoded data unless the specification referring to this document
    explicitly states otherwise.

    <http://tools.ietf.org/html/rfc2045#section-6.8> says:
    Special processing is performed if fewer than 24 bits are available at
    the end of the data being encoded. A full encoding quantum is always
    completed at the end of a body. When fewer than 24 input bits are
    available in an input group, zero bits are added (on the right) to form
    an integral number of 6-bit groups. Padding at the end of the data is
    performed using the "=" character.

    In other words, you are required to pad.
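
    The RFC 2045 rule quoted above can be worked through by hand. The
    following sketch (hypothetical class and method names, standard
    library only) encodes just the final 1 or 2 bytes of a body: zero bits
    are added on the right to reach a whole number of 6-bit groups, and
    "=" fills the quantum out to 4 characters.

```java
final class FinalQuantum {

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Encodes the last 1 or 2 bytes of a body into a complete 4-character quantum. */
    static String encodeTail(byte[] tail) {
        if (tail.length == 1) {
            // 8 data bits + 4 zero bits = two 6-bit groups, then two pads
            int bits = (tail[0] & 0xFF) << 4;
            return "" + ALPHABET.charAt(bits >> 6)
                    + ALPHABET.charAt(bits & 0x3F) + "==";
        }
        if (tail.length == 2) {
            // 16 data bits + 2 zero bits = three 6-bit groups, then one pad
            int bits = ((tail[0] & 0xFF) << 8 | (tail[1] & 0xFF)) << 2;
            return "" + ALPHABET.charAt(bits >> 12)
                    + ALPHABET.charAt((bits >> 6) & 0x3F)
                    + ALPHABET.charAt(bits & 0x3F) + "=";
        }
        throw new IllegalArgumentException("tail must be 1 or 2 bytes");
    }

    public static void main(String[] args) {
        System.out.println(encodeTail(new byte[] {'A'}));      // QQ==
        System.out.println(encodeTail(new byte[] {'A', 'B'})); // QUI=
    }
}
```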

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
     
    Joshua Cranmer, Oct 26, 2008
    #6
  7. Joshua Cranmer wrote:
    > Arne Vajhøj wrote:
    >> Martin Gregorie wrote:
    >>> I know - I've read the RFC.

    >>
    >> As far as I know, then the MIME standard does not mandate length to
    >> be multiple of 4 and MimeUtility class does not implement it either.

    >
    > <http://tools.ietf.org/html/rfc4648#section-3.2> says:
    > In some circumstances, the use of padding ("=") in base-encoded data is
    > not required or used. In the general case, when assumptions about the
    > size of transported data cannot be made, padding is required to yield
    > correct decoded data.
    >
    > Implementations MUST include appropriate pad characters at the end of
    > encoded data unless the specification referring to this document
    > explicitly states otherwise.
    >
    > <http://tools.ietf.org/html/rfc2045#section-6.8> says:
    > Special processing is performed if fewer than 24 bits are available at
    > the end of the data being encoded. A full encoding quantum is always
    > completed at the end of a body. When fewer than 24 input bits are
    > available in an input group, zero bits are added (on the right) to form
    > an integral number of 6-bit groups. Padding at the end of the data is
    > performed using the "=" character.
    >
    > In other words, you are required to pad.


    I think you should have read the rest of my post.

    It is required to pad. But it may insert \r\n to
    limit line length to 76, in which case the length
    is not always a multiple of 4.

    Arne
     
    Arne Vajhøj, Oct 26, 2008
    #7
  8. On Sat, 25 Oct 2008 22:01:11 -0400, Joshua Cranmer wrote:

    > Implementations MUST include appropriate pad characters at the end of
    > encoded data unless the specification referring to this document
    > explicitly states otherwise.
    >
    > <http://tools.ietf.org/html/rfc2045#section-6.8> says: Special
    > processing is performed if fewer than 24 bits are available at the end
    > of the data being encoded. A full encoding quantum is always completed
    > at the end of a body. When fewer than 24 input bits are available in an
    > input group, zero bits are added (on the right) to form an integral
    > number of 6-bit groups. Padding at the end of the data is performed
    > using the "=" character.
    >
    > In other words, you are required to pad.
    >

    Some clarification:

    I do the following with all the mail:

    Postfix->mbox->JavaMail.MimeMessage->headers -->} fields in a DB row
                                       ->body    -->}

    If I do this:
    DB->headers+body->attachment->JavaMail.MimeMessage --> Postfix --> MUA

    the retrieved message is correctly formatted and readable as an
    attachment. Base64 attachments (e.g. images) are viewable.

    However, if I do this:
    DB->headers+body->JavaMail.MimeMessage --> mstor provider --> mbox file

    mbox file --> mstor provider -> JavaMail.MimeMessage

    then a small proportion of the messages that originally contained Base64
    attachments or a non-MIME Base64-encoded body (some M$ MUAs do this) will
    fail to be read back from the mbox file with the "Base64 not a multiple
    of 4 bytes" exception. In both cases the complete MIME body is treated as
    a data stream and not parsed by my code.

    Headers+body means that I concatenate the headers and body to create a
    single InputStream that's passed to the MimeMessage constructor, which is
    then written to an mbox file by JavaMail using the mstor provider. I have
    to ensure that the stream ends with CRLFCRLF before constructing the
    MimeMessage. If I omit this step the blank line between the message body
    and the next message's 'From ' envelope header is omitted and mstor
    becomes incapable of parsing the mbox file, so it's quite possible that
    the provider is doing Base64 encode/decode as well. My guess is that the
    provider module is causing the problem since messages sent to Postfix via
    the pop3 provider don't cause problems, but a small fraction of the
    messages sent to the mbox file via the mstor provider are not readable by
    mstor+JavaMail.
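
    The concatenation step described above might look roughly like this.
    It is a sketch under stated assumptions, not the actual code: the class
    and method names are hypothetical, the stored header text is assumed to
    end with its own blank separator line already, and ISO-8859-1 is
    assumed so that each character maps to one byte.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class MessageStreamBuilder {

    /** Appends whatever is missing so that the text ends with CRLFCRLF. */
    static String withTrailingBlankLine(String message) {
        if (message.endsWith("\r\n\r\n")) {
            return message;              // already ends with a blank line
        }
        if (message.endsWith("\r\n")) {
            return message + "\r\n";     // one CRLF present, add the second
        }
        return message + "\r\n\r\n";     // no trailing CRLF at all
    }

    /** Joins the two stored fields into one stream for a message parser. */
    static InputStream toStream(String headers, String body) {
        String whole = withTrailingBlankLine(headers + body);
        return new ByteArrayInputStream(whole.getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

    The resulting stream is what would be handed to a MimeMessage
    constructor that accepts an InputStream. Note that the added CRLFs
    land directly after the last body line, so for a single-part base64
    message they follow the encoded text itself.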

    As I said at the start of this thread, has anybody seen this problem
    before? Alternatively, has anybody done something similar and NOT seen
    the problem?

    The purpose of this post is to see if anybody on this newsgroup can
    confirm or deny this guess.

    IOW I want to know if the Base64 encoding is handled by the mstor module
    or within JavaMail. Once I can determine that I can start a dialogue with
    the appropriate author.


    TIA,
    Martin

    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Oct 26, 2008
    #8