Enhancing the Gateway (Help Needed)

Discussion in 'Ruby' started by James Edward Gray II, Oct 28, 2007.

  1. Here's the short story on the current situation with our mailing list
    to Usenet gateway:

    * Our Usenet host rejects multipart/alternative messages
    because they are technically illegal Usenet posts
    * This means that some emails do not reach comp.lang.ruby
    (several messages each day according to the logs)
    * We don't like this

    To solve this, we want to enhance the gateway to convert
    multipart/alternative messages into something we can legally post to
    Usenet. I have two thoughts on this strategy:

    1. If possible, we should gather all text/plain portions of an email
    and post those with a content-type of text/plain
    2. If that fails, we can just post the original body but force the
    content-type to text/plain for maximum compatibility

    Now I need all of you email and Usenet experts to tell me if that's a
    sane strategy. If another approach would be better, please clue me in.

    I've pretty much made it this far. The code at the bottom of this
    message is the gateway's mail_to_news.rb script, rewritten to use
    this strategy.

    If you aren't familiar with the gateway code, you can get details
    from the articles at:

    http://blog.grayproductions.net/categories/the_gateway

    There's one problem left that I know I haven't solved correctly. Help
    me figure out a decent strategy for this last piece and we can deploy
    the new code.

    The outstanding issue is how to handle character sets for the
    constructed message. You'll see in the code below that I just pull
    the charset param from the original message, but after looking at a
    few messages, I realize that this doesn't make sense. For example,
    here are the relevant portions of a recent post that wasn't gated
    correctly:

    Content-Type: multipart/alternative; boundary=Apple-Mail-18-445454026

    --Apple-Mail-18-445454026
    Content-Transfer-Encoding: 7bit
    Content-Type: text/plain;
        charset=US-ASCII;
        delsp=yes;
        format=flowed

    As you can see, the overall email doesn't have a charset but each
    text portion can. If we are going to merge these parts, what's the
    best strategy for handling the charset?

    I thought of trying to convert them all to UTF-8 with Iconv, but I'm
    not sure what to do if a part doesn't declare a charset, or when Iconv
    chokes on the one that is declared. Please share your opinions.

    If you are feeling really adventurous, rewrite the relevant portion
    of the code below, which I have bracketed with FIX ME comments.

    Here's the script:

    #!/usr/bin/env ruby

    # written by James Edward Gray II <>

    $KCODE = "u"

    GATEWAY_DIR = File.join(File.dirname(__FILE__), "..").freeze

    $LOAD_PATH << File.join(GATEWAY_DIR, "config") << File.join(GATEWAY_DIR, "lib")

    require "tmail"

    require "servers_config"
    require "nntp"

    require "logger"
    require "timeout"

    # prepare log
    log = Logger.new(ARGV.shift || $stdout)
    log.datetime_format = "%Y-%m-%d %H:%M "

    # build incoming and outgoing message object
    incoming = TMail::Mail.parse($stdin.read)
    outgoing = TMail::Mail.new

    # skip any flagged messages
    if incoming["X-Rubymirror"].to_s == "yes"
      log.info "Skipping message ##{incoming.message_id}, sent by news_to_mail"
      exit
    elsif incoming["X-Spam-Status"].to_s =~ /\AYes/
      log.info "Ignoring Spam ##{incoming.message_id}: " +
               "#{incoming.subject}–#{incoming.from}"
      exit
    end

    # only allow certain headers through
    %w[from subject in_reply_to transfer_encoding date].each do |header|
      outgoing.send("#{header}=", incoming.send(header))
    end
    outgoing.message_id = incoming.message_id.sub(/\.+>$/, ">")
    %w[X-ML-Name X-Mail-Count X-X-Sender].each do |header|
      outgoing[header] = incoming[header].to_s if incoming.key?(header)
    end

    # doctor headers for Ruby Talk
    outgoing.references = if incoming.key? "References"
                            incoming.references
                          elsif incoming.key? "In-Reply-To"
                            # In-Reply-To, not Reply-To (which holds an address)
                            incoming.in_reply_to
                          elsif incoming.subject =~ /^Re:/
                            # a reply with no threading headers gets a dummy parent
                            "<this_is_a_dummy_message-id@rubygateway>"
                          end
    outgoing["X-Ruby-Talk"] = incoming.message_id
    outgoing["X-Received-From"] = <<END_GATEWAY_DETAILS.gsub(/\s+/, " ")
    This message has been automatically forwarded from the ruby-talk mailing list by
    a gateway at #{ServersConfig::NEWSGROUP}. If it is SPAM, it did not originate at
    #{ServersConfig::NEWSGROUP}. Please report the original sender, and not us.
    Thanks! For more details about this gateway, please visit:
    http://blog.grayproductions.net/categories/the_gateway
    END_GATEWAY_DETAILS
    outgoing["X-Rubymirror"] = "Yes"

    # translate the body of the message, if needed
    if incoming.multipart? and incoming.sub_type == "alternative"
      ### FIX ME ###
      # handle multipart/alternative messages
      # extract body
      body = ""
      extract_text = lambda do |message_or_part|
        if message_or_part.multipart?
          message_or_part.each_part { |part| extract_text[part] }
        elsif message_or_part.content_type == "text/plain"
          body += message_or_part.body
        end
      end
      extract_text[incoming]
      if body.empty?
        outgoing.body = "Note: the content-type of this message was altered by " +
                        "the gateway.\n\n#{incoming.body}"
      else
        outgoing.body = "Note: non-text portions of this message were stripped " +
                        "by the gateway.\n\n#{body}"
      end
      # set the content type of the new message
      outgoing.set_content_type( "text", "plain",
                                 "charset" => incoming.type_param("charset") )
      ### END FIX ME ###
    else
      %w[content_type body].each do |header|
        outgoing.send("#{header}=", incoming.send(header))
      end
    end

    log.info "Sending message ##{incoming.message_id}: " +
             "#{incoming.subject}–#{incoming.from}…"
    log.info "Message looks like:\n#{outgoing.encoded}"

    # connect to NNTP host
    begin
      nntp = nil
      Timeout.timeout(30) do
        nntp = Net::NNTP.new( ServersConfig::NEWS_SERVER,
                              Net::NNTP::NNTP_PORT,
                              ServersConfig::NEWS_USER,
                              ServersConfig::NEWS_PASS )
      end
    rescue Timeout::Error
      log.error "The NNTP connection timed out"
      exit -1
    rescue
      log.fatal "Unable to establish connection to NNTP host: #{$!.message}"
      exit -1
    end

    # attempt to send newsgroup post
    unless $DEBUG
      begin
        result = nil
        Timeout.timeout(30) { result = nntp.post(outgoing.encoded) }
      rescue Timeout::Error
        log.error "The NNTP post timed out"
        exit -1
      rescue
        log.fatal "Unable to post to NNTP host: #{$!.message}"
        exit -1
      end
      log.info "… Sent. nntp.post() result: #{result}"
    end

    __END__

    Thanks for the help.

    James Edward Gray II
     
    James Edward Gray II, Oct 28, 2007
    #1

  2. James Edward Gray II

    Bill Kelly Guest

    From: "James Edward Gray II" <>
    >
    > 1. If possible, we should gather all text/plain portions of an email
    > and post those with a content-type of text/plain


    Do we get many HTML-only messages, having a text/html part, without a
    corresponding text/plain part?

    Or is that too uncommon to worry about?


    Regards,

    Bill
     
    Bill Kelly, Oct 28, 2007
    #2

  3. Hi,

    At Mon, 29 Oct 2007 06:20:48 +0900,
    James Edward Gray II wrote in [ruby-talk:276334]:
    > To solve this, we want to enhance the gateway to convert multipart/
    > alternative messages into something we can legally post to Usenet. I
    > have two thoughts on this strategy:
    >
    > 1. If possible, we should gather all text/plain portions of an email
    > and post those with a content-type of text/plain


    I'd rather have FML itself do this on ruby-lang.org.

    > 2. If that fails, we can just post the original body but force the
    > content-type to text/plain for maximum compatibility


    I do it locally by `w3m -dump -T text/html`.

    > The outstanding issue is how to handle character sets for the
    > constructed message. You'll see in the code below that I just pull
    > the charset param from the original message, but after looking at a
    > few messages, I realize that this doesn't make sense. For example,
    > here are the relevant portions of a recent post that wasn't gated
    > correctly:
    >
    > Content-Type: multipart/alternative; boundary=Apple-Mail-18-445454026
    >
    > --Apple-Mail-18-445454026
    > Content-Transfer-Encoding: 7bit
    > Content-Type: text/plain;
    > charset=US-ASCII;
    > delsp=yes;
    > format=flowed
    >
    > As you can see, the overall email doesn't have a charset but each
    > text portion can. If we are going to merge these parts, what's the
    > best strategy for handling the charset?


    "alternative" means each body part actually has the same content,
    so, in theory, you can and should select just one of them.
    Merging them all is the wrong behavior. I suspect you mean
    multipart/relative.
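    A minimal Ruby sketch of the "pick one, don't merge" rule. It does not use TMail: `AltPart` is a hypothetical stand-in struct for a parsed MIME part, and `pick_alternative` is an illustrative name. RFC 2046 orders the parts of a multipart/alternative from least to most faithful to the original, so the plain-text rendering usually comes first; the sketch simply returns the first child of the preferred type.

```ruby
# AltPart is a hypothetical stand-in for a parsed MIME part; TMail's real
# part objects expose this information through their own methods.
AltPart = Struct.new(:content_type, :charset, :body, :parts)

# Choose ONE representation from a multipart/alternative container instead
# of concatenating all of them. Returns nil if the container is not
# multipart/alternative or has no child of the preferred type.
def pick_alternative(container, preferred = "text/plain")
  return nil unless container.content_type == "multipart/alternative"
  (container.parts || []).find { |part| part.content_type == preferred }
end

message = AltPart.new("multipart/alternative", nil, nil, [
  AltPart.new("text/plain", "US-ASCII", "Hello, list!\n", []),
  AltPart.new("text/html",  "US-ASCII", "<p>Hello, list!</p>\n", [])
])

chosen = pick_alternative(message)
puts chosen.body
```

    Both alternatives carry the same text, so keeping only one avoids posting the message twice in two formats.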

    > I thought of trying to convert them all to UTF-8 with Iconv, but I'm
    > not sure what to do if a type doesn't declare a charset or when Iconv
    > chokes on what is declared? Please share your opinions.


    It should default to US-ASCII.

    --
    Nobu Nakada
     
    Nobuyoshi Nakada, Oct 29, 2007
    #3
  4. On Oct 28, 2007, at 10:00 PM, Nobuyoshi Nakada wrote:

    > Hi,
    >
    > At Mon, 29 Oct 2007 06:20:48 +0900,
    > James Edward Gray II wrote in [ruby-talk:276334]:
    >> To solve this, we want to enhance the gateway to convert multipart/
    >> alternative messages into something we can legally post to Usenet. I
    >> have two thoughts on this strategy:
    >>
    >> 1. If possible, we should gather all text/plain portions of an email
    >> and post those with a content-type of text/plain

    >
    > I'd rather have FML itself do this on ruby-lang.org.


    Excellent. Are there any plans to make that happen?

    I'm trying to get it in the gateway so we can stop having this
    discussion. ;) But if there are plans to have the list itself do
    it, that's great.

    >> 2. If that fails, we can just post the original body but force the
    >> content-type to text/plain for maximum compatibility

    >
    > I do it locally by `w3m -dump -T text/html`.


    Yes, I assume we could use lynx/links to similar effect. My strategy
    wasn't as clever, but I thought by swapping the content type we would
    at least get the content, though it would have some noise.

    >> The outstanding issue is how to handle character sets for the
    >> constructed message. You'll see in the code below that I just pull
    >> the charset param from the original message, but after looking at a
    >> few messages, I realize that this doesn't make sense. For example,
    >> here are the relevant portions of a recent post that wasn't gated
    >> correctly:
    >>
    >> Content-Type: multipart/alternative; boundary=Apple-Mail-18-445454026
    >>
    >> --Apple-Mail-18-445454026
    >> Content-Transfer-Encoding: 7bit
    >> Content-Type: text/plain;
    >> charset=US-ASCII;
    >> delsp=yes;
    >> format=flowed
    >>
    >> As you can see, the overall email doesn't have a charset but each
    >> text portion can. If we are going to merge these parts, what's the
    >> best strategy for handling the charset?

    >
    > "alternative" means each body part actually has the same content,
    > so, in theory, you can and should select just one of them.
    > Merging them all is the wrong behavior.


    Now you know why I asked for help. I know so little about email
    rules. Thanks for explaining this.

    This is good news because it greatly simplifies the process.

    Do you know if multipart content can be nested? For example, could a
    single part of a multipart message itself be multipart? The design
    of TMail seems to support this, but again it's easier if that's not
    the case.

    > I suspect you mean multipart/relative.


    I wasn't even aware of that format, to be honest. I knew of
    multipart/mixed (which our Usenet host will allow) and multipart/
    alternative. What is the purpose of multipart/relative?

    >> I thought of trying to convert them all to UTF-8 with Iconv, but I'm
    >> not sure what to do if a type doesn't declare a charset or when Iconv
    >> chokes on what is declared? Please share your opinions.

    >
    > It should default to US-ASCII.


    Do you mean that US-ASCII is the charset when one is not specified?

    Thanks for all the information.

    James Edward Gray II
     
    James Edward Gray II, Oct 29, 2007
    #4
  5. On Oct 28, 2007, at 6:39 PM, Bill Kelly wrote:

    >
    > From: "James Edward Gray II" <>
    >> 1. If possible, we should gather all text/plain portions of an
    >> email and post those with a content-type of text/plain

    >
    > Do we get many HTML-only messages, having a text/html part, without a
    > corresponding text/plain part?


    I know I have seen it at least once in the past. I suspect it's
    rare, but that's just me guessing. When dealing with the Internet at
    large, I think we always need to be prepared for the worst-case
    scenario.

    > Or is that too uncommon to worry about?


    You made a good point here that I should try looking at some actual
    Ruby Talk messages to see what we're up against. I'll put together a
    script to comb through a subset of the archives…

    James Edward Gray II
     
    James Edward Gray II, Oct 29, 2007
    #5
  6. Hi,

    At Mon, 29 Oct 2007 12:18:40 +0900,
    James Edward Gray II wrote in [ruby-talk:276357]:
    > >> 1. If possible, we should gather all text/plain portions of an email
    > >> and post those with a content-type of text/plain

    > >
    > > I'd rather have FML itself do this on ruby-lang.org.

    >
    > Excellent. Are there any plans to make that happen?


    I'm asking eban about it.

    > Do you know if multipart content can be nested? For example, could a
    > single part of a multipart message itself be multipart? The design
    > of TMail seems to support this, but again it's easier if that's not
    > the case.


    Yes, and the depth isn't restricted.
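    Since nesting depth is unrestricted, anything that looks for the plain-text body has to recurse instead of inspecting only the top-level parts. A minimal depth-first sketch, again using a hypothetical `MimePart` struct in place of TMail's part objects; `find_first_part` is an illustrative name.

```ruby
# MimePart is a hypothetical stand-in for a parsed MIME part.
MimePart = Struct.new(:content_type, :charset, :body, :parts)

# Depth-first search for the first part of the wanted type. Containers can
# nest to any depth, so every child is searched recursively.
def find_first_part(part, wanted = "text/plain")
  return part if part.content_type == wanted
  (part.parts || []).each do |child|
    found = find_first_part(child, wanted)
    return found if found
  end
  nil
end

# A nested shape similar to the ones reported in this thread
message = MimePart.new("multipart/signed", nil, nil, [
  MimePart.new("multipart/alternative", nil, nil, [
    MimePart.new("text/plain", "UTF-8", "signed text\n", nil),
    MimePart.new("text/html", "UTF-8", "<p>signed text</p>\n", nil)
  ]),
  MimePart.new("application/pgp-signature", nil, "(signature data)", nil)
])

puts find_first_part(message).body
```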

    > > I suspect you mean multipart/relative.

    >
    > I wasn't even aware of that format, to be honest. I knew of
    > multipart/mixed (which our Usenet host will allow) and multipart/
    > alternative. What is the purpose of multipart/relative?


    As above.

    > >> I thought of trying to convert them all to UTF-8 with Iconv, but I'm
    > >> not sure what to do if a type doesn't declare a charset or when Iconv
    > >> chokes on what is declared? Please share your opinions.

    > >
    > > It should default to US-ASCII.

    >
    > Do you mean that US-ASCII is the charset when one is not specified?


    RFC 2045 Internet Message Bodies November 1996

    5.2. Content-Type Defaults

    Default RFC 822 messages without a MIME Content-Type header are taken
    by this protocol to be plain text in the US-ASCII character set,
    which can be explicitly specified as:

    Content-type: text/plain; charset=us-ascii

    This default is assumed if no Content-Type header field is specified.
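    Applied to the Iconv question, that default means a text part with no charset parameter can be read as US-ASCII. The sketch below assumes modern Ruby, where `String#force_encoding` and `String#encode` take over from Iconv (which left the standard library in Ruby 2.0); `effective_charset` and `to_utf8` are hypothetical helper names, not gateway code. It raises on failure so the caller can decide what to do next.

```ruby
# Hypothetical helpers; modern Ruby's String#encode replaces Iconv here.

# RFC 2045, section 5.2: a missing charset means US-ASCII.
def effective_charset(declared)
  name = declared.to_s.strip
  name.empty? ? "US-ASCII" : name
end

# Reinterpret the raw body bytes in the effective charset and transcode them
# to UTF-8. Raises ArgumentError for an unknown charset name and an
# EncodingError (or subclass) for bytes that are invalid in that charset.
def to_utf8(raw_bytes, declared_charset)
  charset = effective_charset(declared_charset)
  text = raw_bytes.dup.force_encoding(charset).encode("UTF-8")
  # Transcoding to the same encoding may skip validation, so check explicitly.
  raise EncodingError, "invalid #{charset} data" unless text.valid_encoding?
  text
end

latin1 = "caf\xE9".b                 # "café" as ISO-8859-1 bytes
puts to_utf8(latin1, "ISO-8859-1")   # declared charset, converted to UTF-8
puts to_utf8("plain ASCII".b, nil)   # no charset declared: read as US-ASCII
```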

    --
    Nobu Nakada
     
    Nobuyoshi Nakada, Oct 29, 2007
    #6
  7. Hi,

    At Mon, 29 Oct 2007 13:17:24 +0900,
    Nobuyoshi Nakada wrote in [ruby-talk:276371]:
    > > > I suspect you mean multipart/relative.

    > >
    > > I wasn't even aware of that format, to be honest. I knew of
    > > multipart/mixed (which our Usenet host will allow) and multipart/
    > > alternative. What is the purpose of multipart/relative?

    >
    > As above.


    Oops, I meant multipart/related, and I had removed the paragraph
    that mentioned it. My mistake, sorry.

    --
    Nobu Nakada
     
    Nobuyoshi Nakada, Oct 29, 2007
    #7
  8. On Oct 28, 2007, at 11:35 PM, Nobuyoshi Nakada wrote:

    > Hi,
    >
    > At Mon, 29 Oct 2007 13:17:24 +0900,
    > Nobuyoshi Nakada wrote in [ruby-talk:276371]:
    >>>> I suspect you mean multipart/relative.
    >>>
    >>> I wasn't even aware of that format, to be honest. I knew of
    >>> multipart/mixed (which our Usenet host will allow) and multipart/
    >>> alternative. What is the purpose of multipart/relative?

    >>
    >> As above.

    >
    > Oops, it was multipart/related, and I removed the paragraph
    > mentioned about it. My mistake, sorry.


    I've been looking into this a little this morning.

    We do receive multipart/related messages, though they seem fairly
    uncommon compared to multipart/alternative. They don't appear to be
    gated properly. In fact, the mailing list archives don't even seem
    to show them. For example, 271796 was a multipart/related message and
    I can't find it in the archives or on comp.lang.ruby.

    To understand what we are dealing with here, I read:

    http://www.faqs.org/rfcs/rfc2387.html

    This type does not seem easy to deal with and I'm open to suggestions
    for the best strategy to use.

    James Edward Gray II
     
    James Edward Gray II, Oct 29, 2007
    #8
  9. James Edward Gray II

    mortee Guest

    James Edward Gray II wrote:
    > I've been looking into this a little this morning.
    >
    > We do receive multipart/related messages, though they seem fairly
    > uncommon compared to multipart/alternative. They don't appear to be
    > gated properly. In fact, the mailing list archives don't even seem to
    > show them. For example 271796 was a multipart/related message and I
    > can't find it in the archives or on comp.lang.ruby.
    >
    > To understand what we are dealing with here, I read:
    >
    > http://www.faqs.org/rfcs/rfc2387.html
    >
    > This type does not seem easy to deal with and I'm open to suggestions for
    > the best strategy to use.


    AFAIK it's mostly used for HTML messages with images embedded in the
    email itself. I guess it would mostly be one part of a
    multipart/alternative message, of which one alternative should be
    text/plain anyway. Otherwise, you're most likely left with HTML to
    strip, and images which you may either drop or attach to the output as
    files.

    Sorry if I happen to be wrong on one point or the other.

    mortee
     
    mortee, Oct 29, 2007
    #9
  10. James Edward Gray II

    Todd Benson Guest

    On 10/29/07, James Edward Gray II <> wrote:
    > On Oct 28, 2007, at 11:35 PM, Nobuyoshi Nakada wrote:
    >
    > > Hi,
    > >
    > > At Mon, 29 Oct 2007 13:17:24 +0900,
    > > Nobuyoshi Nakada wrote in [ruby-talk:276371]:
    > >>>> I suspect you mean multipart/relative.
    > >>>
    > >>> I wasn't even aware of that format, to be honest. I knew of
    > >>> multipart/mixed (which our Usenet host will allow) and multipart/
    > >>> alternative. What is the purpose of multipart/relative?
    > >>
    > >> As above.

    > >
    > > Oops, it was multipart/related, and I removed the paragraph
    > > mentioned about it. My mistake, sorry.

    >
    > I've been looking into this a little this morning.
    >
    > We do receive multipart/related messages, though they seem fairly
    > uncommon compared to multipart/alternative. They don't appear to be
    > gated properly. In fact, the mailing list archives don't even seem
    > to show them. For example 271796 was a multipart/related message and
    > I can't find it in the archives or on comp.lang.ruby.
    >
    > To understand what we are dealing with here, I read:
    >
    > http://www.faqs.org/rfcs/rfc2387.html
    >
    > This type does not seem easy to deal with and I'm open to suggestions
    > for the best strategy to use.
    >
    > James Edward Gray II


    I haven't built enough clout in this group for my opinion to matter,
    but here goes...

    James did a great job with the gateway ... no doubt about that.
    Should we even have it? I absolutely think so.

    The lowest common denominator for language is US-ASCII (is that a good
    thing or bad thing? You decide).

    Make sure, James and others, that you label the reformed
    emails/postings with some kind of rejoinder that says something to the
    effect of "mail/posting has been modified to make it available."

    Todd
     
    Todd Benson, Oct 29, 2007
    #10
  11. On Oct 29, 2007, at 9:20 AM, mortee wrote:

    > James Edward Gray II wrote:
    >> I've been looking into this a little this morning.
    >>
    >> We do receive multipart/related messages, though they seem fairly
    >> uncommon compared to multipart/alternative. They don't appear to be
    >> gated properly. In fact, the mailing list archives don't even
    >> seem to
    >> show them. For example 271796 was a multipart/related message and I
    >> can't find it in the archives or on comp.lang.ruby.
    >>
    >> To understand what we are dealing with here, I read:
    >>
    >> http://www.faqs.org/rfcs/rfc2387.html
    >>
    >> This type does not seem easy to deal with and I'm open to
    >> suggestions for the best strategy to use.

    >
    > AFAIK it's mostly used for HTML messages with images embedded in the
    > email itself.


    Yeah, I think that's what I'm seeing in my analysis of the messages.

    > I guess it would mostly be one part of a multipart/alternative
    > message, of which one alternative should be text/plain anyway.


    Most of the cases I have found have a multipart/alternative section
    inside the multipart/related section, like this example shows:

    271796: multipart/related ()
      multipart/alternative ()
      image/png ()

    Obviously I need to extend my statistics gathering script to handle
    the nesting, but I've checked this message by hand and there was a
    text/plain part in there.
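    One way the statistics script could handle nesting is to walk the part tree recursively and indent each child under its container, producing lines in the same `type (charset)` form as the listing above. This is a sketch with a hypothetical `MimeNode` struct and `describe_tree` helper; the actual script and how it parses the archives are not shown in the thread.

```ruby
# MimeNode is a hypothetical stand-in for a parsed message or part.
MimeNode = Struct.new(:content_type, :charset, :parts)

# Returns one "type (charset)" line per part, indenting each child two
# spaces deeper than its container. A nil charset prints as "()".
def describe_tree(node, depth = 0)
  lines = ["#{'  ' * depth}#{node.content_type} (#{node.charset})"]
  (node.parts || []).each { |child| lines.concat(describe_tree(child, depth + 1)) }
  lines
end

message = MimeNode.new("multipart/related", nil, [
  MimeNode.new("multipart/alternative", nil, [
    MimeNode.new("text/plain", "ISO-8859-1", nil),
    MimeNode.new("text/html", "ISO-8859-1", nil)
  ]),
  MimeNode.new("image/png", nil, nil)
])

puts describe_tree(message)
# multipart/related ()
#   multipart/alternative ()
#     text/plain (ISO-8859-1)
#     text/html (ISO-8859-1)
#   image/png ()
```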

    > Otherwise, you're most likely left with HTML to
    > strip, and images which you may either drop or attach to the output as
    > files.


    Right. Which means I still need to settle on an HTML strategy as well.

    > Sorry if I happen to be wrong on one point or the other.


    The other usage that seems common, more common than the HTML case in
    fact, is as part of a signed message:

    271822: multipart/signed ()
      multipart/related ()
      application/pgp-signature ()

    I've not yet checked to see if these messages are gated properly with
    our current setup.

    James Edward Gray II
     
    James Edward Gray II, Oct 29, 2007
    #11
  12. James Edward Gray II

    mortee Guest

    Todd Benson wrote:
    > The lowest common denominator for language is US-ASCII (is that a good
    > thing or bad thing? You decide).


    Aside from any language bias: the language of this list/group is
    certainly English, which does just fine in ASCII. So IMHO we wouldn't
    lose much by falling back to ASCII in case of iconv errors. At least,
    not enough to be worth the extra effort of working around them.
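    mortee's fallback might look like the following sketch: try the declared charset (US-ASCII when none is given), and if the name is unknown or the bytes don't fit it, keep the ASCII bytes and replace everything else with `?` rather than dropping the post. The helper names and the `?` replacement are assumptions, and modern Ruby's `String#encode` stands in for Iconv.

```ruby
# Last resort: keep 7-bit ASCII bytes and turn every other byte into "?".
def ascii_fallback(raw_bytes)
  raw_bytes.each_byte.map { |byte| byte < 0x80 ? byte.chr : "?" }.join
end

# Hypothetical wrapper: try the declared charset (US-ASCII if none, per
# RFC 2045), and degrade to ASCII instead of dropping the post on failure.
def body_text(raw_bytes, declared_charset)
  charset = declared_charset.to_s.strip
  charset = "US-ASCII" if charset.empty?
  text = raw_bytes.dup.force_encoding(charset).encode("UTF-8")
  text.valid_encoding? ? text : ascii_fallback(raw_bytes)
rescue ArgumentError, EncodingError
  # ArgumentError: unknown charset name; EncodingError: bytes that don't fit
  ascii_fallback(raw_bytes)
end

puts body_text("caf\xE9".b, "ISO-8859-1")  # converts cleanly
puts body_text("caf\xE9".b, "X-BOGUS")     # unknown charset, degraded
puts body_text("caf\xE9".b, nil)           # undeclared non-ASCII, degraded
```

    For an English-language list the degraded text stays readable, which is the trade-off mortee describes.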

    mortee
     
    mortee, Oct 29, 2007
    #12
  13. On Oct 29, 2007, at 10:02 AM, Todd Benson wrote:

    > On 10/29/07, James Edward Gray II <> wrote:
    >> On Oct 28, 2007, at 11:35 PM, Nobuyoshi Nakada wrote:
    >>
    >>> Hi,
    >>>
    >>> At Mon, 29 Oct 2007 13:17:24 +0900,
    >>> Nobuyoshi Nakada wrote in [ruby-talk:276371]:
    >>>>>> I suspect you mean multipart/relative.
    >>>>>
    >>>>> I wasn't even aware of that format, to be honest. I knew of
    >>>>> multipart/mixed (which our Usenet host will allow) and multipart/
    >>>>> alternative. What is the purpose of multipart/relative?
    >>>>
    >>>> As above.
    >>>
    >>> Oops, it was multipart/related, and I removed the paragraph
    >>> mentioned about it. My mistake, sorry.

    >>
    >> I've been looking into this a little this morning.
    >>
    >> We do receive multipart/related messages, though they seem fairly
    >> uncommon compared to multipart/alternative. They don't appear to be
    >> gated properly. In fact, the mailing list archives don't even seem
    >> to show them. For example 271796 was a multipart/related message and
    >> I can't find it in the archives or on comp.lang.ruby.
    >>
    >> To understand what we are dealing with here, I read:
    >>
    >> http://www.faqs.org/rfcs/rfc2387.html
    >>
    >> This type does not seem easy to deal with and I'm open to suggestions
    >> for the best strategy to use.
    >>
    >> James Edward Gray II

    >
    > I haven't built enough clout in this group for my opinion to matter,
    > but here goes...


    I'm in over my head with all this email stuff and need all the help I
    can get. The gateway belongs to all of us, not me. So don't be
    shy. Help me fix this right and we all benefit.

    > James did a great job with the gateway ... no doubt about that.


    Just to be totally clear, I didn't make the original gateway. I'm
    just the current caretaker.

    > Make sure, James and others, that you label the reformed
    > emails/postings with some kind of rejoinder that says something to the
    > effect of "mail/posting has been modified to make it available."


    I will absolutely do this. The code I posted earlier in this thread
    already does.

    James Edward Gray II
     
    James Edward Gray II, Oct 29, 2007
    #13
  14. James Edward Gray II

    F. Senault Guest

    On October 29 at 16:06, James Edward Gray II wrote:

    > On Oct 29, 2007, at 9:20 AM, mortee wrote:


    >> Otherwise, you're most likely left with HTML to
    >> strip, and images which you may either drop or attach to the output as
    >> files.

    >
    > Right. Which means I still need to settle on an HTML strategy as well.


    I'm not sure you have that many HTML-only messages. In my mailbox, I
    have an HTML-only filter. It catches 0.5% of my incoming mail, and it's
    100% spam.

    OTOH, I seem to recall we looked at a weird multipart/alternative
    message recently which had only one plain text part.

    >> Sorry if I happen to be wrong on one point or the other.

    >
    > The other usage that seems common, more common than the HTML case in
    > fact, is as part of a signed message:
    >
    > 271822: multipart/signed ()
    >   multipart/related ()
    >   application/pgp-signature ()
    >
    > I've not yet checked to see if these messages are gated properly with
    > our current setup.


    Yes. I have <> / ruby-talk 276326,
    for instance. I can't guarantee it's propagated as well as a pure text
    message, but it should be on most servers.

    Fred
    --
    You walked away from this Did it make it easier on you ? So now what ?
    Life must go on still haunted It's so hard to face the day I hope it
    is good for you I tried, oh how I tried, but it's broken Let me go, I
    could have died (Kittie, Pink Lemonade)
     
    F. Senault, Oct 29, 2007
    #14
  15. James Edward Gray II

    F. Senault Guest

    On October 28 at 22:20, James Edward Gray II wrote:

    > The outstanding issue is how to handle character sets for the
    > constructed message. You'll see in the code below that I just pull
    > the charset param from the original message, but after looking at a
    > few messages, I realize that this doesn't make sense. For example,
    > here are the relevant portions of a recent post that wasn't gated
    > correctly:
    >
    > Content-Type: multipart/alternative; boundary=Apple-Mail-18-445454026
    >
    > --Apple-Mail-18-445454026
    > Content-Transfer-Encoding: 7bit
    > Content-Type: text/plain;
    > charset=US-ASCII;
    > delsp=yes;
    > format=flowed
    >
    > As you can see, the overall email doesn't have a charset but each
    > text portion can. If we are going to merge these parts, what's the
    > best strategy for handling the charset?


    Well, usually, you don't have more than one charset in a message; you
    should push the charset of the part back up to the main header and be
    done with it.

    Now, if you have more than one text part and different charsets, it's a
    bit more complicated...

    > I thought of trying to convert them all to UTF-8 with Iconv, but I'm
    > not sure what to do if a type doesn't declare a charset or when Iconv
    > chokes on what is declared? Please share your opinions.


    Hm... Complain to the poster / the software writer ? :)

    Fred
    --
    Good thing I calmed down after all this mindless destruction. I mean,
    destroy the world single handedly, what the hell was I thinking. I'd
    need a tank to do that... Mmmm... Tank...
    (Fusion D, Yamcha Hibiki, http://fusiond.keenspace.com)
     
    F. Senault, Oct 29, 2007
    #15
  16. Fred, you always show up when I need you. That's why you're still my
    best friend. ;)

    On Oct 29, 2007, at 1:55 PM, F. Senault wrote:

    > On October 29 at 16:06, James Edward Gray II wrote:
    >
    >> On Oct 29, 2007, at 9:20 AM, mortee wrote:

    >
    >>> Otherwise, you're most likely left with HTML to
    >>> strip, and images which you may either drop or attach to the
    >>> output as files.

    >>
    >> Right. Which means I still need to settle on an HTML strategy as well.

    >
    > I'm not sure you have that many HTML-only messages. In my mailbox, I
    > have an HTML-only filter. It catches 0.5% of my incoming mail, and
    > it's 100% spam.


    Yes, you may be right about that. Perhaps not much of a concern.
    I'm not seeing any such messages in my sample data.

    > OTOH, I seem to recall we looked at a weird multipart/alternative
    > message recently which had only one plain text part.


    Sadly, that's extremely common. Have a look at just the beginning of
    my sample data:

    271456: multipart/alternative ()
      text/plain (UTF-8)
    271541: multipart/signed ()
      text/plain (utf-8)
      application/pgp-signature ()
    271567: multipart/signed ()
      text/plain (iso-8859-1)
      application/pgp-signature ()
    271588: multipart/signed ()
      text/plain (utf-8)
      application/pgp-signature ()
    271569: multipart/alternative ()
      text/plain (ISO-8859-1)
    271578: multipart/alternative ()
      text/plain (ISO-8859-1)
    271566: multipart/signed ()
      text/plain (iso-8859-1)
      application/pgp-signature ()
    271568: multipart/alternative ()
      text/plain (ISO-8859-1)
    271444: multipart/alternative ()
      text/plain (ISO-8859-1)
    271452: multipart/alternative ()
      text/plain (ISO-8859-1)
    271640: multipart/alternative ()
      text/plain (UTF-8)
    271669: multipart/alternative ()
      text/plain (ISO-8859-1)
    …

    Good thing those are super easy to fix. ;)

    >>> Sorry if I happen to be wrong on one point or the other.

    >>
    >> The other usage that seems common, more common than the HTML case in
    >> fact, is as part of a signed message:
    >>
    >> 271822: multipart/signed ()
    >>   multipart/related ()
    >>   application/pgp-signature ()
    >>
    >> I've not yet checked to see if these messages are gated properly with
    >> our current setup.

    >
    > Yes. I have <> / ruby-talk 276326,
    > for instance. I can't guarantee it's propagated as well as a pure text
    > message, but it should be on most servers.


    Awesome. That's good to know. Thanks for checking that for me.

    James Edward Gray II
     
    James Edward Gray II, Oct 29, 2007
    #16
  17. On Oct 29, 2007, at 2:15 PM, F. Senault wrote:

    > On October 28 at 22:20, James Edward Gray II wrote:
    >
    >> I thought of trying to convert them all to UTF-8 with Iconv, but I'm
    >> not sure what to do if a type doesn't declare a charset or when Iconv
    >> chokes on what is declared? Please share your opinions.

    >
    > Hm... Complain to the poster / the software writer? :)


    Good plan. ;)
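    For reference, here is a minimal sketch of the "convert everything to UTF-8" idea, with a fallback for the two problem cases raised above. It swaps in Ruby's `String#encode` for Iconv, which was later dropped from the standard library in Ruby 2.0. The following assumptions are mine and do not come from the thread. A part with no declared charset is treated as US-ASCII, the MIME default for text types. An unknown or unconvertible charset falls back to keeping only the ASCII bytes. Any byte that cannot be converted becomes `?`. The `to_utf8` name is hypothetical.

```ruby
# Hypothetical helper: normalize a text part's body to UTF-8.
# Assumption: a missing charset means US-ASCII (the MIME default for text/*).
def to_utf8(body, charset)
  source = Encoding.find(charset || "US-ASCII") # raises ArgumentError if unknown
  body.dup.force_encoding(source)
      .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
      .scrub("?") # catches bad bytes when the source is already UTF-8
rescue ArgumentError, Encoding::ConverterNotFoundError
  # Unknown or unconvertible charset: keep ASCII bytes, replace the rest.
  body.dup.force_encoding(Encoding::ASCII_8BIT)
      .encode("UTF-8", undef: :replace, replace: "?")
end
```

    This version replaces bad bytes instead of raising an error, so a mislabeled message still gets gated. The cost is some visible `?` noise in the post.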

    James Edward Gray II
     
    James Edward Gray II, Oct 29, 2007
    #17
  18. On Oct 28, 2007, at 4:20 PM, James Edward Gray II wrote:

    > Now I need all of you email and Usenet experts to tell me if that's
    > a sane strategy.


    OK, here is the revised plan, folks. Complain now if you see flaws:

    * The gateway will only alter messages with a top-level content-type
    of multipart/alternative or multipart/related
    * For both types of messages, it will search for the first text/plain
    part and promote that to the body, discarding other types (this is
    probably not the ideal handling of multipart/related, but it seems to
    fit the messages we are seeing on Ruby Talk)
    * All modified messages will begin with a disclaimer on the first line
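    The plan above can be sketched roughly as follows. This is a simplified illustration and not the actual mail_to_news.rb code. It uses a hypothetical `Part` struct in place of real parsed mail objects. The `DISCLAIMER` wording is a placeholder, because the plan does not say what the disclaimer reads. `first_plain_part` and `gate` are illustrative names. The no-text/plain case is left untouched here.

```ruby
# Hypothetical, simplified MIME model used only for this sketch; the real
# gateway script works on parsed mail objects, not on this struct.
Part = Struct.new(:content_type, :charset, :body, :parts, keyword_init: true)

# Top-level types the revised plan says the gateway will rewrite.
REWRITTEN_TYPES = %w[multipart/alternative multipart/related].freeze

# Placeholder wording: the plan does not say what the disclaimer reads.
DISCLAIMER = "[Gateway note: non-text/plain parts of this message were removed.]"

# Depth-first search for the first text/plain part, descending into nested
# multiparts (for example an alternative inside a related container).
def first_plain_part(part)
  return part if part.content_type == "text/plain"
  (part.parts || []).each do |child|
    found = first_plain_part(child)
    return found if found
  end
  nil
end

# Returns the message to post. Other top-level types (such as
# multipart/signed) pass through untouched; for the rewritten types the
# first text/plain part becomes the whole body, behind the disclaimer.
def gate(message)
  return message unless REWRITTEN_TYPES.include?(message.content_type)
  plain = first_plain_part(message)
  return message if plain.nil? # no text/plain found: fallback not shown here
  Part.new(content_type: "text/plain", charset: plain.charset,
           body: "#{DISCLAIMER}\n#{plain.body}")
end
```

    The promoted part keeps its own charset. Signed messages skip the rewrite entirely, which matches the earlier finding that they already gate properly.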

    James Edward Gray II
     
    James Edward Gray II, Oct 29, 2007
    #18
  19. On Oct 29, 2007, at 3:46 PM, James Edward Gray II wrote:

    > On Oct 28, 2007, at 4:20 PM, James Edward Gray II wrote:
    >
    >> Now I need all of you email and Usenet experts to tell me if
    >> that's a sane strategy.

    >
    > OK, here is the revised plan folks. Complain now if you see flaws:


    I forgot one detail…

    > * The gateway will only alter messages with a top-level content-type
    > of multipart/alternative or multipart/related
    > * For both types of messages, it will search for the first text/plain
    > part and promote that to the body, discarding other types (this is
    > probably not the ideal handling of multipart/related, but it seems to
    > fit the messages we are seeing on Ruby Talk)


    * If we fail to find a text/plain part, the gateway will keep the
    body as is, but force the content-type of the message to text/plain
    in the hopes of getting the content through with some noise (it seems
    this will be needed for very few messages, possibly none)

    > * All modified messages will begin with a disclaimer on the first line
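    The fallback in the new bullet amounts to rewriting a single header. Below is a minimal sketch that works on the raw message text. It assumes `\n` line endings and a header block that ends at the first blank line. The `force_text_plain` name and its `charset` default are hypothetical, since the plan does not say which charset the forced header should declare. Folded continuation lines of the old Content-Type header, such as a `boundary=` parameter on the next line, are dropped along with it.

```ruby
# Hypothetical helper: replace the top-level Content-Type header with
# text/plain and leave the body bytes exactly as they were.
def force_text_plain(raw, charset = "us-ascii")
  header, separator, body = raw.partition("\n\n")
  kept = []
  skipping = false
  header.each_line do |line|
    if line.match?(/\A[ \t]/)          # folded continuation of the previous field
      kept << line unless skipping
    else
      skipping = line.match?(/\AContent-Type:/i)
      kept << line unless skipping
    end
  end
  # The last header line has no newline after partition; add one if needed.
  kept << "\n" unless kept.empty? || kept.last.end_with?("\n")
  kept << "Content-Type: text/plain; charset=#{charset}\n"
  kept.join.chomp + separator + body
end
```

    Because the MIME boundaries and any HTML stay in the body, readers would see that markup as literal text. This is the "some noise" the bullet expects.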


    James Edward Gray II
     
    James Edward Gray II, Oct 29, 2007
    #19
  20. James Edward Gray II

    mortee Guest

    James Edward Gray II wrote:
    > * If we fail to find a text/plain part, the gateway will keep the body
    > as is, but force the content-type of the message to text/plain in the
    > hopes of getting the content through with some noise (it seems this will
    > be needed for very few messages, possibly none)


    Do you think *anyone* would ever attempt to read a post which would show
    its HTML source as plain text (that will happen if you force the
    content-type of an HTML mail to text/plain)? I guess you should either
    drop those or try to strip the HTML tags.
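    If the gateway took the tag-stripping route, a crude version could look like the sketch below. It is regex-based and is not a real HTML parser. It will mishandle comments, CDATA, and `>` inside attribute values, and it decodes only the few entities listed. The `strip_html` name and the `ENTITIES` table are hypothetical.

```ruby
# Only these entities are decoded; anything else is left as-is.
ENTITIES = {
  "&lt;" => "<", "&gt;" => ">", "&quot;" => '"',
  "&#39;" => "'", "&nbsp;" => " ", "&amp;" => "&"
}.freeze

def strip_html(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1\s*>}im, "")    # drop script/style bodies
  text = text.gsub(%r{<br\s*/?>|</p\s*>}i, "\n")              # keep basic line breaks
  text = text.gsub(/<[^>]*>/m, "")                             # remove remaining tags
  text = text.gsub(/&(?:lt|gt|quot|#39|nbsp|amp);/, ENTITIES) # one pass, so &amp;lt; stays &lt;
  text.gsub(/\n{3,}/, "\n\n").strip                            # collapse blank-line runs
end
```

    Entities are decoded last and in a single pass, so escaped markup in the original stays as text and is never treated as a tag.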

    mortee
     
    mortee, Oct 29, 2007
    #20

Similar Threads
  1. Dmitriy Zakharov

    Enhancing ASP.NET Framework

    Dmitriy Zakharov, Sep 3, 2004, in forum: ASP .Net
    Replies:
    9
    Views:
    401
    Patrice
    Sep 6, 2004
  2. Christian Brechbühler

    Enhancing valarray with "normal" arithmetic operators

    Christian Brechbühler, Sep 12, 2003, in forum: C++
    Replies:
    6
    Views:
    969
    Christian Brechbühler
    Sep 14, 2003
  3. John
    Replies:
    0
    Views:
    1,060
  4. John
    Replies:
    0
    Views:
    1,068
  5. P2P
    Replies:
    2
    Views:
    161