HTML on Usenet

Discussion in 'HTML' started by Steve Crook, Nov 22, 2009.

  1. Steve Crook

    Steve Crook Guest

    Hi all,

    Please excuse me if my question is a little off-topic for this group but
    the level of activity in here suggests it's a good place to ask an HTML
    related question.

    I'm the current maintainer of the Usenet filtering software, Cleanfeed.
    (http://www.mixmin.net/cleanfeed)
    At the moment, Cleanfeed places HTML postings to Usenet into four
    categories:

    MIME posts with HTML attached files
    HTML posts (Content-Type: Text/HTML)
    MIME Multipart/Alternatives with HTML components
    Any HTML with embedded <img src=> tags

    In general, Multipart HTML is broadly accepted across Usenet provided a
    Text alternative is included. Image tags are rejected everywhere
    except the microsoft hierarchy, where pretty much anything goes. The
    same is true for non-MIME Text/HTML content.

    I'm in the process of refining some of the HTML filters and would
    appreciate some feedback on what groups/sub-hierarchies should be
    exempted from these rules. One simple approach would be to allow HTML
    to any group with a '.html' element in its name but I'm sure there are
    exceptions to such a simple statement. Embedded images are another area
    where I'd appreciate views on their acceptability.
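
    To make the four categories above concrete, here is a minimal sketch in
    Python. It is not Cleanfeed's actual code; the label names, the
    classify_html function, and the group_has_html_element helper are all
    assumptions made for illustration, built on the standard library's
    email package.

```python
import re
from email import message_from_string
from email.message import Message

# Matches an <img ... src=...> tag, case-insensitively.
IMG_SRC = re.compile(r"<img\b[^>]*\bsrc\s*=", re.IGNORECASE)


def decoded_text(part: Message) -> str:
    """Return a leaf part's body as text, falling back to Latin-1."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "latin-1"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset label in the header
        return payload.decode("latin-1", errors="replace")


def classify_html(raw_article: str) -> set[str]:
    """Return the (hypothetical) HTML category labels that apply."""
    msg = message_from_string(raw_article)
    labels: set[str] = set()
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "multipart/alternative":
            # Look only at the direct children of the alternative container.
            children = part.get_payload() if part.is_multipart() else []
            if any(c.get_content_type() == "text/html" for c in children):
                labels.add("html_alternative")
        elif ctype == "text/html" and not part.is_multipart():
            if part is msg:
                # The whole article is a single text/html body.
                labels.add("html_post")
            elif part.get_content_disposition() == "attachment":
                labels.add("html_attachment")
            if IMG_SRC.search(decoded_text(part)):
                labels.add("html_image")
    return labels


def group_has_html_element(group: str) -> bool:
    """True if one dot-separated element of the group name is 'html'."""
    return "html" in group.lower().split(".")
```

    An article can carry more than one label (for example an HTML post that
    also embeds an image), which is why the sketch returns a set rather than
    a single category.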

    Steve
    Steve Crook, Nov 22, 2009
    #1

  2. Steve Crook wrote:
    > Hi all,
    >
    > Please excuse me if my question is a little off-topic for this group but
    > the level of activity in here suggests it's a good place to ask an HTML
    > related question.
    >
    > I'm the current maintainer of the Usenet filtering software, Cleanfeed.
    > (http://www.mixmin.net/cleanfeed)
    > At the moment, Cleanfeed places HTML postings to Usenet into four
    > categories:
    >
    > MIME posts with HTML attached files
    > HTML posts (Content-Type: Text/HTML)
    > MIME Multipart/Alternatives with HTML components
    > Any HTML with embedded <img src=> tags
    >

    <snip>

    Correct me if I am wrong, but unless it is a binary newsgroup all
    content must be plain text in Usenet. To do otherwise is a common newbie
    mistake that usually results in a good flaming...

    --
    Take care,

    Jonathan
    -------------------
    LITTLE WORKS STUDIO
    http://www.LittleWorksStudio.com
    Jonathan N. Little, Nov 22, 2009
    #2

  3. Steve Crook

    Steve Crook Guest

    On Sun, 22 Nov 2009 08:48:08 -0500, Jonathan N. Little wrote in
    Message-Id: <hebfer$65p$-september.org>:

    > Correct me if I am wrong, but unless it is a binary newsgroup all
    > content must be plain text in Usenet. To do otherwise is a common newbie
    > mistake that usually results in a good flaming...


    That's certainly a view that many people share and I for one wouldn't
    dispute it. I think there are exceptions though, such as the microsoft
    hierarchy I mentioned previously where HTML seems to be widely accepted,
    probably due to the functionality of Outlook and its offspring.
    There's also a "clari" hierarchy that contains a high ratio of HTML.

    At least for the big-8 and alt, (excluding binaries), I'd like to take
    the stance that all posts are text only, or at least contain a
    text/plain element. I wanted to gather opinions from a group related
    specifically to HTML to see if it was contentious. Your reply suggests
    it's not. :)
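
    The stance described above could be sketched roughly as follows. This is
    an illustration only, not Cleanfeed's behaviour: the function names are
    made up, and treating any group name with a "binaries" element as a
    binary group is an assumption.

```python
from email import message_from_string
from email.message import Message

# The Big-8 top-level hierarchies, plus alt.
TEXT_HIERARCHIES = {
    "comp", "humanities", "misc", "news", "rec", "sci", "soc", "talk", "alt",
}


def is_text_only_group(group: str) -> bool:
    """Big-8 or alt group whose name has no 'binaries' element (assumption)."""
    elements = group.lower().split(".")
    return elements[0] in TEXT_HIERARCHIES and "binaries" not in elements


def has_plain_text_part(msg: Message) -> bool:
    """True if the article is, or contains, an inline text/plain part."""
    return any(
        not part.is_multipart()
        and part.get_content_type() == "text/plain"
        and part.get_content_disposition() != "attachment"
        for part in msg.walk()
    )


def passes_text_rule(raw_article: str) -> bool:
    """Apply the text-only rule; groups outside its scope always pass."""
    msg = message_from_string(raw_article)
    header = msg.get("Newsgroups", "")
    groups = [g.strip() for g in header.split(",") if g.strip()]
    if not any(is_text_only_group(g) for g in groups):
        return True  # e.g. microsoft.* or a binaries group: rule not applied
    return has_plain_text_part(msg)
```

    An article with no Content-Type header counts as text/plain here, since
    that is the default the email package reports for it.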
    Steve Crook, Nov 22, 2009
    #3
  4. Steve Crook wrote:
    > On Sun, 22 Nov 2009 08:48:08 -0500, Jonathan N. Little wrote in
    > Message-Id:<hebfer$65p$-september.org>:
    >
    >> Correct me if I am wrong, but unless it is a binary newsgroup all
    >> content must be plain text in Usenet. To do otherwise is a common newbie
    >> mistake that usually results in a good flaming...

    >
    > That's certainly a view that many people share and I for one wouldn't
    > dispute it. I think there are exceptions though, such as the microsoft
    > hierarchy I mentioned previously where HTML seems to be widely accepted,
    > probably due to the functionality of Outlook and its offspring.
    > There's also a "clari" hierarchy that contains a high ratio of HTML.
    >
    > At least for the big-8 and alt, (excluding binaries), I'd like to take
    > the stance that all posts are text only, or at least contain a
    > text/plain element. I wanted to gather opinions from a group related
    > specifically to HTML to see if it was contentious. Your reply suggests
    > it's not. :)



    --
    Take care,

    Jonathan
    -------------------
    LITTLE WORKS STUDIO
    http://www.LittleWorksStudio.com
    Jonathan N. Little, Nov 22, 2009
    #4
  5. Steve Crook wrote:
    > On Sun, 22 Nov 2009 08:48:08 -0500, Jonathan N. Little wrote in
    > Message-Id:<hebfer$65p$-september.org>:
    >
    >> Correct me if I am wrong, but unless it is a binary newsgroup all
    >> content must be plain text in Usenet. To do otherwise is a common newbie
    >> mistake that usually results in a good flaming...

    >
    > That's certainly a view that many people share and I for one wouldn't
    > dispute it. I think there are exceptions though, such as the microsoft
    > hierarchy I mentioned previously where HTML seems to be widely accepted,
    > probably due to the functionality of Outlook and its offspring.
    > There's also a "clari" hierarchy that contains a high ratio of HTML.


    I think it is a matter of *server* not the *client*. Plain text is much
    smaller storage and transport-wise.

    --
    Take care,

    Jonathan
    -------------------
    LITTLE WORKS STUDIO
    http://www.LittleWorksStudio.com
    Jonathan N. Little, Nov 22, 2009
    #5
  6. richard

    richard Guest

    On Sun, 22 Nov 2009 12:27:19 +0000 (UTC), Steve Crook wrote:

    > Hi all,
    >
    > Please excuse me if my question is a little off-topic for this group but
    > the level of activity in here suggests it's a good place to ask an HTML
    > related question.
    >
    > I'm the current maintainer of the Usenet filtering software, Cleanfeed.
    > (http://www.mixmin.net/cleanfeed)
    > At the moment, Cleanfeed places HTML postings to Usenet into four
    > categories:
    >
    > MIME posts with HTML attached files
    > HTML posts (Content-Type: Text/HTML)
    > MIME Multipart/Alternatives with HTML components
    > Any HTML with embedded <img src=> tags
    >
    > In general, Multipart HTML is broadly accepted across Usenet provided a
    > Text alternative is included. Image tags are rejected everywhere
    > except the microsoft hierarchy, where pretty much anything goes. The
    > same is true for non-MIME Text/HTML content.
    >
    > I'm in the process of refining some of the HTML filters and would
    > appreciate some feedback on what groups/sub-hierarchies should be
    > exempted from these rules. One simple approach would be to allow HTML
    > to any group with a '.html' element in its name but I'm sure there are
    > exceptions to such a simple statement. Embedded images are another area
    > where I'd appreciate views on their acceptability.
    >
    > Steve


    It has been standard practice for a number of years now that only plain
    text, with no HTML, no images, and no binaries, is allowed in the group.
    Anything else must be posted to a binary group.

    The only exceptions are the "stationery" groups and those that have
    established charters allowing binaries.

    A few years back I did create a group called alt.binaries.html for the sole
    purpose of posting web pages for review. But no one was interested.

    Most servers will cancel a binary posted to a text-only group.
    richard, Nov 24, 2009
    #6
  7. richard

    richard Guest

    On Sun, 22 Nov 2009 14:54:20 +0000 (UTC), Steve Crook wrote:

    > On Sun, 22 Nov 2009 08:48:08 -0500, Jonathan N. Little wrote in
    > Message-Id: <hebfer$65p$-september.org>:
    >
    >> Correct me if I am wrong, but unless it is a binary newsgroup all
    >> content must be plain text in Usenet. To do otherwise is a common newbie
    >> mistake that usually results in a good flaming...

    >
    > That's certainly a view that many people share and I for one wouldn't
    > dispute it. I think there are exceptions though, such as the microsoft
    > hierarchy I mentioned previously where HTML seems to be widely accepted,
    > probably due to the functionality of Outlook and its offspring.
    > There's also a "clari" hierarchy that contains a high ratio of HTML.
    >
    > At least for the big-8 and alt, (excluding binaries), I'd like to take
    > the stance that all posts are text only, or at least contain a
    > text/plain element. I wanted to gather opinions from a group related
    > specifically to HTML to see if it was contentious. Your reply suggests
    > it's not. :)


    I believe you should read the charters of the groups before allowing
    HTML-encoded posts. The main reason you find HTML in some groups is that
    OE is one of the few clients that can render it. Ergo, you need OE to
    view it, as many clients simply ignore the encoding. When HTML is posted
    and read as pure text, the content is hard to read.
    richard, Nov 24, 2009
    #7
  8. richard

    richard Guest

    On Sun, 22 Nov 2009 11:37:46 -0500, Jonathan N. Little wrote:

    > Steve Crook wrote:
    >> On Sun, 22 Nov 2009 08:48:08 -0500, Jonathan N. Little wrote in
    >> Message-Id:<hebfer$65p$-september.org>:
    >>
    >>> Correct me if I am wrong, but unless it is a binary newsgroup all
    >>> content must be plain text in Usenet. To do otherwise is a common newbie
    >>> mistake that usually results in a good flaming...

    >>
    >> That's certainly a view that many people share and I for one wouldn't
    >> dispute it. I think there are exceptions though, such as the microsoft
    >> hierarchy I mentioned previously where HTML seems to be widely accepted,
    >> probably due to the functionality of Outlook and its offspring.
    >> There's also a "clari" hierarchy that contains a high ratio of HTML.

    >
    > I think it is a matter of *server* not the *client*. Plain text is much
    > smaller storage and transport-wise.


    The server does nothing more than store the article. It couldn't care
    less what the format is. The client does the translating.
    richard, Nov 24, 2009
    #8
  9. richard wrote:
    > On Sun, 22 Nov 2009 11:37:46 -0500, Jonathan N. Little wrote:
    >
    >> Steve Crook wrote:
    >>> On Sun, 22 Nov 2009 08:48:08 -0500, Jonathan N. Little wrote in
    >>> Message-Id:<hebfer$65p$-september.org>:
    >>>
    >>>> Correct me if I am wrong, but unless it is a binary newsgroup all
    >>>> content must be plain text in Usenet. To do otherwise is a common newbie
    >>>> mistake that usually results in a good flaming...
    >>> That's certainly a view that many people share and I for one wouldn't
    >>> dispute it. I think there are exceptions though, such as the microsoft
    >>> hierarchy I mentioned previously where HTML seems to be widely accepted,
    >>> probably due to the functionality of Outlook and its offspring.
    >>> There's also a "clari" hierarchy that contains a high ratio of HTML.

    >> I think it is a matter of *server* not the *client*. Plain text is much
    >> smaller storage and transport-wise.

    >
    > The server does nothing more than store the article. It couldn't care
    > less what the format is. The client does the translating.


    Read again: "Plain text is much smaller *storage and transport-wise*".
    The server does care about those things. Whether the additional size
    makes a significant difference has to be addressed, but storage and
    transport *are* server concerns.
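
    One way to put a number on that question is to build the same article
    both ways and compare the serialized sizes. A rough sketch, assuming
    Python's standard email package; the article_sizes helper and the
    sample headers are invented for the example, and the result says
    nothing about whether the difference matters on a real feed.

```python
from email.message import EmailMessage


def article_sizes(text: str, html: str) -> tuple[int, int]:
    """Return (plain_size, multipart_size) in bytes for the same content."""
    plain = EmailMessage()
    plain["Newsgroups"] = "misc.test"
    plain["Subject"] = "size test"
    plain.set_content(text)

    both = EmailMessage()
    both["Newsgroups"] = "misc.test"
    both["Subject"] = "size test"
    both.set_content(text)
    # Converts the message to multipart/alternative: text/plain + text/html.
    both.add_alternative(html, subtype="html")

    return len(plain.as_bytes()), len(both.as_bytes())


if __name__ == "__main__":
    text = "Hello, world.\n"
    html = "<html><body><p>Hello, world.</p></body></html>\n"
    plain_size, both_size = article_sizes(text, html)
    print(f"plain: {plain_size} bytes, multipart/alternative: {both_size} bytes")
```

    The multipart version carries the text twice plus the MIME boundaries
    and per-part headers, so it is always the larger of the two; how much
    larger depends on the markup.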
    Harlan Messinger, Nov 24, 2009
    #9
  10. Terence

    Terence Guest

    One e-mail service I use automatically treats any HTML content in an
    incoming message from abroad as spam, labels it as such, and deletes
    it. Various forums that offer e-mailed content work with normal text,
    but any other formatted text (e.g. HTML) gets eliminated (as do many
    forms of attached files, based solely on the suffix name and not the
    content...).
    Outgoing HTML is allowed.
    Also note that non-English languages are full of upper-half ASCII-table
    symbols.
    Terence, Nov 24, 2009
    #10
  11. On Nov 22, 1:27 pm, Steve Crook wrote:
    > Hi all,
    >
    > Please excuse me if my question is a little off-topic for this group but
    > the level of activity in here suggests it's a good place to ask an HTML
    > related question.


    [...]

    > Steve


    Don't excuse yourself - if you do that, you've already lost the game.
    Jan C. Faerber, Nov 24, 2009
    #11
  12. On Mon, 23 Nov 2009 23:36:26 -0800 (PST), Terence wrote:

    > Also note that non-english languages are full of upper half ascii-
    > table symbols.


    Huh? Non sequitur. UTF-8


    (Please learn how to quote in followups.)
    Allodoxaphobia, Nov 24, 2009
    #12
  13. Terence wrote:
    > One e-mail service I use automatically treats any HTML content in an
    > incoming message from abroad as spam, labels it as such, and deletes
    > it. Various forums that offer e-mailed content work with normal text,
    > but any other formatted text (e.g. HTML) gets eliminated (as do many
    > forms of attached files, based solely on the suffix name and not the
    > content...).
    > Outgoing HTML is allowed.
    > Also note that non-English languages are full of upper-half ASCII-table
    > symbols.


    The ASCII table is the ASCII table. It has no variants. It has character
    positions up to 127. The upper half of the ASCII table, positions
    64-127, is where the lower- and upper-case letters a-z and A-Z are,
    along with the @ sign and a number of punctuation marks.

    You may be thinking of the variety of 256-character tables that are out
    there, whose lower halves coincide with ASCII.

    Anyway, that all has to do with the character set, not the content type.
    A Usenet posting can be declared with a variety of encodings, including
    US-ASCII (unlikely these days), KOI-8, ISO-8859-1, ISO-8859-8, UTF-8,
    Big-5, GB-2312, JIS, etc.
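
    To illustrate that the character set and the content type are declared
    independently, here is a small sketch using Python's standard email
    package. The make_article helper and the sample body are assumptions
    for the example only.

```python
from email.message import EmailMessage


def make_article(body: str, subtype: str, charset: str) -> EmailMessage:
    """Build a one-part article with the given text subtype and charset."""
    msg = EmailMessage()
    msg["Newsgroups"] = "misc.test"
    msg.set_content(body, subtype=subtype, charset=charset)
    return msg


if __name__ == "__main__":
    word = "caf\u00e9"  # 'e' with an acute accent: outside ASCII (code point 233)

    for subtype, charset in [
        ("plain", "utf-8"),
        ("plain", "iso-8859-1"),
        ("html", "utf-8"),
    ]:
        art = make_article(word, subtype, charset)
        # The content type and the charset vary independently of each other.
        print(art.get_content_type(), art.get_content_charset())

    # ASCII stops at position 127; positions 64-127 hold '@', A-Z and a-z.
    upper_half = "".join(chr(c) for c in range(64, 127))
    print(upper_half)

    # The same character becomes different bytes under different encodings.
    print(word.encode("iso-8859-1"), word.encode("utf-8"))
```

    The first two articles are both text/plain and differ only in charset,
    while the third is text/html with the same charset as the first.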
    Harlan Messinger, Nov 25, 2009
    #13