HTML on Usenet

Steve Crook

Hi all,

Please excuse me if my question is a little off-topic for this group but
the level of activity in here suggests it's a good place to ask an HTML
related question.

I'm the current maintainer of the Usenet filtering software, Cleanfeed.
(http://www.mixmin.net/cleanfeed)
At the moment, Cleanfeed places HTML postings to Usenet into four
categories:

MIME posts with HTML attached files
HTML posts (Content-Type: Text/HTML)
MIME Multipart/Alternatives with HTML components
Any HTML with embedded <img src=> tags
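
As a rough illustration of the four categories above, here is a minimal Python sketch using the standard `email` package. It is not Cleanfeed's own code (Cleanfeed's implementation is not shown in this thread); the function name `classify_html` and the label strings are made up for the example, and the `<img src=>` check is a deliberately simple regular expression.

```python
import re
from email import message_from_string

# Hypothetical, simplified pattern for an <img ... src=...> tag.
IMG_TAG = re.compile(r"<img\b[^>]*\bsrc\s*=", re.IGNORECASE)


def classify_html(raw_article: str) -> set:
    """Return the (hypothetical) HTML category labels that apply to an article."""
    msg = message_from_string(raw_article)
    labels = set()

    # Category: the article itself is declared as Text/HTML.
    if msg.get_content_type() == "text/html":
        labels.add("html_post")

    for part in msg.walk():
        ctype = part.get_content_type()

        # Category: Multipart/Alternative containing an HTML component.
        if ctype == "multipart/alternative":
            if any(sub.get_content_type() == "text/html" for sub in part.walk()):
                labels.add("html_alternative")

        if ctype != "text/html":
            continue

        # Category: HTML carried as an attached file.
        if part.get_content_disposition() == "attachment" or part.get_filename():
            labels.add("html_attachment")

        # Category: any HTML with an embedded <img src=> tag.
        payload = part.get_payload(decode=True) or b""
        if IMG_TAG.search(payload.decode("latin-1")):
            labels.add("html_img")

    return labels
```

An article can fall into more than one category at once, which is why the sketch returns a set rather than a single label.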

In general, multipart HTML is broadly accepted across Usenet provided a
text alternative is included. Image tags are rejected everywhere
except the microsoft hierarchy, where pretty much anything goes. The
same is true for non-MIME Text/HTML content.

I'm in the process of refining some of the HTML filters and would
appreciate some feedback on what groups/sub-hierarchies should be
exempted from these rules. One simple approach would be to allow HTML
to any group with a '.html' element in its name but I'm sure there are
exceptions to such a simple statement. Embedded images are another area
where I'd appreciate views on their acceptability.
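
The '.html' name rule mentioned above could be sketched as follows. This is only an illustration in Python; the function name `has_html_element` is hypothetical, and it treats an "element" as one dot-separated component of the group name, so a name that merely contains the letters "html" inside a longer component does not match.

```python
def has_html_element(newsgroup: str) -> bool:
    """True if one dot-separated element of the group name is exactly 'html'."""
    return "html" in newsgroup.lower().split(".")


def exempt_from_html_rules(newsgroups_header: str) -> bool:
    """Apply the rule to a comma-separated Newsgroups header (crossposts)."""
    groups = [g.strip() for g in newsgroups_header.split(",") if g.strip()]
    return any(has_html_element(g) for g in groups)
```

Whether a crosspost should be exempt when only one of its groups matches is a policy choice; the sketch picks "any" purely for illustration.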

Steve
 
Jonathan N. Little

Steve said:
Hi all,

Please excuse me if my question is a little off-topic for this group but
the level of activity in here suggests it's a good place to ask an HTML
related question.

I'm the current maintainer of the Usenet filtering software, Cleanfeed.
(http://www.mixmin.net/cleanfeed)
At the moment, Cleanfeed places HTML postings to Usenet into four
categories:

MIME posts with HTML attached files
HTML posts (Content-Type: Text/HTML)
MIME Multipart/Alternatives with HTML components
Any HTML with embedded <img src=> tags
<snip>

Correct me if I am wrong, but unless it is a binary newsgroup all
content must be plain text in Usenet. To do otherwise is a common newbie
mistake that usually results in a good flaming...
 
Steve Crook

Correct me if I am wrong, but unless it is a binary newsgroup all
content must be plain text in Usenet. To do otherwise is a common newbie
mistake that usually results in a good flaming...

That's certainly a view that many people share and I for one wouldn't
dispute it. I think there are exceptions though, such as the microsoft
hierarchy I mentioned previously where HTML seems to be widely accepted,
probably due to the functionality of Outlook and its offspring.
There's also a "clari" hierarchy that contains a high ratio of HTML.

At least for the big-8 and alt, (excluding binaries), I'd like to take
the stance that all posts are text only, or at least contain a
text/plain element. I wanted to gather opinions from a group related
specifically to HTML to see if it was contentious. Your reply suggests
it's not. :)
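
The stance described above (big-8 and alt, excluding binaries, must be text only or at least carry a text/plain element) might be expressed roughly like this. It is a sketch under assumptions, not Cleanfeed's code: the function names are invented, a binaries group is assumed to be any group with a `binaries` element in its name, and any text/plain part anywhere in the article is counted as "a text/plain element".

```python
from email import message_from_string

# The eight top-level hierarchies commonly called the "big-8".
BIG8 = {"comp", "humanities", "misc", "news", "rec", "sci", "soc", "talk"}


def policy_applies(newsgroup: str) -> bool:
    """True for big-8 and alt groups that are not binaries groups."""
    elements = newsgroup.lower().split(".")
    return elements[0] in BIG8 | {"alt"} and "binaries" not in elements


def has_plain_text(raw_article: str) -> bool:
    """True if the article is, or contains, a text/plain part."""
    msg = message_from_string(raw_article)
    return any(part.get_content_type() == "text/plain" for part in msg.walk())


def acceptable(raw_article: str, newsgroup: str) -> bool:
    """Accept anything outside the policy's scope; inside it, require text/plain."""
    return (not policy_applies(newsgroup)) or has_plain_text(raw_article)
```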
 
Jonathan N. Little

Steve said:
That's certainly a view that many people share and I for one wouldn't
dispute it. I think there are exceptions though, such as the microsoft
hierarchy I mentioned previously where HTML seems to be widely accepted,
probably due to the functionality of Outlook and its offspring.
There's also a "clari" hierarchy that contains a high ratio of HTML.

I think it is a matter of *server* not the *client*. Plain text is much
smaller storage and transport-wise.
 
richard

Hi all,

Please excuse me if my question is a little off-topic for this group but
the level of activity in here suggests it's a good place to ask an HTML
related question.

I'm the current maintainer of the Usenet filtering software, Cleanfeed.
(http://www.mixmin.net/cleanfeed)
At the moment, Cleanfeed places HTML postings to Usenet into four
categories:

MIME posts with HTML attached files
HTML posts (Content-Type: Text/HTML)
MIME Multipart/Alternatives with HTML components
Any HTML with embedded <img src=> tags

In general, multipart HTML is broadly accepted across Usenet provided a
text alternative is included. Image tags are rejected everywhere
except the microsoft hierarchy, where pretty much anything goes. The
same is true for non-MIME Text/HTML content.

I'm in the process of refining some of the HTML filters and would
appreciate some feedback on what groups/sub-hierarchies should be
exempted from these rules. One simple approach would be to allow HTML
to any group with a '.html' element in its name but I'm sure there are
exceptions to such a simple statement. Embedded images are another area
where I'd appreciate views on their acceptability.

Steve

It has been standard practice for a number of years now that a post is
allowed in a text group only if it is plain text, with no html, no images,
and no binaries. Anything else must be posted to a binary group.

The only exceptions are the "stationery" groups and those who have
established charters allowing the binaries.

A few years back I did create a group called alt.binaries.html for the sole
purpose of posting web pages for review. But no one was interested.

Most servers will cancel a binary posted to a text-only group.
 
richard

That's certainly a view that many people share and I for one wouldn't
dispute it. I think there are exceptions though, such as the microsoft
hierarchy I mentioned previously where HTML seems to be widely accepted,
probably due to the functionality of Outlook and its offspring.
There's also a "clari" hierarchy that contains a high ratio of HTML.

At least for the big-8 and alt, (excluding binaries), I'd like to take
the stance that all posts are text only, or at least contain a
text/plain element. I wanted to gather opinions from a group related
specifically to HTML to see if it was contentious. Your reply suggests
it's not. :)

I believe you should read the charters of the groups before allowing
HTML-encoded posts. The main reason you find it in some groups is
because OE is one of the few clients that can render it. Ergo, you
need OE to view it, as many clients simply ignore the encoding.
When HTML is posted and read as pure text, the content is hard to read.
 
richard

I think it is a matter of *server* not the *client*. Plain text is much
smaller storage and transport-wise.

The server does nothing more than store the article. It couldn't care
less what the format is. The client does the translating.
 
Harlan Messinger

richard said:
The server does nothing more than store the article. It couldn't care
less what the format is. The client does the translating.

Read again: "Plain text is much smaller *storage and transport-wise*".
The server does care about those things. Whether the additional size
makes a significant difference has to be addressed, but storage and
transport *are* server concerns.
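
To put a rough number on the size question, one could build the same short message twice, once as plain text and once as multipart/alternative with an HTML copy, and compare the serialized lengths. This is a hedged sketch using Python's standard `email.message.EmailMessage`; the helper name `article_size` and the sample bodies are invented, and real-world HTML from mail clients is usually far more verbose than this example.

```python
from email.message import EmailMessage


def article_size(body_text: str, html: str = None) -> int:
    """Serialized size in bytes of a message, optionally with an HTML alternative."""
    msg = EmailMessage()
    msg["Subject"] = "size test"
    msg.set_content(body_text)  # text/plain part
    if html is not None:
        # Converts the message to multipart/alternative and adds a text/html part.
        msg.add_alternative(html, subtype="html")
    return len(msg.as_bytes())


plain_size = article_size("Hello, Usenet.")
both_size = article_size(
    "Hello, Usenet.",
    "<html><body><p>Hello, Usenet.</p></body></html>",
)
print(plain_size, both_size)
```

The multipart version is larger because it carries the text twice plus MIME boundaries and per-part headers; whether that overhead matters in practice is the open question raised above.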
 
Terence

One e-mail service I use automatically treats any HTML content in
an incoming message from abroad as spam, labels it as such and deletes
it. Various forums that offer e-mailed content work with normal
text, but any other formatted text (e.g. HTML) gets eliminated (as do
many forms of attached files, based solely on the suffix name and not
the content...).
Outgoing HTML is allowed.
Also note that non-English languages are full of upper-half ASCII-table
symbols.
 
Jan C. Faerber

Hi all,

Please excuse me if my question is a little off-topic for this group but
the level of activity in here suggests it's a good place to ask an HTML
related question.
[...]

Steve

Don't excuse yourself - if you do that, you've already lost the game.
 
Harlan Messinger

Terence said:
One e-mail service I use automatically treats any HTML content in
an incoming message from abroad as spam, labels it as such and deletes
it. Various forums that offer e-mailed content work with normal
text, but any other formatted text (e.g. HTML) gets eliminated (as do
many forms of attached files, based solely on the suffix name and not
the content...).
Outgoing HTML is allowed.
Also note that non-English languages are full of upper-half ASCII-table
symbols.

The ASCII table is the ASCII table. It has no variants. It has character
positions up to 127. The upper half of the ASCII table, positions
64-127, is where the lower- and upper-case letters a-z and A-Z are,
along with the @ sign and a number of punctuation marks.

You may be thinking of the variety of 256-character tables that are out
there, whose lower halves coincide with ASCII.
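
The point about ASCII versus the 256-character tables can be checked with a few lines of Python. This is only an illustrative sketch; ISO-8859-1 (`latin-1`) and KOI8-R (`koi8_r`) are picked as two example 8-bit tables, and the function name is made up.

```python
def printable_upper_half_of_ascii() -> str:
    """Characters at ASCII positions 64..126 (position 127 is the DEL control code)."""
    return "".join(chr(code) for code in range(64, 127))


# All 128 ASCII positions as raw bytes.
ASCII_BYTES = bytes(range(128))

# The lower halves of these 8-bit tables coincide with ASCII.
lower_halves_agree = (
    ASCII_BYTES.decode("ascii")
    == ASCII_BYTES.decode("latin-1")
    == ASCII_BYTES.decode("koi8_r")
)

# The same byte above 127 means different characters in different tables.
byte = b"\xe9"
as_latin1 = byte.decode("latin-1")
as_koi8r = byte.decode("koi8_r")

print(printable_upper_half_of_ascii())
print(lower_halves_agree, as_latin1, as_koi8r)
```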

Anyway, that all has to do with the character set, not the content type.
A Usenet posting can be declared with a variety of encodings, including
US-ASCII (unlikely these days), KOI-8, ISO-8859-1, ISO-8859-8, UTF-8,
Big-5, GB-2312, JIS, etc.
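
The distinction between content type and character set shows up directly in the Content-Type header, where the charset is a separate parameter. A minimal sketch with Python's standard `email` package follows; the helper name `type_and_charset` and the sample headers are invented for illustration.

```python
from email import message_from_string


def type_and_charset(raw_article: str):
    """Return (content type, declared charset or None) for an article."""
    msg = message_from_string(raw_article)
    return msg.get_content_type(), msg.get_content_charset()


# Plain text in a non-ASCII charset: still text/plain.
print(type_and_charset("Content-Type: text/plain; charset=KOI8-R\n\nbody\n"))

# HTML in UTF-8: the content type changes, independently of the charset.
print(type_and_charset("Content-Type: text/html; charset=UTF-8\n\n<p>x</p>\n"))
```

A filter keyed on content type would therefore treat the first article as plain text regardless of which encoding it declares.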
 
