My multipost-detecting usenet bot (David Filmer)


John Bokma

Mumia W. said:
[...]
If I've angered or annoyed anyone, I do apologize. I had no such
intent.
[...]

Thank you Mr. Filmer. I can see how the 'bot would reduce a
lot of work, but, as you've acknowledged, its message was a
little long and harsh.

Whatever you do, please don't release the code. Hip-ç-rime
would make usenet a nightmare with it.

Uhm? People who need such programs can just download them, including
software to auto-cancel and repost massively.
 

usenet

John said:
Personally I don't have a problem with a crosspost if it's really needed
*and* has the follow-up to header set to the most appropriate group.

In my observation, many (if not most) crossposts come from
GoogleGroups. You cannot set a follow-up in GG - it does it for you
(and I'm 99.9% sure it sets it for all groups you've x-posted to).

Many new OPs, I believe, don't really know how (functionally) to
crosspost (and I think that's one reason why they multipost). I'm
reluctant to suggest that crossposting might be OK because many of
those posters won't read the fine print, and I think new OPs will
readily abuse it - they will tend to crosspost to similar Perl newsgroups
(with all-inclusive follow-ups) and the groups will begin to look like
mirrors of each other.

I believe that it's unusual for most people to have a valid need to
crosspost (I can't recall needing to crosspost in years, and I rarely,
if ever, see Perl crossposts from known and respected posters, unless
it's a reply to an x-posted message with broad follow-ups). I think it
would be downright rare for a new poster to have a valid need to
crosspost, and I'd rather take a discouraging posture when mentioning
it.
 

axel

John Bokma said:
[...]
If I've angered or annoyed anyone, I do apologize. I had no such
intent.
[...]
Thank you Mr. Filmer. I can see how the 'bot would reduce a
lot of work, but, as you've acknowledged, its message was a
little long and harsh.
Whatever you do, please don't release the code. Hip-ç-rime
would make usenet a nightmare with it.
Uhm? People who need such programs can just download them including
software to auto-cancel and repost massively.

It makes me wonder what software Canter & Siegel used in their
infamous massive multiposting.

And that was over a decade ago.

I probably mentioned it before, but I found their book _How to Make a
Fortune on the Information Superhighway_ in a remainder bookshop and
bought it just for fun.

Axel
 

John Bokma

I believe that it's unusual for most people to have a valid need to
crosspost (I can't recall needing to crosspost in years, and I rarely,
if ever, see Perl crossposts from known and respected posters, unless
it's a reply to an x-posted message with broad follow-ups). I think it
would be downright rare for a new poster to have a valid need to
crosspost, and I'd rather take a discouraging posture when mentioning
it.

Yup, agreed. Xposting is rarely needed. I have xposted before, but it's a
small % of my total, except in those cases where I was not aware that
something was xposted to 5 groups [1] :-D.

[1] Ages ago I contributed to a thread that was xposted to 13 (!) or
so groups :-D.
 

John Bokma

John Bokma said:
On 08/14/2006 05:28 PM, (e-mail address removed) wrote:
[...]
If I've angered or annoyed anyone, I do apologize. I had no such
intent.
[...]
Thank you Mr. Filmer. I can see how the 'bot would reduce a
lot of work, but, as you've acknowledged, its message was a
little long and harsh.
Whatever you do, please don't release the code. Hip-ç-rime
would make usenet a nightmare with it.
Uhm? People who need such programs can just download them including
software to auto-cancel and repost massively.

It makes me wonder what software Canter & Siegel used in their
infamous massive multiposting.

No idea. Posting to Usenet is not a black art; you can do it manually via
telnet. Even if you write it all yourself in Perl (I once did as an
exercise), it doesn't take more than a few hours to get a working version
that posts test messages :)
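For the curious: a manual post over telnet is a short NNTP exchange. You connect to port 119, send POST, wait for the 340 reply, type the article (headers, a blank line, the body), and finish with a line holding a single dot; the server answers 240 if it accepted the article. Below is a minimal sketch of the article-building half in Perl. The sub name build_article and every header value are made up for illustration, and the optional Followup-To argument is the header mentioned earlier in the thread for crossposts.

```perl
use strict;
use warnings;

# Hypothetical helper: build the text you would type after POST on an
# NNTP connection. Headers, then a blank line, then the body; the article
# ends with a line holding a single ".". Body lines starting with "." are
# dot-stuffed (an extra "." is prepended) so they aren't mistaken for the
# terminator.
sub build_article {
    my (%arg) = @_;
    my @lines = (
        "From: $arg{from}",
        "Newsgroups: " . join(',', @{ $arg{newsgroups} }),
        "Subject: $arg{subject}",
    );
    # Optional: direct follow-ups of a crosspost to a single group.
    push @lines, "Followup-To: $arg{followup_to}" if defined $arg{followup_to};
    push @lines, '';    # blank line separates headers from body
    for my $line (split /\n/, $arg{body}, -1) {
        push @lines, $line =~ /^\./ ? ".$line" : $line;
    }
    push @lines, '.';   # end-of-article marker
    return join("\r\n", @lines) . "\r\n";
}

my $text = build_article(
    from        => 'tester@example.invalid',
    newsgroups  => [ 'alt.test', 'misc.test' ],
    followup_to => 'alt.test',
    subject     => 'test, please ignore',
    body        => "Hello\n.hidden line",
);
print $text;
```

In a real client the same lines would be handed to the server after the 340 reply, for example through the core Net::NNTP module's post method.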
And that was over a decade ago.

I probably mentioned it before, but I found their book _How to Make a
Fortune on the Information Superhighway_ in a remainder bookshop and
bought it just for fun.

:-D. I would have bought it too, and not because English books are rare
here in Mexico :-D.
 

Mumia W.

I don't grok "Hip-ç-rime"...

-jp

He is someone who attempts to destroy usenet every year or so.
I dare not spell his name correctly because he might be
searching for references to himself and if he finds any, he
might resurface.
 

usenet

Mumia said:
He is someone who attempts to destroy usenet every year or so.

Wikipedia has a brief article, but I won't post a link because it
contains the word in cleartext (it's never a good idea to type the name
of trolls or vandals into cleartext messages). You ought to be able to
un-obfuscate the name easily enough (it has no dashes and can be
represented in 7-bit ASCII).
 

Nomen Nescio

(Note: This message is crossposted to the following newsgroups, as
these groups are affected by the subject bot: comp.lang.perl.misc,
perl.beginners, comp.lang.perl.modules, perl.dbi.users,
perl.beginners.cgi, alt.perl)

Greetings. As many of you are doubtless aware, I recently wrote and
deployed a usenet 'bot which identifies multiposted messages. After
manually flagging such messages for some time, it occurred to me that I
could let Perl do the work for me, and laziness took over.

Are you competing against Alan Connor for netkook status?

Maybe your bot should say <article not downloaded> before all the
# BLOCKS OF TEXT #.

If multiposts offend you, why don't you just ignore them like the
rest of us?
 

John Bokma

David,

Two remarks:

1 - My Usenet client shows your bot's post as a new post instead
of a follow up to the original multipost. No idea if this is a bug
in my client, but if not, can this be fixed?

2 - Yesterday a fine example of one of my worries popped up: I saw
a post as a reply to a cancelled spam message. The problem is
that there is some time between spam being posted and the cancel,
so your bot might reply to each multiposted spam message.

One solution might be to scan the multipost for some keywords.
If they don't show up, don't let your bot post.

See also Dr. Ruud's reply, <[email protected]> which is a
reply to your bot replying to, again, a cancelled spam message.

Personally I again strongly advise stopping the bot until at least some
kind of voting has taken place. It still generates more posts than it
"prevents" at the moment.
 

usenet

John said:
1 - My Usenet client shows your bot's post as a new post instead
of a follow up to the original multipost. No idea if this is a bug
in my client, but if not, can this be fixed?

I wouldn't want to suggest that your reader has a bug, but I do set
"References" and "In-Reply-To" headers. Even GG shows this info
(see http://tinyurl.com/mo4op):

References: <[email protected]>
<[email protected]>
In-Reply-To: <[email protected]>

I'm not sure why your reader wouldn't show that as well.
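As a sketch of how those threading headers are derived: a follow-up's References is the parent's own References followed by the parent's Message-ID, and In-Reply-To (a mail-style header that some readers also use) is just the parent's Message-ID. The helper reply_headers and the message IDs below are hypothetical; this is not the bot's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: derive threading headers for a follow-up from the
# parent article's headers. Readers thread on References, which is the
# parent's References (if any) plus the parent's Message-ID.
sub reply_headers {
    my (%parent) = @_;
    my @refs = split ' ', ($parent{references} // '');
    push @refs, $parent{message_id};
    # Strip any existing "Re:" prefixes so they don't pile up.
    (my $subject = $parent{subject}) =~ s/^(?:Re:\s*)+//i;
    return (
        'Subject'     => "Re: $subject",
        'References'  => join(' ', @refs),
        'In-Reply-To' => $parent{message_id},
    );
}

my %h = reply_headers(
    message_id => '<child@example.invalid>',
    references => '<root@example.invalid>',
    subject    => 'Re: Workflow Systems',
);
print "$_: $h{$_}\n" for sort keys %h;
```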
2 - Yesterday a fine example of one of my worries popped up: I saw
a post as a reply to a cancelled spam message.

The message isn't cancelled in GigaNews (as of this writing), and GG
still shows it too.
The problem is that there is some time between spam being posted and
the cancel, so your bot might reply to each multiposted spam message.

The bot won't reply to a message less than five minutes old. Maybe I
should open that up a bit... but how long do cancels take to run? I
thought they ran within a minute or two on modern newsservers.
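A minimal sketch of such an age check, assuming the article's Date header uses the common "Mon, 14 Aug 2006 17:28:00 -0500" layout. The subs date_to_epoch and old_enough are hypothetical; a CPAN module such as Date::Parse (its str2time function) copes with far more date formats than this hand-rolled regex.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Turn a Date header such as "Mon, 14 Aug 2006 17:28:00 -0500" into epoch
# seconds (UTC). Returns undef if the header doesn't fit this layout.
sub date_to_epoch {
    my ($date) = @_;
    my ($d, $mon, $y, $H, $M, $S, $sign, $oh, $om) = $date =~
        /(\d{1,2})\s+(\w{3})\s+(\d{4})\s+(\d\d):(\d\d)(?::(\d\d))?\s+([+-])(\d\d)(\d\d)/
        or return;
    return unless exists $MON{$mon};
    my $local  = timegm($S // 0, $M, $H, $d, $MON{$mon}, $y);
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return $local - $offset;    # subtract the zone offset to get UTC
}

# True if the article is at least $min_age seconds old, giving cancels
# a chance to arrive before the bot replies.
sub old_enough {
    my ($date_header, $now, $min_age) = @_;
    my $posted = date_to_epoch($date_header);
    return 0 unless defined $posted;    # unparseable date: skip it
    return $now - $posted >= $min_age ? 1 : 0;
}

# Usage: old_enough($date_header, time, 5 * 60) for a five-minute wait.
```

Widening the window is then just a matter of changing the last argument.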
See also Dr. Ruud's reply, <[email protected]> which is a
reply to your bot replying to, again, a cancelled spam message.

As I mentioned to the good Doc, I believe a good (but very simple) spam
filtering strategy would be to look for web URLs in the body of the
message. Spam almost always has a web URL, but I don't ever remember
seeing a newbie post with a web URL. I think such a filtering rule
should be nearly 100% effective.
Personally I again strongly advise stopping the bot until at least some
kind of voting has taken place.

I'm still open to such an idea of a vote, but unsure how to implement
it. The info you kindly provided previously on the topic seems pretty
focused on newsgroup creation; it doesn't seem that it is designed to
be used (or has ever been used) for in-forum voting. How did CLPMisc
ever ratify the posting guidelines? I wasn't around back then...
 

Sherm Pendley

As I mentioned to the good Doc, I believe a good (but very simple) spam
filtering strategy would be to look for web URLs in the body of the
message. Spam almost always has a web URL, but I don't ever remember
seeing a newbie post with a web URL. I think such a filtering rule
should be nearly 100% effective.

It could monitor a few other groups that have nothing whatsoever to do with
Perl, CGI, or even programming - rec.food.cooking, for example. A message
that appears in a variety of unrelated groups is likely to be spam.
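A rough sketch of that idea, assuming the bot already records which groups each body digest was seen in. The canary list (here just the group named above) and the sub probable_spam are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical canary list: groups unrelated to Perl. A body that also
# shows up in one of these is treated as probable spam rather than a
# newbie multipost. Extend the list as needed.
my %CANARY = map { ($_ => 1) } qw(rec.food.cooking);

# $seen maps a body digest to the list of groups it was seen in.
# Returns the number of canary groups the digest appeared in.
sub probable_spam {
    my ($seen, $digest) = @_;
    return scalar grep { $CANARY{$_} } @{ $seen->{$digest} || [] };
}

my %seen = (
    abc => [qw(comp.lang.perl.misc perl.beginners)],
    xyz => [qw(comp.lang.perl.misc rec.food.cooking)],
);
printf "%s: %s\n", $_, probable_spam(\%seen, $_) ? 'spam?' : 'multipost'
    for sort keys %seen;
```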

sherm--
 

usenet

Sherm said:
(e-mail address removed) writes:
It could monitor a few other groups that have nothing whatsoever to do with
Perl, CGI, or even programming - rec.food.cooking, for example. A message
that appears in a variety of unrelated groups is likely to be spam.

That was also suggested previously... Lemme give that some thought.

To effectively cross-reference even a very small percentage of usenet,
I think I would need to look at several hundred groups.

My main concern is that I don't want to run afoul of my usenet
provider's ToS, and hitting several hundred groups repeatedly could
possibly be interpreted as abusive (I'm not sure, since GigaNews
doesn't specify an abuse threshold. They don't prohibit bots,
as some providers do, but a bot that did something like that might
constitute abuse in their opinion).
 

John Bokma

I wouldn't want to suggest that your reader has a bug, but I do set
"References" and "In-Reply-To" headers. Even GG shows this info
(see http://tinyurl.com/mo4op):

References: <[email protected]>
<[email protected]>
In-Reply-To: <[email protected]>

I'm not sure why your reader wouldn't show that as well.

I see:

Subject: Workflow Systems--Multiposted

Maybe the missing Re: breaks Xnews? Moreover, I suggest following the
netiquette: if you change the subject, do it like:

New subject (was: old subject).

I suggest something like:

Multiposted (was: Workflow Systems)

Not sure if there should be a Re: in front though :)
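The suggested convention is easy to apply mechanically. A small sketch; was_subject is a hypothetical name, and it deliberately leaves the Re: question open.

```perl
use strict;
use warnings;

# Follow the "New subject (was: old subject)" convention. Leading "Re:"
# prefixes and any existing "(was: ...)" tail are dropped from the old
# subject first, so the subject doesn't keep growing on every hop.
sub was_subject {
    my ($new, $old) = @_;
    $old =~ s/^(?:Re:\s*)+//i;
    $old =~ s/\s*\(was:.*\)\s*$//i;
    return "$new (was: $old)";
}

print was_subject('Multiposted', 'Workflow Systems'), "\n";
```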
The message isn't cancelled in GigaNews (as of this writing), and GG
still shows it too.

The (my) problem is that I now start to see replies to canceled messages
:-(
The bot won't reply to a message less than five minutes old. Maybe I
should open that up a bit... but how long do cancels take to run? I
thought they ran within a minute or two on modern newsservers.

Depends on who cancels them :) It can be hours :) But yeah, if you
wait too long, other people might have already replied to the multiposter.
OTOH, I would first check whether the multiposter has had any replies. If he
has, then don't post (IIRC you do that now?)
As I mentioned to the good Doc, I believe a good (but very simple)
spam filtering strategy would be to look for web URLs in the body of
the message. Spam almost always has a web URL, but I don't ever
remember seeing a newbie post with a web URL. I think such a
filtering rule should be nearly 100% effective.

As long as there are more false negatives than false positives I am happy :)
I'm still open to such an idea of a vote, but unsure how to implement
it. The info you kindly provided previously on the topic seems pretty
focused on newsgroup creation; it doesn't seem that it is designed to
be used (or has ever been used) for in-forum voting. How did CLPMisc
ever ratify the posting guidelines? I wasn't around back then...

No idea :) You might contact the vote collectors (UVV?). They might be
able to provide an answer and help you with the vote collection if
needed.
 

John Bokma

That was also suggested previously... Lemme give that some thought.

To effectively cross-reference even a very small percentage of usenet,
I think I would need to look at several hundred groups.

My main concern is that I don't want to run afoul of my usenet
provider's ToS, and hitting several hundred groups repeatedly could
possibly be interpreted as abusive (I'm not sure, since GigaNews
doesn't specify an abuse threshold. They don't prohibit bots,
as some providers do, but a bot that did something like that might
constitute abuse in their opinion).

How about checking for

/perl/i
/cgi/i

and monitor some multi posts for more words? Can't be that many :)
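A keyword gate along those lines might look like this. The two patterns come from the suggestion above; the list would need tuning against real multiposts, and on_topic is a hypothetical name.

```perl
use strict;
use warnings;

# Only flag a multipost if its body looks on-topic. The pattern list is
# an assumption to be extended after checking past multiposts.
my @TOPIC = (qr/perl/i, qr/cgi/i);

sub on_topic {
    my ($body) = @_;
    for my $re (@TOPIC) {
        return 1 if $body =~ $re;
    }
    return 0;
}

print on_topic("My CGI script dies") ? "flag\n" : "skip\n";
```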
 

usenet

John said:
How about checking for
/perl/i
/cgi/i
and monitor some multi posts for more words? Can't be that many :)

Lemme go back and research some multiposts that I (and others) have
manually flagged and see how effective a keyword search would be. I
might find that /doesn'?t work/i would catch most of them... ;^))
The (my) problem is that I now start to see replies to canceled messages

But my point is that the messages aren't cancelled... at least not on
my newsserver. Are they cancelled on yours (ie, do you see the reply
but not the original?)
Depends on who cancels them

I was talking about automated spam cancels (by filtering systems), not
user cancels. Or is spam filtering done prior to publishing the post?
I'm not sure how newsservers filter out spam.
If he has [replies], then don't post (IIRC you do that now?)

Not quite. The bot will flag the original multipost, even if it has
replies (it ignores ALL replies, including replies which might be
cut-and-paste answers to multiposted questions, or even cut-and-paste
answers to similar questions, either of which would hash out as
multiposts but aren't really multiposts, IMHO).
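The thread doesn't show how the bot hashes messages, so here is a hedged sketch of one way body hashing could work, using the core Digest::MD5 module. The normalization rules, the article layout and the sub names are all assumptions. It skips replies, as described above, and treats one Message-ID seen in several groups as a crosspost rather than a multipost.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Normalize a body before hashing so trivial differences (signature,
# whitespace, case) don't hide a multipost. These rules are assumptions.
sub body_digest {
    my ($body) = @_;
    $body =~ s/^-- \n.*//ms;    # drop a standard "-- " signature block
    $body =~ s/\s+/ /g;         # collapse runs of whitespace
    $body =~ s/^ | $//g;        # trim leading/trailing space
    return md5_hex(lc $body);
}

# Each article: { id => Message-ID, group => ..., body => ..., is_reply => bool }.
# Returns digests whose text appears under more than one Message-ID in more
# than one group. A crosspost keeps a single Message-ID, so it is skipped.
sub find_multiposts {
    my (@articles) = @_;
    my %by_digest;
    for my $art (@articles) {
        next if $art->{is_reply};    # replies are ignored
        my $d = body_digest($art->{body});
        $by_digest{$d}{ids}{ $art->{id} }       = 1;
        $by_digest{$d}{groups}{ $art->{group} } = 1;
    }
    return grep {
        scalar(keys %{ $by_digest{$_}{ids} }) > 1
            && scalar(keys %{ $by_digest{$_}{groups} }) > 1
    } sort keys %by_digest;
}

my @articles = (
    { id => '<1@example.invalid>', group => 'comp.lang.perl.misc',
      body => "How do I sort?\n-- \nBob" },
    { id => '<2@example.invalid>', group => 'perl.beginners',
      body => "how do I   sort?" },
    { id => '<3@example.invalid>', group => 'comp.lang.perl.misc',
      body => "Crossposted once" },
    { id => '<3@example.invalid>', group => 'alt.perl',
      body => "Crossposted once" },
    { id => '<4@example.invalid>', group => 'alt.perl',
      body => "How do I sort?", is_reply => 1 },
);
my @hits = find_multiposts(@articles);
print scalar(@hits), " multiposted body found\n";
```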

But I'm not sure that it is undesirable for the bot to reply to an
answered thread. There have been past cases where I've (unknowingly)
replied to a multiposted question, only to jump to another group and
realize it was multiposted. In that case, I still respond back to both
threads and flag the post. Even though I previously replied (and, who
knows, I might have even replied correctly), flagging the message may
discourage further "rewarding" the OP with additional assistance (or
corrections to a crap answer I gave... serves the guy right).

I'm not sure why the bot should not act in the same manner...
New subject (was: old subject).
I suggest something like:
Multiposted (was: Workflow Systems)

I've already implemented something like this per Dr. Ruud's suggestion
in another thread.
 

John Bokma

Lemme go back and research some multiposts that I (and others) have
manually flagged and see how effective a keyword search would be. I
might find that /doesn'?t work/i would catch most of them... ;^))


But my point is that the messages aren't cancelled... at least not on
my newsserver. Are they cancelled on yours (ie, do you see the reply
but not the original?)

Yes.

Note that a cancel message is a post carrying a removal request. Each
news server can be configured to honor such a request or not, and to
propagate such requests or not.
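Since a cancel is just an ordinary article with a Control header, building one takes only a few lines. A minimal sketch of the headers, assuming the usual conventions (a "Control: cancel <message-id>" header and a "cmsg cancel" Subject); cancel_headers is a hypothetical helper and the addresses are placeholders. Whether a given server honors the result is, as noted, up to that server.

```perl
use strict;
use warnings;

# Hypothetical helper: header block for a cancel control message. The
# From and Newsgroups values would typically match those of the article
# being cancelled.
sub cancel_headers {
    my (%arg) = @_;
    my @fields = (
        [ 'From'       => $arg{from} ],
        [ 'Newsgroups' => $arg{newsgroups} ],
        [ 'Subject'    => "cmsg cancel $arg{target}" ],
        [ 'Control'    => "cancel $arg{target}" ],
    );
    return join '', map { my ($k, $v) = @$_; "$k: $v\n" } @fields;
}

print cancel_headers(
    from       => 'bot@example.invalid',
    newsgroups => 'comp.lang.perl.misc',
    target     => '<abc@example.invalid>',
);
```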
I was talking about automated spam cancels (by filtering systems), not
user cancels. Or is spam filtering done prior to publishing the post?
I'm not sure how newsservers filter out spam.

IIRC a spam filter will just drop the post. However, there are cancel
bots that work similarly to yours, except that they post a different
message (a cancel), and there is a delay with those, of course.

I am also talking about manual cancels. In general you don't want to
reply to *any* message that is canceled. In practice this is impossible,
so I would be happy if spam, at least, were not flagged as multiposts.
If he has [replies], then don't post (IIRC you do that now?)

Not quite. The bot will flag the original multipost, even if it has
replies (it ignores ALL replies, including replies which might be
cut-and-paste answers to multiposted questions, or even cut-and-paste
answers to similar questions, either of which would hash out as
multiposts but aren't really multiposts, IMHO).

But I'm not sure that it is undesirable for the bot to reply to an
answered thread. There have been past cases where I've (unknowingly)
replied to a multiposted question, only to jump to another group and
realize it was multiposted. In that case, I still respond back to
both threads and flag the post. Even though I previously replied (and,
who knows, I might have even replied correctly), flagging the message
may discourage further "rewarding" the OP with additional assistance
(or corrections to a crap answer I gave... serves the guy right).

I'm not sure why the bot should not act in the same manner...

IMO no. Just assume that if there has been one reply, the poster
just got away with it. You might catch him/her later anyway. I would
(again) recommend limiting the number of messages the bot sends out.
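One simple way to cap the bot's output is a rolling-window limit. A sketch, assuming timestamps in epoch seconds; may_post and the numbers in the example (three posts per hour) are hypothetical, not anything the bot actually uses.

```perl
use strict;
use warnings;

# Hypothetical cap: at most $max bot posts in any rolling $window seconds.
# @$log holds the timestamps of recent posts, oldest first.
sub may_post {
    my ($log, $now, $max, $window) = @_;
    shift @$log while @$log && $log->[0] <= $now - $window;  # expire old
    return 0 if @$log >= $max;
    push @$log, $now;
    return 1;
}

my @sent;
my @ok = map { may_post(\@sent, $_, 3, 3600) } (0, 10, 20, 30, 3601);
print "@ok\n";
```

The fourth attempt is refused because three posts already fall inside the hour; by 3601 the first one has expired.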
I've already implemented something like this per Dr. Ruud's suggestion
in another thread.

Yeah, he beat me again :)
 

usenet

I believe a good (but very simple) spam filtering strategy would
be to look for web URLs in the body of the message.

This was quick and easy, and I have done so.

I also noticed earlier that a pyramid scheme got flagged. All pyramid
schemes will contain an e-mail address in the message body (but real
multiposts rarely will), so I have coded around e-mail addresses as
well. Presently, messages are ignored if the message body (excluding
the cut/tag line) matches either of these checks (using Regexp::Common):

if ($body =~ m{
        $RE{URI}{HTTP}{-scheme => qr{https?}}
      | $RE{URI}{FTP}
      | $RE{URI}{news}
      | $RE{URI}{NNTP}
    }xms) {
    $log->debug("Ignoring msg with URI (spam?)");
    next MSGNUM;
}
if ($body =~ m{($RE{Email}{Address})}xms) {
    $log->debug("Ignoring msg with e-mail address (pyramid?)");
    next MSGNUM;
}

I think that will ignore 99% of spam (with very few false negatives).
 

Ben Morrow

Quoth (e-mail address removed):
But my point is that the messages aren't cancelled... at least not on
my newsserver. Are they cancelled on yours (ie, do you see the reply
but not the original?)

Something you could perhaps try is keeping a database of all your bot's
posts and cancelling them if the original article gets cancelled.
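A sketch of that bookkeeping, assuming some callback can tell whether an original article is still on the server (for instance one built on Net::NNTP's nntpstat method). The plain hash stands in for a persistent store such as a DBM file; replies_to_cancel and the message IDs are hypothetical. Each returned ID would then be cancelled with an ordinary cancel control message.

```perl
use strict;
use warnings;

# %$replied maps an original article's Message-ID to the Message-ID of the
# bot's reply. $exists is a callback reporting whether an article is still
# on the server. Returns the bot replies whose originals have vanished.
sub replies_to_cancel {
    my ($replied, $exists) = @_;
    my @cancel;
    for my $orig (sort keys %$replied) {
        next if $exists->($orig);
        push @cancel, $replied->{$orig};
        delete $replied->{$orig};    # handle each reply only once
    }
    return @cancel;
}

my %replied = (
    '<spam@example.invalid>'   => '<bot1@example.invalid>',
    '<newbie@example.invalid>' => '<bot2@example.invalid>',
);
my %on_server = ('<newbie@example.invalid>' => 1);
my @to_cancel = replies_to_cancel(\%replied, sub { $on_server{ $_[0] } });
print "@to_cancel\n";
```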

Ben
 

John Bokma

This was quick and easy, and I have done so.

I also noticed earlier that a pyramid scheme got flagged. All pyramid
schemes will contain an e-mail address in the message body (but real
multiposts rarely will), so I have coded around e-mail addresses as
well.

Yup, you can't ignore them based on a simple count of $$$$$ :-D.
 
