Are spams on comp.lang.python a major nuisance?

skip

I took over spam filter management for the python.org mailing lists a couple
months ago and made a few changes to the way the spam filter is trained.
Things seem to be at a reasonable level as far as I can tell (I see a few
spams leak through each day), though I wasn't actively reading
comp.lang.python/[email protected] before I took over the task, so I
have nothing to compare with. Does the level of spam leaking through the
filter now seem excessive? Is it more or less than in June and July?

Thanks,

Skip Montanaro
 
Steven D'Aprano

I took over spam filter management for the python.org mailing lists a
couple months ago and made a few changes to the way the spam filter is
trained. Things seem to be at a reasonable level as far as I can tell (I
see a few spams leak through each day), though I wasn't actively reading
comp.lang.python/[email protected] before I took over the task, so
I have nothing to compare with. Does the level of spam leaking through
the filter now seem excessive? Is it more or less than in June and
July?


I don't have any objective numbers, but subjectively it seems to me that
the number of spams is significantly higher, but not so high as to be a
major nuisance.
 
MRAB

I don't have any objective numbers, but subjectively it seems to me that
the number of spams is significantly higher, but not so high as to be a
major nuisance.
I think it changes over time. You might tweak the filter to reduce the
amount of spam getting through but then the spammers adapt. In other
words, spam is a moving target.
 
Tim Rowe

2008/9/26 Steven D'Aprano said:
I don't have any objective numbers, but subjectively it seems to me that
the number of spams is significantly higher, but not so high as to be a
major nuisance.

I consider *any* spam to be a major nuisance, but I don't see them as
being the fault of python-list which seems to do a pretty good job of
blocking them
 
Aaron "Castironpi" Brady

I consider *any* spam to be a major nuisance, but I don't see them as
being the fault of python-list which seems to do a pretty good job of
blocking them

Is it worth mentioning that they come from the same author in a short
period of time? Maybe that could bump up the score a notch.

I think in June and July they were selling watches a lot which I
haven't noticed recently.
 
nntpman68

Hm,


I guess you just filter mailing lists and can do nothing about the
newsgroup if I'm fetching via the nntp server of my ISP itself, right?

I'm using Thunderbird as news reader and this is probably not the
smartest news reader, though I like it a lot for mails.

Is there any pythonable (or perlable) news reader running under windows
/ cygwin or any way (under WIN XP) to use scripts to filter newsgroups
for Thunderbird?

- I'm annoyed by any spam.
It's tough to find good rules, but the incoming spams that I see
currently on comp.lang.python have certain criteria.



- most email addresses from gmail.
- all never posted before and then they have multiple posts within a few
minutes / seconds
- the posts always contain one or more urls ( mostly cryptic names )
- they always start a thread but never reply to one
- the post doesn't contain python code or anything which looks only
vaguely like source code
- never mentions the word python
- the amount of sexual or financial vocabulary exceeds classical python
posts
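
The clues listed above could be checked mechanically. What follows is only a rough sketch, not anything python.org actually runs: the `Post` record, the `SUSPECT_WORDS` set and the clue names are all made up for illustration, and the "looks like code" test is deliberately crude.

```python
import re
from dataclasses import dataclass
from typing import List

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
# Crude test for "looks vaguely like Python source" (an assumption, easily fooled).
CODE_RE = re.compile(r"^\s*(def|class|import|from|print)\b|>>>", re.MULTILINE)
# Hypothetical vocabulary; a real filter would learn this from training data.
SUSPECT_WORDS = {"discount", "cheap", "loan", "casino", "sexy", "replica"}


@dataclass
class Post:
    sender: str           # e.g. "someone@gmail.com"
    body: str
    is_reply: bool        # True if the post answers an existing thread
    prior_posts: int      # earlier posts by this sender before the current burst
    posts_last_hour: int  # posts by this sender within the last hour


def spam_clues(post: Post) -> List[str]:
    """Return the names of the listed clues that this post matches."""
    text = post.body.lower()
    words = re.findall(r"[a-z']+", text)
    clues = []
    if post.sender.lower().endswith("@gmail.com"):
        clues.append("gmail-address")
    if post.prior_posts == 0 and post.posts_last_hour > 1:
        clues.append("new-sender-burst")
    if URL_RE.search(post.body):
        clues.append("contains-url")
    if not post.is_reply:
        clues.append("starts-thread")
    if not CODE_RE.search(post.body):
        clues.append("no-code")
    if "python" not in text:
        clues.append("no-python-mention")
    if sum(w in SUSPECT_WORDS for w in words) >= 2:
        clues.append("suspect-vocabulary")
    return clues
```

A post matching most of the clues is a likely spam; a post matching one or two (a first-time gmail poster asking a real question, say) is probably not, so the count matters more than any single clue.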



bye


N
 
Terry Reedy

nntpman68 said:
I guess you just filter mailing lists and can do nothing about the
newsgroup if I'm fetching via the nntp server of my ISP itself, right?

I am reading via gmane.comp.python.general, so I will benefit from
filtering spam posted to c.l.p. Anyone with a news reader can do the
same. Just make a news.gmane.org account with standard port setting.
Works fine with OE and now Tbird. I believe posts to gmane are sent to
the mailing list first, before appearing even on gmane.
- I'm annoyed by any spam.

Ditto. I would say 'nuisance' rather than 'major nuisance'.
Improvements are appreciated here.
It's tough to find good rules, but the incoming spams that I see
currently on comp.lang.python have certain criteria.

- most email addresses from gmail.
- all never posted before and then they have multiple posts within a few
minutes / seconds
- the posts always contain one or more urls ( mostly cryptic names )
- they always start a thread but never reply to one
- the post doesn't contain python code or anything which looks only
vaguely like source code
- never mentions the word python
- the amount of sexual or financial vocabulary exceeds classical python
posts

Pretty good list.
 
Aaron "Castironpi" Brady

I took over spam filter management for the python.org mailing lists a couple
months ago and made a few changes to the way the spam filter is trained.
Things seem to be at a reasonable level as far as I can tell (I see a few
spams leak through each day), though I wasn't actively reading
comp.lang.python/[email protected] before I took over the task, so I
have nothing to compare with.  Does the level of spam leaking through the
filter now seem excessive?  Is it more or less than in June and July?

Thanks,

Skip Montanaro

Is there such a thing as an open-source spam filter? That way any
time anyone had spare time and got annoyed, they could dump a short
snippet of code into the grinder.

Check-in would be tricky. It would need lots of votes, and voters
would see a list of retroactive consequences of the change. (Marks
these five things as spam.) I'm not sure that the rule-making is any
better in the hands of many than it is of one (in general, to the OP),
considering the power of stupid in large numbers, and the ease of
submitting a filter for 'if name== "D'Aprano"'. That is, surely Skip
wouldn't do that, but a group might.

I've never gone spamming, so I don't know: Is it really easy (and
necessarily profitable) to see, "if 'python' not in contents" and add
the word to the mail? Or is it not worth the time it takes to catch
that list? They're greedy, not bored.
 
skip

Aaron> Is there such a thing as an open-source spam filter? That way
Aaron> any time anyone had spare time and got annoyed, they could dump a
Aaron> short snippet of code into the grinder.

Yes: <http://spambayes.sf.net/> though I think your model of how it works
probably needs a bit of refinement. <wink> You might want to read through
the background reading page: <http://spambayes.sf.net/background.html>.

It's what we're running on the python.org site. The initial impetus for it
came from a desire to filter spam for python.org mailing lists. If you'd
like to contribute we can always use developers.

Skip
 
George Sakkis

I read the group via NNTP, and I find that blocking all
articles posted from google.groups gets rid of all of the spam.

.... along with a far from trivial (I guess) percentage of non-spam,
such as this post.

And-nothing-of-value-was-lost'ly yrs,
George
 
Aaron "Castironpi" Brady

... along with a far from trivial (I guess) percentage of non-spam,
such as this post.

And-nothing-of-value-was-lost'ly yrs,
George

Every method has false positives, George. (including this one.)
 
skip

Aaron> Every method has false positives, George. (including this one.)

George makes a good point though, and it is a key element of how SpamBayes
works. A single clue is not a binary selector for ham or spam.
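
To illustrate the idea that no single clue decides the outcome, here is a toy sketch that adds up per-clue evidence as log-odds, naive-Bayes style. This is not SpamBayes's actual combining scheme (see the background page for that); the function name `combine` and the example probabilities are invented, and it assumes spam and ham are equally likely beforehand.

```python
import math


def combine(clue_probs):
    """Combine per-clue spam probabilities into a single spam probability.

    Each value is the (assumed) chance that a message showing that clue is
    spam. Evidence is summed as log-odds, so ham-ish clues can outweigh a
    single spam-ish one.
    """
    log_odds = 0.0
    for p in clue_probs:
        p = min(max(p, 0.01), 0.99)  # clamp so no clue is ever conclusive
        log_odds += math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-log_odds))


# One strong spam clue (say, "posted via Google Groups") plus two ham clues
# (say, "mentions python" and "is a reply") ends up leaning towards ham.
score = combine([0.9, 0.2, 0.2])
```

With these made-up numbers the lone 0.9 clue is outvoted by the two 0.2 clues, which is the behaviour George's post needs.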

Skip
 
Aaron "Castironpi" Brady

As a Google user, you have (presumably) more clout with them than
those of us who are not. Please pressure your provider to reduce the
spam they output so the above drastic measure is not so attractive.

Such pressure may be more effective if you *also* use an alternate
NNTP provider that isn't such a spam-haven.

--
 \             “I put contact lenses in my dog's eyes. They had little |
  `\   pictures of cats on them. Then I took one out and he ran around |
_o__)                                      in circles.” —Steven Wright |
Ben Finney

I composed a thread to the end of voicing that sentiment.

http://groups.google.com/group/Groups-Suggestions/browse_thread/thread/142ce723675bcad3#

Feel free to follow this.

For the record, I do find the fervor with which some netizens are
denouncing Google somewhat provocative. I find them biased, more
ardent than a classification heuristic with the same number of false
negatives and false positives. That is, not purely objective in their
advocacy.
 
Bob Cortopassi

- I'm annoyed by any spam.
It's tough to find good rules, but the incoming spams that I see
currently on comp.lang.python have certain criteria.

- most email addresses from gmail.
....snip rest of good filter criteria...

Killing all messages with "googlegroups" in the messageid will get rid
of a vast majority of the spam. Killing anything with a gmail.com
email address will end up killing more legitimate posters.
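
For anyone scripting this outside a news reader's kill file, a minimal sketch of the Message-ID test using the standard library `email` package might look like the following. The function name `from_google_groups` and the sample article are made up; it only assumes the raw article text (headers plus body) is available as a string.

```python
from email import message_from_string


def from_google_groups(raw_article: str) -> bool:
    """Return True if the article's Message-ID contains "googlegroups"."""
    msg = message_from_string(raw_article)
    message_id = msg.get("Message-ID", "")  # header lookup is case-insensitive
    return "googlegroups" in message_id.lower()


# Hypothetical article: a gmail sender posting through some other server.
sample = (
    "From: someone@gmail.com\n"
    "Message-ID: <12345@mail.example.org>\n"
    "Subject: sorting a list\n"
    "\n"
    "How do I sort a list in Python?\n"
)
keep = not from_google_groups(sample)
```

Testing the Message-ID rather than the From address leaves gmail users who post through another server alone, which is the distinction drawn above.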
 
Aaron "Castironpi" Brady

I think in June and July they were selling watches a lot which I
haven't noticed recently.

Gucci 104 G-Bandeau Watches - Gucci Watches Discount Rolex Oyster
Perpetual Lady Datejust Pearlmaster 18kt Yellow Gold Diamond Ladies
Watch 80318C
Cartier Must 21 Watches - Cartier Watches Discount

I speak too soon.
 
Aaron "Castironpi" Brady

If we start blocking users who have no previous posts and then post
many new messages at once, then we will just push the spammers to forge
active list users and reply to threads. That would be a worse
situation than we are in now. I say leave well enough alone.

Read about the Brain Blood Barrier
(http://en.wikipedia.org/wiki/Brain_Blood_Barrier) for an example in
nature where although a method to stop an attacker exists, it is not
overused to prevent the attacker from becoming more powerful.

--
Dotan Cohen

http://what-is-what.com
http://gibberish.co.il
א-ב-ג-ד-ה-ו-ז-ח-ט-י-ך-כ-ל-ם-מ-ן-נ-ס-ע-ף-פ-ץ-צ-ק-ר-ש-ת

ä-ö-ü-ß-Ä-Ö-Ü

Yow! I forgot the radiolabeled polyethylene glycol coated
hexadecylcyanoacrylate nanospheres!

CMIIW correct me if I'm wrong. Google Groups is a Usenet/c-l-py
gateway. Other gateways aren't contributing to spam. What are they
doing that G-Groups is not?
 
Skip Montanaro

CMIIW correct me if I'm wrong.  Google Groups is a Usenet/c-l-py
gateway.  Other gateways aren't contributing to spam.  What are they
doing that G-Groups is not?

Actually Google Groups appears to be just displaying the Usenet newsgroup
comp.lang.python. The spam filtering which is the topic of this thread is
applied to the mailing list (e-mail address removed) side of things. The
gateway between the mailing list and the Usenet newsgroup is on
mail.python.org I believe.

As to what Google Groups isn't doing, it's not clear. I just visited this
group and saw lots of spam. My guess is that we on the mailing list side of
things don't see a lot of that because of the spam filter. It seems Google
Groups makes it more difficult to report/eliminate spam than other more
traditional Usenet newsgroup software might. First you need to view the
message (even though it's frequently obvious from the subject alone that
it's spam), then click the "More Options" link, then the "Report Message"
link, then type something in the description field of the form they display
and click the Submit button. After that, who knows how long it takes for
them to send out a Usenet cancel message? Most people probably see the
subject and move on to the next message.

In short, it would appear that Google makes it harder to cancel spam than
they ought to. Why they don't have spam filters similar to what's on Gmail
to trap this stuff is unclear.

Skip

(Sent via Google Groups, so Grant will probably not see this...)
 
