RubyGarden Spam


Gavin Sinclair

> If the spam is entered by a script, then the wiki code should be able to
> use some simple heuristics to block the most annoying crap.
> For example, if the diff from the old page to the new page is greater
> than some percentage, or if the new page contains X number of links to
> the same site.

X number of links to _any_ site should be good enough. Automatic spam
could then go fine-grained to get under the radar, at which time a
secondary heuristic is needed, or X becomes zero. Your statement
below provides ample justification for that.
Might this cause a problem for legit users once in a while? Sure. But
we have that now, with spam clean-up.
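
As a rough illustration (not code from RubyGarden's wiki or Ruwiki; the
threshold and the save hook are made up), the link-count check is only a
few lines of Ruby:

# Rough illustration only -- reject an edit outright if the submitted text
# contains more than MAX_EXTERNAL_LINKS URLs, to *any* site.
MAX_EXTERNAL_LINKS = 5   # "X"; tune per wiki, or set to 0 to forbid links entirely

def too_many_links?(new_text)
  new_text.scan(%r{https?://\S+}).size > MAX_EXTERNAL_LINKS
end

# In a (hypothetical) save hook:
#   reject_edit("too many external links") if too_many_links?(submitted_text)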

Gavin
 

Austin Ziegler

> If the spam is entered by a script, then the wiki code should be able to
> use some simple heuristics to block the most annoying crap.
>
> For example, if the diff from the old page to the new page is greater
> than some percentage, or if the new page contains X number of links to
> the same site.

That's more or less the idea behind an "entropy" value that gets saved
on a page right now -- I haven't figured out exactly what I'm going to
do with it, but it offers interesting possibilities.
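
How that entropy value is computed isn't stated; purely as a guess at what
such a number could be, a Shannon-entropy measure over a revision's
characters looks like this in Ruby (an assumption, not Ruwiki's actual code):

# Purely illustrative. A spam edit that pastes a wall of URLs into a prose
# page tends to shift this value noticeably between revisions.
def page_entropy(text)
  return 0.0 if text.empty?
  counts = Hash.new(0)
  text.each_char { |c| counts[c] += 1 }
  total = text.length.to_f
  counts.values.inject(0.0) do |sum, n|
    p = n / total
    sum - p * Math.log2(p)
  end
end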

-austin
 

Austin Ziegler

> How about displaying a trivial line of Ruby code and asking the user to
> enter the value. Something like
>
> To stop spammers, please enter the value of the following
>
> 1.+(2) = | |
>
> Change the + to a - or * randomly, and pick random numbers between 1
> and 9.

That works for Ruby developers' wikis, but not for the general case.
Although my current "clients" for Ruwiki are all developers, I intend
to aim it a bit wider.
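
For what it's worth, generating the challenge described above is trivial;
a sketch in Ruby, with the form handling left as a made-up comment:

# Build a random "n.op(m)" expression and the expected answer.
def ruby_captcha
  a  = rand(1..9)
  b  = rand(1..9)
  op = %w[+ - *].sample
  ["#{a}.#{op}(#{b})", a.send(op, b)]
end

question, answer = ruby_captcha
# Render: "To stop spammers, please enter the value of #{question}"
# Accept the edit only if the submitted field equals answer.to_s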

-austin
 

Patrick May

Hello,

> You should create a way to generate images with text
> verification. This would eliminate spam.

The only way to stop wiki spam is to have a dedicated admin.
Creativity helps reduce the time burden, but it is a constant endeavor.

A tarpit would be easier to implement than a captcha. In the usemod
settings, you use NetAddr::IP to check if the env's Remote Addr is
within a known spammer domain. If it is a spammer, set the pages
database to a copy. Nightly / weekly / whatever, dump the latest pages
directory on top of the tarpit.
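
The "dump the latest pages on top of the tarpit" step is just a scheduled
copy; a sketch in Ruby with placeholder paths, run from cron:

# Reset the tarpit copy from the live pages database so vandals keep seeing
# a site that looks current while their edits are silently discarded.
# Paths are placeholders, not anyone's real setup.
require 'fileutils'

LIVE_DB   = '/path/to/wikidb'     # main wiki data directory
TARPIT_DB = '/path/to/tarpitdb'   # copy served to known spammer subnets

FileUtils.rm_rf(TARPIT_DB)
FileUtils.cp_r(LIVE_DB, TARPIT_DB)
# crontab entry, e.g. nightly:  59 3 * * * ruby /path/to/reset_tarpit.rb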

There goes one of my points for my presentation :)

The main resource in fighting spammers is time. You want to waste
their time, let them think that things are working.

Cheers,

Patrick
 

Patrick May

Hello,
> The only way to stop wiki spam is to have a dedicated admin.
> Creativity helps reduce the time burden, but it is a constant endeavor.
>
> A tarpit would be easier to implement than a captcha. In the usemod
> settings, you use NetAddr::IP to check if the env's Remote Addr is
> within a known spammer domain. If it is a spammer, set the pages
> database to a copy. Nightly / weekly / whatever, dump the latest
> pages directory on top of the tarpit.

I said domain. I meant subnet. You can just put a whole ISP on
probation and not allow changes from it to be propagated to the main
database.
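
In Ruby, the same subnet check is a few lines with the standard library's
IPAddr; a sketch only, with example subnets and a made-up hook:

require 'ipaddr'

# Example probation list; real entries would come from the admin's own logs.
PROBATION_SUBNETS = %w[220.163.0.0/16 61.50.242.0/24].map { |s| IPAddr.new(s) }

def on_probation?(remote_addr)
  ip = IPAddr.new(remote_addr)
  PROBATION_SUBNETS.any? { |subnet| subnet.include?(ip) }
end

# write_to_tarpit(page) if on_probation?(ENV['REMOTE_ADDR'])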

Cheers,

Patrick
 

David Ross

Patrick said:
> Hello,
>
> I said domain. I meant subnet. You can just put a whole ISP on
> probation and not allow changes from it to be propagated to the main
> database.
>
> Cheers,
>
> Patrick

Adding a whole ISP to a probation list can lead to full-scale lockout. I
have dozens of proxies from many ISPs. I like that idea though. Hmmm,
spam trap....

Are you saying to set up a trigger for people who post to a certain page?

--dross
 

Patrick May

> Adding a whole ISP to a probation list can lead to full-scale lockout.
> I have dozens of proxies from many ISPs. I like that idea though. Hmmm,
> spam trap....
>
> Are you saying to set up a trigger for people who post to a certain
> page?

No. It goes further. You set up a trigger to recognize vandals by IP
address. You push their changes to an alternate database. They can
see the site, they can make changes, they can see their changes on the
site.

No one except the other vandals sees their changes. And every night,
everything the vandals do is washed away.

Cheers,

Patrick
 

David Ross

Patrick said:
> No. It goes further. You set up a trigger to recognize vandals by IP
> address. You push their changes to an alternate database. They can
> see the site, they can make changes, they can see their changes on the
> site.
>
> No one except the other vandals sees their changes. And every night,
> everything the vandals do is washed away.
>
> Cheers,
>
> Patrick

Superb idea, Patrick. Very interesting. I think that is the best idea
of all. Hmm... I guess there could be multiple ways of detecting
spammers:

- regex
- a trigger page
- morons who try to post 4 times in under 10 seconds
- spam detection of the kind mail filters implement

What other good ways would there be to detect spammers?
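
The rate-limit item above is only a few lines; a minimal in-memory sketch
in Ruby, with made-up hook names:

# Flag the "4 posts in under 10 seconds" case, keyed by IP address.
POST_WINDOW = 10   # seconds
POST_LIMIT  = 4    # edits allowed per IP inside the window

recent = Hash.new { |h, ip| h[ip] = [] }

def flooding?(recent, ip, now = Time.now)
  recent[ip].reject! { |t| now - t > POST_WINDOW }   # drop stale timestamps
  recent[ip] << now
  recent[ip].size > POST_LIMIT
end

# reject_edit(ip) if flooding?(recent, ip)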

--dross
 

Austin Ziegler

> A tarpit would be easier to implement than a captcha. In the usemod
> settings, you use NetAddr::IP to check if the env's Remote Addr is
> within a known spammer domain. If it is a spammer, set the pages
> database to a copy. Nightly / weekly / whatever, dump the latest pages
> directory on top of the tarpit.
>
> There goes one of my points for my presentation :)
>
> The main resource in fighting spammers is time. You want to waste
> their time, let them think that things are working.

I'm approaching it, again, from a slightly different perspective. My
goal is to make the page seem as if it were entirely a read-only
website to robots, and 403 if they are known bad crawlers. I don't yet
have IP banning, but I have robot exclusion.
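
As an illustration only (not Ruwiki's actual code; the patterns are
examples), the crawler check might look like this in a CGI setting:

# Refuse requests from anything that identifies itself as a known bad crawler.
BAD_CRAWLERS = [/EmailSiphon/i, /EmailCollector/i, /WebZIP/i]

def known_bad_crawler?(user_agent)
  BAD_CRAWLERS.any? { |pattern| pattern.match(user_agent.to_s) }
end

if known_bad_crawler?(ENV['HTTP_USER_AGENT'])
  print "Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\n"
  print "Forbidden\n"
  exit
end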

-austin
 

Patrick May

Austin,

> I'm approaching it, again, from a slightly different perspective. My
> goal is to make the page seem as if it were entirely a read-only
> website to robots, and 403 if they are known bad crawlers. I don't yet
> have IP banning, but I have robot exclusion.

Read-only to robots makes sense as a way of preventing accidental
problems. I used to have a delete link on the wiki. All my pages kept
getting deleted. I guessed that it was a robot gone amuck [1]. I
also like the bit about recognizing bad crawlers. No harvesting for
old-fashioned spam is a good thing.

The thing about banning is that it is easy for the vandal to tell that
they have been detected. I tried using Apache Deny directives to
manage abuse, but sometimes that just encourages the vandal to switch
computers. Plus the cost of a false positive is denial of service.
After one particularly annoying episode, I realized that the vandal was
trying to waste my time. So I set up the tarpit system to waste his,
and haven't lost sleep since.

I still do a lot of cleanup on my wikis, and I still use Deny
directives. Nothing replaces an active administrator. The tarpit just
gave me another lever to help me manage the problem.

Cheers,

Patrick

1. I didn't labor too much over it; I just deleted the Delete link.
 

Austin Ziegler

Patrick May:
> Read-only to robots makes sense as a way of preventing accidental
> problems. I used to have a delete link on the wiki. All my pages kept
> getting deleted. I guessed that it was a robot gone amuck [1]. I
> also like the bit about recognizing bad crawlers. No harvesting for
> old-fashioned spam is a good thing.
>
> The thing about banning is that it is easy for the vandal to tell that
> they have been detected. I tried using Apache Deny directives to
> manage abuse, but sometimes that just encourages the vandal to switch
> computers. Plus the cost of a false positive is denial of service.
> After one particularly annoying episode, I realized that the vandal was
> trying to waste my time. So I set up the tarpit system to waste his,
> and haven't lost sleep since.
>
> I still do a lot of cleanup on my wikis, and I still use Deny
> directives. Nothing replaces an active administrator. The tarpit just
> gave me another lever to help me manage the problem.

As of right now, a tarpit would actually be a little too difficult to
implement in Ruwiki. It's much easier to present the wiki as if it
were a CMS or a read-only website.

-austin
 

Patrick May

Austin,

> As of right now, a tarpit would actually be a little too difficult to
> implement in Ruwiki. It's much easier to present the wiki as if it
> were a CMS or a read-only website.

This is the best reason to choose one tactic over another. It's your
time the spammers are wasting. No need to help them out by trying to
do something difficult :)

Cheers,

Patrick
 

Patrick May

Hello,

> Hello,
>
> The only way to stop wiki spam is to have a dedicated admin.
> Creativity helps reduce the time burden, but it is a constant endeavor.
>
> A tarpit would be easier to implement than a captcha. In the usemod
> settings, you use NetAddr::IP to check if the env's Remote Addr is
> within a known spammer domain. If it is a spammer, set the pages
> database to a copy. Nightly / weekly / whatever, dump the latest
> pages directory on top of the tarpit.

I threw together tarpit logic for usemod:

# == Configuration ====================================
use NetAddr::IP;
use vars qw( $TarpitDir $VandalFile );

$DataDir    = "/tmp/mywikidb";   # Main wiki directory
$TarpitDir  = "/tmp/tarpitdb";   # Tarpit directory
$VandalFile = "/Users/patsplat/Desktop/usemod10/vandals.txt";

# If the request comes from an address or subnet listed in the vandal
# file, silently switch the wiki's data directory to the tarpit copy.
open(SOURCE, "< $VandalFile")
    or die "Couldn't open $VandalFile for reading: $!\n";
my $remote_addr = new NetAddr::IP $ENV{"REMOTE_ADDR"};
while (<SOURCE>) {
    chomp;                                   # one address or subnet per line
    my $vandal_host = new NetAddr::IP $_;
    next unless defined $vandal_host;        # skip blank or malformed lines
    if ( $remote_addr->within( $vandal_host ) ) {
        $DataDir = $TarpitDir;
    }
}
close(SOURCE);

Cheers,

Patrick
 

Chad Fowler

> Hello,
>
> I threw together tarpit logic for usemod:
>
> # == Configuration ====================================
> use NetAddr::IP;
> use vars qw( $TarpitDir $VandalFile );
>
> $DataDir    = "/tmp/mywikidb";   # Main wiki directory
> $TarpitDir  = "/tmp/tarpitdb";   # Tarpit directory
> $VandalFile = "/Users/patsplat/Desktop/usemod10/vandals.txt";
>
> # If the request comes from an address or subnet listed in the vandal
> # file, silently switch the wiki's data directory to the tarpit copy.
> open(SOURCE, "< $VandalFile")
>     or die "Couldn't open $VandalFile for reading: $!\n";
> my $remote_addr = new NetAddr::IP $ENV{"REMOTE_ADDR"};
> while (<SOURCE>) {
>     chomp;                                   # one address or subnet per line
>     my $vandal_host = new NetAddr::IP $_;
>     next unless defined $vandal_host;        # skip blank or malformed lines
>     if ( $remote_addr->within( $vandal_host ) ) {
>         $DataDir = $TarpitDir;
>     }
> }
> close(SOURCE);

Great, thanks! Now I've just got to find the time to insert it and
test it. Hopefully some time this afternoon I can steal a few
minutes..


Much appreciated, Patrick!
Chad
 

David Ross

Here is one step of many that could be applied.
Mr. Britt, you said in a message a while ago that the IP address
220.163.37.233 attacked Rubygarden. Here is the ultimate solution to
stop a good percentage of the spammers. I really didn't think about it at
first until I was setting up my RBL lists on servers.

Use this site to check an address. Make sure you tell the admins of
the RBL servers that you are using the servers, or you could get
blacklisted from access.

This is *the* solution

RBLs are not only for mail.

I use a very big list of RBLs all the time. Remember never to use
dul.dnsbl.sorbs.net

You *can* integrate this into wikis. It's very easy. Okay thanks, 80% of
spamming solved.
Most, if not ALL, the IPs listed in
http://www.istori.com/cgi-bin/wiki?WikiBlackList *ARE* in the RBLs.

Use them!
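
Wiring an RBL lookup into a wiki's save path is straightforward; a sketch
using Ruby's standard resolver, with an example zone (check each list's
usage policy before querying it):

# Reverse the IP's octets, append the blocklist zone, and treat a successful
# lookup as "listed".
require 'resolv'

def listed_in_rbl?(ip, zone = 'sbl-xbl.spamhaus.org')
  query = ip.split('.').reverse.join('.') + '.' + zone
  Resolv.getaddress(query)
  true
rescue Resolv::ResolvError
  false
end

# listed_in_rbl?('220.163.37.233')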

Thanks, have a nice day. Problem solved

David Ross
 

David Ross

Chad said:
> > You *can* integrate this into wikis. It's very easy. Okay thanks, 80% of
> > spamming solved.
> > Most, if not ALL, the IPs listed in
> > http://www.istori.com/cgi-bin/wiki?WikiBlackList *ARE* in the RBLs.
>
> We have been. For months.
>
> > Thanks, have a nice day. Problem solved
>
> Unfortunately not.
You have been? Hard to believe :) Set up scanners as well for common and
uncommon ports.
Rubyforge obviously hasn't been using an RBL :) That IP was a first-try
hit for me on, I think it was, Spamhaus.

David Ross
 

David Ross

Chad said:
> > You *can* integrate this into wikis. It's very easy. Okay thanks, 80% of
> > spamming solved.
> > Most, if not ALL, the IPs listed in
> > http://www.istori.com/cgi-bin/wiki?WikiBlackList *ARE* in the RBLs.
>
> We have been. For months.
>
> > Thanks, have a nice day. Problem solved
>
> Unfortunately not.
First Rubygarden Spam email
-----------------------------------------
The rubygarden wiki has been over-run with spam links.

220.163.37.233 is one of the offending source IP addresses.

I fixed the home page, and then saw the extent of the crap. Looks like
many personal pages have been altered.

Those with user pages may want to go check their own page to assist with
the clean up.

James

-----------------------------------------

-------------------------------------

I've got a list, but it has become obvious that maintaining a list
manually isn't going to work. I'm tempted to require registration and
authentication at this point as much as I hate the thought.

Chad


-------------------------------------

http://rbls.org/?q=220.163.37.233

You're not reading the email..

Thanks for lying, it's been listed since June 2003.

No, the problem is 80% solved. There are some actual unlogged IPs. Please
educate yourself in security; you obviously aren't qualified.

David Ross
 

trans. (T. Onoma)

| >>You *can* integrate this into wiki's. Its very easy. Okay thanks, 80%
| >>spamming solved.
| >>Most, if not ALL the ips listed in
| >>http://www.istori.com/cgi-bin/wiki?WikiBlackList *ARE* in the RBLs
| >
| >We have been. For months.
| >
| >>Thanks, have a nice day. Problem solved
| >
| >Unfortunately not.
|
| First Rubygarden Spam email
| -----------------------------------------
| The rubygarden wiki has been over-run with spam links.
|
| 220.163.37.233 is one of the offending source IP addresss.
|
| I fixed the home page, and then saw the extent of the crap. Looks like
| many personal pages have been altered.
|
| Those with user pages may want to go check their own page to assist with
| the clean up.
|
| James
|
| -----------------------------------------
|
| -------------------------------------
|
| I've got a list, but it has become obvious that maintaining a list
| manually isn't going to work. I'm tempted to require registration and
| authentication at this point as much as I hate the thought.
|
| Chad
|
|
| -------------------------------------
|
| http://rbls.org/?q=220.163.37.233
|
| You're not reading the email..
|
| Thanks for lying, its listed since June 2003
|
| No, problem is 80% solved. There are some actual unlogged IPs. Please
| educate yourself in security, you obviously aren't qualified.

Umm... why not try to educate rather than accuse? I for one would certainly
like to know what in the hell you're talking about, but you're not explaining
yourself very well.

T.
 

Andreas Schwarz

Chad said:
> We have been. For months.

Are you sure? I only checked 2 of the spammer IPs, and they are both
blacklisted on rbls.org (61.50.242.197 and 220.163.37.233).

Anyway, I think the spam problem would be quite easy to handle if there
was a better interface for rollback and IP blocking. I have never seen a
Mediawiki wiki flooded with spam, because it needs far more effort to
spam it than to repair it.
 
