new spam at the wiki


Jim Weirich

I've been checking my pages and it looks like we've got a new spammer on board.

here's what it looks like:

<SPAM>
<code style="visibility:hidden;display:none">

Wow. This shows up as a blank line when viewing the page diffs. I totally
missed this when reviewing page changes.

Thanks for calling this to our attention.
 

Asfand Yar Qazi

Jim said:
Wow. This shows up as a blank line when viewing the page diffs. I totally
missed this when reviewing page changes.

Thanks for calling this to our attention.

Why isn't the Wiki password-protected so that only authorised users
can edit it? I'm sure there could be a scheme where some moderators
get sent 'patches' which they then apply at their discretion?

I'm just asking, not trying to impose my desires or anything.....
 

James Britt

Asfand Yar Qazi wrote:
...
Why isn't the Wiki password-protected so that only authorised users can
edit it? I'm sure there could be a scheme where some moderators get
sent 'patches' which they then apply at their discretion?

I'm just asking, not trying to impose my desires or anything.....

Please take a look at the mailing list archives, searching on "wiki
spam." This has been discussed quite a bit.

Not meaning to discourage your questions, but I think you'll learn a
lot about the different arguments for and against such an approach:

http://groups.google.com/groups?group=comp.lang.ruby.*&q=wiki+spam


James
 

Henrik Horneber

Asfand said:
Why isn't the Wiki password-protected so that only authorised users can
edit it? I'm sure there could be a scheme where some moderators get
sent 'patches' which they then apply at their discretion?

I'm just asking, not trying to impose my desires or anything.....

Hi!

Well, the main reason is, I guess, that it's against the 'Wiki Way'.
The main point of a wiki is that anybody can edit it. Since most of the
spam is being inserted by bots, that should be reduced to 'any human can
edit it'. :)

regards,
Henrik
 

Charles Comstock

Hi!

Well, the main reason is, I guess, that it's against the 'Wiki Way'.
The main point of a wiki is that anybody can edit it. Since most of the
spam is being inserted by bots, that should be reduced to 'any human can
edit it'. :)

regards,
Henrik

More evidence of discrimination against robot kind!

But seriously, it would seem the capitalization hack was only
semi-successful. It seems to me that most of the bots put in links
in just a long series. Perhaps we could change it so you're limited to,
say, 5 outbound links per page edit? Or we could just scan for invisible
spans, which seem to be a favorite of the bots at the moment?
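
Something like this, in Ruby (just a sketch; the cutoff and the method
name are made up):

MAX_OUTBOUND_LINKS = 5   # made-up cutoff

# count anything in the submitted text that looks like an outbound URL
def too_many_links?(page_text)
  page_text.scan(%r{\w+://[^\s)>]+}).size > MAX_OUTBOUND_LINKS
end

# the edit handler would then refuse the save when too_many_links?(new_text) is true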

Charles Comstock
 

Ara.T.Howard

More evidence of discrimination against robot kind!

But seriously, it would seem the capitalization hack was only
semi-successful. It seems to me that most of the bots put in links in
just a long series. Perhaps we could change it so you're limited to, say,
5 outbound links per page edit? Or we could just scan for invisible spans,
which seem to be a favorite of the bots at the moment?

Charles Comstock

how about generating a jpg of a password and requiring the editor to enter it.
this would block the bots i think.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================
 

Michael DeHaan

how about generating a jpg of a password and requiring the editor to enter it.
this would block the bots i think.

CAPTCHA systems are a roadblock to blind and visually impaired users.
 

gabriele renzi

(e-mail address removed) wrote:
how about generating a jpg of a password and requiring the editor to
enter it.
this would block the bots i think.

see archives :)
Basically, this kind of captcha locks out people who can't see.
The other proposals were an autogenerated random word to type in, or an
answer to a simple question (e.g. "first letter of the word dog": "d").
I believe the latter remains the best choice, but it's up to the
maintainers.
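
Something like (purely illustrative; the question list and method names
are invented):

# made-up question list; real ones would live in a config file somewhere
QUESTIONS = {
  "What is the first letter of the word 'dog'?" => "d",
  "How many legs does a normal dog have?"       => "4",
}

def pick_challenge
  QUESTIONS.to_a[rand(QUESTIONS.size)]   # => [question, expected_answer]
end

def passes_challenge?(question, answer)
  expected = QUESTIONS[question]
  expected && answer.to_s.strip.downcase == expected
end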
 

Belorion

I've seen a couple of wikis that have honeypots embedded in them.
Basically, when you click on the honeypot link you are taken to a page
which says, in effect, "do not click *this* link, it will disable your
access to this wiki for X time". So, when a bot crawls the whole
page, it will also crawl the honeypot, and the IP gets logged and
banned from the site.

This might be undesirable for search engines, as you might prevent
your wiki from being crawled by, say, Google, but if you give the
honeypot page a norobots or nofollow metatag, that might prevent that
problem.
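
Roughly, I imagine (a sketch only; the banlist file and helper names are
made up, and how you get the remote address depends on the wiki software):

BANLIST = "banned_ips.txt"   # made-up location

# called when someone (or something) fetches the hidden honeypot page
def trip_honeypot(remote_ip)
  File.open(BANLIST, "a") { |f| f.puts "#{remote_ip} #{Time.now}" }
end

# checked before accepting any page edit
def banned?(remote_ip)
  File.exist?(BANLIST) &&
    File.readlines(BANLIST).any? { |line| line.split.first == remote_ip }
end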
 

Ara.T.Howard

this would block the bots i think.

CAPTCHA systems are a roadblock to blind and visually impaired users.

a jpg and an mp3 then? ;-)

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================
 

Nikolai Weibull

* Michael DeHaan said:
CAPTCHA systems are a roadblock to blind and visually impaired users.

What possibilities do they have in altering the contents of the pages?
I'm not being sarcastic, I'd actually like to know if there are any.
nikolai
 

Belorion

Someone who simply has really poor eyesight can configure their
browser to use a large font. They would be able to read the wiki, but not
contribute to it, because the strangely formed text within an image may
not be readable to them. These people are certainly capable of
contributing to the wiki.

And people who are blind have the text-to-speech option. Whether or
not they are able to contribute, I'm not sure, but it seems silly to
punish someone with a visual disability for spammers' bad behaviour.

* Michael DeHaan said:
CAPTCHA systems are a roadblock to blind and visually impaired users.

What possibilities do they have in altering the contents of the pages?
I'm not being sarcastic, I'd actually like to know if there are any.
nikolai

--
::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
::: page: www.pcppopper.org :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
 

gabriele renzi

Belorion wrote:

And people who are blind have the text-to-speech option.

there are even those funky braille consoles (I knew someone who used one)

Whether or
not they are able to contribute, I'm not sure, but it seems silly to
punish someone with a visual disability for spammers' bad behaviour.

+1
 

Jim Weirich

Hello all.

I've been helping the RubyGarden folks deal with some of the wiki spam
issues and thought I would provide an update here just so folks are
informed. I'll try to answer some of the questions that come up and let
you know what we have done.

But first the good news! A spam attack that targeted nearly 90 separate
pages was thwarted this morning with no need to manually revert any pages
(well, except for one).

Ok, now for a couple of responses ...

From "Charles Comstock said:
But seriously, it would seem the capitalization hack was only
semi-successful.

I believe the capitalization trick weeded out most of the non-hard-core
spammers. The ones that were left were VERY determined to get in. We
have added additional restrictions beyond the HTTP hack that was triggered
by typical spammer types of posting, and you could watch several of the
spammers try different variations on their post until it got past the
filters. The fellow posting links to pack001.com was particularly
persistent.
It seems to me that most of the bots put in links
in just a long series. Perhaps we could change it so you're limited to,
say, 5 outbound links per page edit?

Three things ...

(1) I really doubt they are bots for the most part ... at least not the
ones that are left. I watched one fellow this morning make a post, and
then repost to correct an error in the first attempt. Also, the timing of
the postings seems very non-machine-like. The repeated attempts to find a
way around the filters also indicate a human behind the wheel.

That's not to say there are no bots out there, but a significant fraction
are human. My theory is that all the recent "no-call" lists put a lot
of phone sales people out of work and the only job they could find where
they could be equally annoying is working as wiki spammers. :)

(2) Not all spam is long chains of links. The first (and only successful)
spam posting this morning was changing an existing link from
www.lughead.org to www.sister8.com on the MooreheadGroup page.

(3) Detecting the addition of links is difficult because the change is
submitted to the wiki as a whole page, with nothing to distinguish the new
material from the old. To determine that, we would have to run a diff
algorithm between the old and new content. Although doable, I'm looking
for easy-to-implement features right now because I'm dealing with Perl
code in UseMod (shudder!). (And yes, we are aggressively pursuing switching
to a Ruby-based wiki engine ... I can't take too much more Perl!)
Or we could just scan for invisible
spans, which seem to be a favorite of the bots at the moment?

Now that is a good idea. I only became aware of that technique this
morning, but it's on my list of things to try.
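
The check itself could be pretty dumb; something like the following Ruby,
though the exact patterns are only a guess at what the spammers are using:

# flags markup the spammers are hiding with inline CSS
HIDDEN_STYLE = /<[^>]+style\s*=\s*["'][^"']*(visibility\s*:\s*hidden|display\s*:\s*none)/i

def hidden_markup?(page_text)
  page_text =~ HIDDEN_STYLE ? true : false
end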

Asfand Yar Qazi asks:
Why isn't the Wiki password-protected so that only authorised users can
edit it?

Part of the magic of the "wiki way" is gathering the input from legitimate
users. Therefore we want to make the barrier of entry very very low,
otherwise people just won't bother to post. It is entirely possible that
we may be forced to go to a user registration process, but I hope that's a
last resort and not the first response.

how about generating a jpg of a password and requiring the editor
to enter it. this would block the bots i think.

This is known as a Captcha test (http://en.wikipedia.org/wiki/Captcha). As
mentioned earlier, I think most of the simple-minded bots have been
eliminated, and the remaining spammers are either human or bots closely
monitored by humans. I suspect that captcha will have less of an effect
than we would hope. Of course I could be wrong and may set up a test of
this. Also, as someone else noted, captcha systems suffer from some
concern over accessibility issues.

Belorion said:
I've seen a couple of wikis that have honeypots embedded in them.
Basically, when you click on the honeypot link you are taken to a page
which says, in effect, "do not click *this* link, it will disable your
access to this wiki for X time". So, when a bot crawls the whole
page, it will also crawl the honeypot, and the IP gets logged and
banned from the site.

I first heard of something like this suggested by Patrick May (NARF
developer) at this year's RubyConf. Patrick called it a tarpit.
Essentially, spammers are routed to a shadow wiki that looks just like the
real one. Any changes they make to the pages only exist on the shadow
wiki and are not reflected on the real thing.

The beauty of this approach is that the spammers have no idea that they
have been redirected to a fake site. The problem with the current
banning system is that spammers know immediately that they have failed, so
they begin to investigate workarounds (e.g. switching IP addresses,
modifying the post content). If they think their spam is successful, then
they have no motivation to try harder.
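
Mechanically the routing is trivial; a rough Ruby sketch, with made-up
file locations, might look like this. Everything interesting is in how
the banlist and the shadow copy get maintained.

REAL_DIR   = "pages"          # made-up locations
SHADOW_DIR = "tarpit_pages"

# the same sort of banlist as the honeypot idea above
def tarpitted?(remote_ip)
  File.exist?("banned_ips.txt") &&
    File.readlines("banned_ips.txt").any? { |l| l.split.first == remote_ip }
end

# a spammer's edit lands only in the shadow copy; to them it looks like it stuck
def save_page(remote_ip, page_name, new_text)
  dir = tarpitted?(remote_ip) ? SHADOW_DIR : REAL_DIR
  File.open(File.join(dir, page_name), "w") { |f| f.write(new_text) }
end

# page reads for a tarpitted IP would come from SHADOW_DIR the same way,
# so the spammer keeps seeing "their" version of the wiki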

Another feature of tarpits is the deliberate slowing of responses. If the
spammer has to wait forever for a page to update, it encourages them to
look elsewhere. I find a certain amount of self-righteous glee in the
thought of annoying spammers.

The downside to tarpits is two-fold. First, we still have the problem of
identifying spammers. A banlist (realtime or manual) is one possibility.
Another is by self-identifying behavior. E.g. the invisible link trick is
a strong indication you are a spammer.

The other downside is the possibility of legitimate users getting caught
in the tarpit. Since the tarpit /looks/ legitimate, I can easily imagine
legitimate users caught there without ever realizing it. If you
find that the wiki suddenly starts losing day-old posts of yours, drop
the wikimaster a note and ask if you have been tarpitted. I'm planning on
some tarpit management software where we can review the tarpitted users
and fix any accidents. All in good time.

I have actually implemented a prototype tarpit on the current wiki and am
monitoring it to see how effective it is. We actually caught one spammer
this morning immediately after his first post, so his remaining 90-odd
changes went directly into the tarpit.

A *very* satisfying morning! :)
 

gabriele renzi

Jim Weirich wrote:
Hello all.

I've been helping the RubyGarden folks deal with some of the wiki spam
issues and thought I would provide an update here just so folks are
informed. I'll try to answer some of the questions that come up and let
you know what we have done.

But first the good news! A spam attack that targeted nearly 90 separate
pages was thwarted this morning with no need to manually revert any pages
(well, except for one).
<snipall>
Great stuff, thanks for this.
I wonder: are all the revisions from you marked as [despam] on the
RecentChanges page coming from that?
Can I suggest that these be marked as minor edits so that they're
not listed on the RecentChanges page?
 

Bill Kelly

From: "Jim Weirich said:
(3) Detecting the addition of links is difficult because the change is
submitted to the wiki as a whole page, with nothing to distinguish the new
material from the old. To determine that we would have to run a diff
algorithm between the old and new content.

How about:

... er... my Perl is very rusty... I concur with the <shudder>... :)

..uh..um...

my $old = "abc http://blah.com def ftp://foo.bar ghi";

my $new = "abc http://spam1.com def ftp://foo.bar ghi http://spam2.com";

my %old_links;
$old_links{$&}++ while $old =~ m{\w+://[^\s)>]+}g;

my $new_links = 0;
while ($new =~ m{\w+://[^\s)>]+}g) {
  $new_links++ unless defined $old_links{$&};
}

print "new links: $new_links\n";


...The above doesn't count negatively when links disappear,
only positively when new links appear... I'd first written
this version:

my %delta;
$delta{$&} -= 1 while $old =~ m{\w+://[^\s)>]+}g;
$delta{$&} += 1 while $new =~ m{\w+://[^\s)>]+}g;

my $d = 0;
foreach my $val (values %delta) {
  $d += $val;
}

print "delta links: $d\n";


...But I'd guess the former is probably more like what
we'd want to count.
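
For what it's worth, the same idea in Ruby comes out about as short
(untested, just to show the shape):

old_text = "abc http://blah.com def ftp://foo.bar ghi"
new_text = "abc http://spam1.com def ftp://foo.bar ghi http://spam2.com"

link_re   = %r{\w+://[^\s)>]+}
old_links = old_text.scan(link_re)
new_links = new_text.scan(link_re).reject { |url| old_links.include?(url) }

puts "new links: #{new_links.size}"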


HTH,

Regards,

Bill
 

Ara.T.Howard

Part of the magic of the "wiki way" is gathering the input from legitimate
users. Therefore we want to make the barrier of entry very very low,
otherwise people just won't bother to post. It is entirely possible that we
may be forced to go to a user registration process, but I hope that's a last
resort and not the first response.

how about an 'invitation' process: you can invite me, and then i can invite
other people. eg. you've got to get someone else to register you. finding
that special someone would probably involve a post to comp.lang.ruby with
something like

Subject: [INVITE]

please won't someone invite me

this is consistent with the notion of 'legitimate users' and is a lower
barrier than a full blown registration process that requires someone behind it
unless it's automated - in which case (esp. if most spam is human generated)
what's the point?
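
mechanically it'd just be a table of tokens - something like this, with
all the names made up:

require 'digest/md5'

INVITES = "invites.txt"   # "token inviter", one per line (made-up format)

def issue_invite(inviter)
  token = Digest::MD5.hexdigest("#{inviter}#{Time.now.to_f}#{rand}")
  File.open(INVITES, "a") { |f| f.puts "#{token} #{inviter}" }
  token   # mail this to the person you're vouching for
end

def valid_invite?(token)
  File.exist?(INVITES) &&
    File.readlines(INVITES).any? { |l| l.split.first == token }
end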

just a thought...

regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================
 

Jim Weirich

Great stuff, thanks for this.
I wonder: are all the revisions from you marked as [despam] on the
RecentChanges page coming from that?

No, they've been manually done. Visions of automatic rollback are still
visions.
Can I suggest that these be marked as minor edits so that they're
not listed on the RecentChanges page?

Actually, minor edits are currently disabled. Some spammers were marking
their posts as minor edits to hide them from RecentChanges.
 
