Another Wiki/Spam Update

Jim Weirich

During a question/answer session at the NoFluff/JustStuff conference in
Cincinnati this summer, someone asked: since there are so many things in the
IT world to learn, how does one tell which technologies to investigate and
which to put on the back burner? The general answer from the panel was to
wait until you hear about something six times. At that point it is probably
worth investigating.

So, I'm jumping the gun here because I have only heard of the following
twice, but it was twice in a two-day period and it does have bearing on the
wiki spam issue.

I first heard about this from Austin Ziegler in an IM message about Ruwiki.
Austin told me that Ruwiki will not link to external sites directly, but will
go through a PageRank-stripping redirect service supported by Google.
Hmmm ... interesting, I thought.

Maybe I'm missing something, but why not pass all outgoing links through the
Google redirect, thereby denying the spammers their all-important PageRank?

http://www.google.com/url?sa=D&q=URL
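In Ruby terms the rewrite is just a prefix on each outgoing link. A
minimal sketch (rewrite_external is a hypothetical name, and the naive
concatenation is deliberate; see observation (2) below):

    # Prefix every outgoing link with the PageRank-stripping Google
    # redirect.
    GOOGLE_REDIRECT = 'http://www.google.com/url?sa=D&q='

    def rewrite_external(url)
      GOOGLE_REDIRECT + url
    end

    puts rewrite_external('http://rubygarden.org/')
    # => http://www.google.com/url?sa=D&q=http://rubygarden.org/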

OK, that's two references. LeoO also provided a link to
http://simon.incutio.com/archive/2004/05/11/approved where you can read more
details.

So, I went ahead and enabled the Google redirect for external links on the
RubyGarden wiki. I'll leave it there for a few days and see how it works.
If anyone has problems, feel free to drop me a line at (e-mail address removed).

Just a couple of observations:

(1) Although it denies spammers the benefits of their activities, I'm not
convinced that it will prevent spamming in anything but the most indirect
ways. However, denying them those benefits still makes me feel all tingly
inside.

(2) As currently implemented, URLs with CGI parameters in them might have
problems (a short demonstration follows these observations). For example,
in the link:

http://rubygarden.org/ruby?action=browse&id=RubyDiscussions

everything from "&id=" to the end will be ignored when translated to

http://www.google.com/url?sa=D&q=http://rubygarden.org/ruby?action=browse&id=RubyDiscussions

A workaround is to use something like http://tinyurl.com (e.g. the above link
is equivalent to http://tinyurl.com/5jmyb).

(3) As I mentioned, if there is pushback on this change, it can easily be
backed out.
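To make observation (2) concrete, here is what a standard query-string
parser does with the translated link above. Everything after the "&"
binds to the Google URL itself instead of to the q parameter:

    require 'cgi'

    redirect = 'http://www.google.com/url?sa=D&q=' +
               'http://rubygarden.org/ruby?action=browse&id=RubyDiscussions'
    p CGI.parse(redirect.split('?', 2).last)
    # => {"sa"=>["D"],
    #     "q"=>["http://rubygarden.org/ruby?action=browse"],
    #     "id"=>["RubyDiscussions"]}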

Thanks for listening.
 

James Britt

Jim Weirich wrote:

...
(2) As currently implemented, URLs with CGI parameters in them might have
problems. For example, in the link:

http://rubygarden.org/ruby?action=browse&id=RubyDiscussions

everything from "&id=" to the end will be ignored when translated to

http://www.google.com/url?sa=D&q=http://rubygarden.org/ruby?action=browse&id=RubyDiscussions

A workaround is to use something like http://tinyurl.com (e.g. the above link
is equivalent to http://tinyurl.com/5jmyb).

As practical as they may be, I'm less than enthused about passing my
links through tinyurl. I have much more faith in Google, and expect
that redirection through tinyurl will ultimately lead to some business
plan I may not care for.

Implementing the same behavior in Ruby should be trivial, and I would be
far more comfortable seeing links go through a Ruby-oriented site run by
a known member of the Ruby community (e.g., www.rubyurl.com, which
appears to be free).
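For what it's worth, the core of such a redirector really is trivial. A
sketch as a plain Ruby CGI script, where SHORTCUTS stands in for whatever
real storage a service like this would need:

    #!/usr/bin/env ruby
    require 'cgi'

    # Stand-in lookup table; a real service would use a file or database.
    SHORTCUTS = {
      '5jmyb' => 'http://rubygarden.org/ruby?action=browse&id=RubyDiscussions'
    }

    cgi = CGI.new
    if (target = SHORTCUTS[cgi['code']])
      # Issue an HTTP redirect to the stored target URL.
      print cgi.header('status' => 'REDIRECT', 'location' => target)
    else
      print cgi.header('status' => 'NOT_FOUND')
    end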

Interesting idea, though, passing through Google.

James
 

gabriele renzi

James Britt wrote:

As practical as they may be, I'm less than enthused about passing my
links through tinyurl. I have much more faith in Google, and expect
that redirection through tinyurl will ultimately lead to some business
plan I may not care for.

Implementing the same behavior in Ruby should be trivial, and I would be
far more comfortable seeing links go through a Ruby-oriented site run by
a known member of the Ruby community (e.g., www.rubyurl.com, which
appears to be free).

qurl.net runs on Ruby, FWIW.
 

Eric Hodel

(2) As currently implemented, URLs with CGI parameters in them might have
problems. For example, in the link:

http://rubygarden.org/ruby?action=browse&id=RubyDiscussions

everything from "&id=" to the end will be ignored when translated to


http://www.google.com/url?sa=D&q=http://rubygarden.org/ruby?action=browse&id=RubyDiscussions

You just need to escape all [^a-zA-Z]:

http://www.google.com/url?sa=D&q=http%3a%2f%2frubygarden.org%2fruby%3faction%3dbrowse%26id%3dRubyDiscussions

pull the code out of cgi.rb and you're done!
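In other words, something like this (CGI.escape being the relevant bit
of cgi.rb):

    require 'cgi'

    target = 'http://rubygarden.org/ruby?action=browse&id=RubyDiscussions'
    puts 'http://www.google.com/url?sa=D&q=' + CGI.escape(target)
    # => http://www.google.com/url?sa=D&q=http%3A%2F%2Frubygarden.org%2Fruby%3Faction%3Dbrowse%26id%3DRubyDiscussions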
 

Jim Weirich

(2) As currently implemented, URLs with CGI parameters in them might
have problems. [...]

You just need to escape all [^a-zA-Z]:

Actually, I tried this, but then Google barfed on the resulting URL. Perhaps
I encoded incorrectly. I'll give it another try when I get a chance.

Thanks.
 

Jim Weirich

(2) As currently implemented, URLs with CGI parameters in them might
have problems. [...]

You just need to escape all [^a-zA-Z]:

Actually, I tried this, but then Google barfed on the resulting URL.
Perhaps I encoded incorrectly. I'll give it another try when I get a
chance.

Got it working now. I must have fat-fingered it earlier. Thanks.
 

Jim Weirich

Can't you just use CGI.escape for this?

You know, it's funny how the brain works. I saw this comment and thought to
myself, "Of course! It would be much nicer just to use the CGI module
directly. That's what I will do."

So I bring up the editor and actually enter the code "CGI.escape($url)" into
the program, save it, and run a quick test.

But now I get the error:
Bareword "CGI" not allowed while "strict subs" in use at [...]

Now I'm sure most everybody who has been following this thread probably
realizes what is going on, but I still didn't see it. Half of my brain is
processing the problem that Perl doesn't like a bare CGI stuck into its
code, and the other half is trying to figure out why perfectly legal Ruby
code is causing an error. All of a sudden, the two halves of my brain
decided to talk to each other: "Duh! You're writing Ruby code in a Perl
program! Of course it doesn't work. Sheesh!"

After my brain got done rsyncing itself, I tried the code "$q->escape($url);"
and that works great.

Austin... it's become imperative that you get Ruwiki released soon [1]. I'm
afraid if I spend much more time in this Perl code I will become permanently
brain damaged.
 

Belorion

I don't mean to dredge up an old thread unnecessarily, but I encountered
this today:
http://www.google.com/googleblog/2005/01/preventing-comment-spam.html.
Basically, it looks like Google is trying to do something to help stop
comment/wiki spam. Implementing something like this won't *stop* spammers
(unless they know the site uses it), but if enough people start doing it,
maybe this sort of spam will decrease in the long run.
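The mechanism the linked post describes is the rel="nofollow" link
attribute. A minimal sketch of applying it when rendering user-supplied
links (nofollow_link is a hypothetical helper name):

    require 'cgi'

    # Render a user-supplied link with rel="nofollow" so search engines
    # skip it when computing rankings.
    def nofollow_link(url, text)
      %Q(<a href="#{CGI.escapeHTML(url)}" rel="nofollow">#{CGI.escapeHTML(text)}</a>)
    end

    puts nofollow_link('http://example.com/', 'Example')
    # => <a href="http://example.com/" rel="nofollow">Example</a>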
 
