Please e-mail Google to help the Ruby Garden Wiki

Robert Oschler

Hello,

I have sent the following e-mail message to (e-mail address removed). It's an
idea that could dramatically help reduce the Wiki Spam problem on the Ruby
Garden Wiki. The spam problem on that Wiki is getting really bad:

*********

Hello Google,

Because of the Google PageRank land grab, there are web sites running
scripts to deface popular Wikis with links to their sites. For a dramatic
example, look at the revision history page for the Ruby Garden Wiki:

http://www.rubygarden.org/ruby?RecentChanges

The problem is, even though we diligently delete the spam as it shows up,
most Wikis archive the old revisions in a revision list. Google (you) crawls
these revision list pages and finds the deleted spam links. In fact, you
find a lot of them, because the spammers keep coming back and we keep
deleting them, creating lots of revision history pages for you to crawl.

Here's a VERY SIMPLE way for you to help out the thousands of Wikis out
there.

Allow Wiki owners to add an HTML tag called <NO_EXTERNAL_PAGE_RANK/> to a
web page. If you find this tag on a page, then, for off-site URLs only, you
do not follow any external links on the page or pass any PageRank to the
target web sites.

The Wiki owners could place this HTML tag in the non-editable portion of a
web page. On the main pages they would NOT use the tag, so that PageRank
would be passed on properly. But they would put it on any revision history
pages, so those pages would not pass it.
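For example, a revision history page might serve something like this (just
a sketch; the tag is the thing being proposed here, so nothing honors it
yet):

    <html>
      <head>
        <title>SomePage: revision history</title>
        <!-- proposed tag, placed in the non-editable page header -->
        <NO_EXTERNAL_PAGE_RANK/>
      </head>
      <body>
        <!-- editable revision content, where the spam links end up -->
        ...
      </body>
    </html>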

Once word of this gets around, the spammers would probably check for the
tag in their scripts and leave the Wiki(s) alone. It's kind of like the
"this house protected by Almar security" sticker.

This would be a very useful feature for bloggers and their blog comments,
and for discussion forums too.

*********

If you like the idea, then some of you might want to send a similar e-mail
too, to apply (positive) pressure on Google.

Thanks.
 
Lennon Day-Reynolds

This would make much more sense as a 'meta' tag value, or even as a
field in the robots.txt file for a site, than it would as an extension
to HTML.

Speaking of which, has anyone considered just rewriting the robots.txt
to block access to everything but the current version of each page?
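Something like this might do it, assuming plain page views are served as
/ruby?PageName while histories and diffs add extra query arguments (the
"action" parameter name is a guess on my part):

    # block revision lists, diffs, edits, etc.; plain /ruby?PageName
    # views don't match the prefix and stay crawlable
    User-agent: *
    Disallow: /ruby?action=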

Lennon
 
Greg Millam

Robert said:
Because of the Google PageRank land grab, there are web sites running
scripts to deface popular Wikis with links to their sites. For a dramatic
example, look at the revision history page for the Ruby Garden Wiki:

http://www.rubygarden.org/ruby?RecentChanges

The problem is, even though we diligently delete the spam as it shows up,
most Wikis archive the old revisions in a revision list. Google (you) crawls
these revision list pages and finds the deleted spam links. In fact, you
find a lot of them, because the spammers keep coming back and we keep
deleting them, creating lots of revision history pages for you to crawl.

Here's a VERY SIMPLE way for you to help out the thousands of Wikis out
there.

robots.txt ?

Google adheres to that very strongly. And I notice there's no
http://www.rubygarden.org/robots.txt

http://www.google.com/webmasters/faq.html#norobots

- Greg
 
Mark Hubbart

robots.txt ?

Google adheres to that very strongly. And I notice there's no
http://www.rubygarden.org/robots.txt

http://www.google.com/webmasters/faq.html#norobots

Glancing at the specs, it seems that the benefits of someone posting
external links could be removed by a combination of wise robots.txt
settings and a redirect page for external links. Or, one could use the
meta tags that do the same thing:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

This should keep any compliant search engine (including Google) from
analyzing a page for links, which should prevent any PageRank from being
passed.

If, however, some external links should be respected, there's the
redirect trick: external links go to a page which redirects to the
link. That way, you can allow certain URLs (links to rubycentral,
ruby-lang, etc.) to be read, while links to unknown sites are filtered
out, by placing the meta tags correctly.
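Here's a rough sketch of such a redirect script in Ruby (the script name,
query parameter, and whitelist are all invented for illustration):

    #!/usr/bin/env ruby
    # jump.rb -- external links on wiki pages are rewritten to
    # /cgi-bin/jump.rb?url=..., and only whitelisted hosts get a
    # real redirect that crawlers can follow.
    require 'cgi'
    require 'uri'

    TRUSTED = %w[www.ruby-lang.org www.rubycentral.com]  # assumed whitelist

    cgi    = CGI.new
    target = cgi['url']
    host   = (URI.parse(target).host rescue nil)

    if TRUSTED.include?(host)
      # trusted site: plain HTTP redirect, PageRank can flow
      print cgi.header('status' => 'REDIRECT', 'location' => target)
    else
      # unknown site: interstitial page marked NOFOLLOW, so engines
      # neither follow the link nor pass rank to it
      print cgi.header('type' => 'text/html')
      print '<html><head><meta name="ROBOTS" content="NOFOLLOW"></head>'
      print "<body><a href=\"#{CGI.escapeHTML(target)}\">Continue</a></body></html>"
    end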

cheers,
Mark
 
Robert Oschler

Greg Millam said:
robots.txt ?

Google adheres to that very strongly. And I notice there's no
http://www.rubygarden.org/robots.txt

http://www.google.com/webmasters/faq.html#norobots

- Greg

Greg,

I thought of that, but robots.txt works by directory only, doesn't it? Or
can you specify individual pages?

I was going for a solution that almost any wikimaster of any skill level
could implement. Also, if the <NO_EXTERNAL_PAGE_RANK/> tag were added to
the base Wiki software installs (RuWiki, moin-moin, etc.), then a newbie
Wikimaster wouldn't have to do anything at all.

Thanks.
 
Eric Hodel

robots.txt ?

Google adheres to that very strongly. And I notice there's no
http://www.rubygarden.org/robots.txt

http://www.google.com/webmasters/faq.html#norobots

Glancing at the specs, it seems that the benefits of someone posting
external links could be removed by a combination of wise robots.txt
settings and a redirect page for external links. Or, one could use the
meta tags that do the same thing:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

<META NAME="ROBOTS" CONTENT="NOFOLLOW, NOINDEX">

should be inserted into every page that has any query arguments beyond the
page name. Those pages do nothing but give the search engine more work for
zero benefit, and cost rubygarden.org money to serve.
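In the wiki code that can be a one-line check; a sketch in Ruby (the
query-string heuristic is an assumption about how the URLs look):

    # emit the blocking tag whenever the request carries query arguments
    # beyond a bare page name (e.g. "?action=history&id=SomePage")
    def robots_meta_tag(query_string)
      if query_string.to_s =~ /[&=]/
        '<META NAME="ROBOTS" CONTENT="NOFOLLOW, NOINDEX">'
      else
        ''  # plain page view: leave it indexable
      end
    end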
This should keep any compliant search engine (including Google) from
analyzing a page for links, which should prevent any PageRank from being
passed.

If, however, some external links should be respected, there's the
redirect trick: external links go to a page which redirects to the
link. That way, you can allow certain URLs (links to rubycentral,
ruby-lang, etc.) to be read, while links to unknown sites are filtered
out, by placing the meta tags correctly.

--
Eric Hodel - (e-mail address removed) - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04


 
Eric Hodel

I have sent the following e-mail message to (e-mail address removed). It's an
idea that could dramatically help reduce the Wiki Spam problem on the Ruby
Garden Wiki. The spam problem on that Wiki is getting really bad:

*********

Allow Wiki owners to add an HTML tag called <NO_EXTERNAL_PAGE_RANK/> to a
web page. If you find this tag on a page, then, for off-site URLs only, you
do not follow any external links on the page or pass any PageRank to the
target web sites.

This is already adequately handled by both robots.txt and
<META NAME="ROBOTS">. (Described elsewhere in this thread.)

Even with an extension to HTML, spammers will still spam wikis
because Google is not the only search engine, and the spammers are too
lazy to check for such extensions.

(I get Referer spammers on my personal website, probably because my
stats are publicly accessible. They continue to spam despite the
Disallow in my robots.txt.)

--
Eric Hodel - (e-mail address removed) - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04


 
