Site owners: check your site for a robots.txt file!

Discussion in 'HTML' started by softwarelabus@yahoo.com, Aug 8, 2006.

  1. Guest

    Hi,

    I wanted to warn all website owners that some evil web hosts like
    vistapages will periodically place a robots.txt file on your site that
    disallows all search engines. It happened to me.

    Over the last several months I've noticed my web traffic dropped to
    nearly zero. A few days ago I noticed a new file, robots.txt. As most
    of you know, if your site has a robots.txt in your website's home
    directory then all search engines will look at it for possible
    instructions. The robots.txt file tells search engines what to do or
    what not to do. In my case, it had simple instructions to disallow all
    user-agents; i.e., telling all search engine robots they cannot come here.

    How to check:
    If your web site is called www.mywebsite.com then you want to check the
    following web page:
    www.mywebsite.com/robots.txt

    You should also look for this file when you ftp to your site in case
    your web host places a sneaky server script to make robots.txt
    invisible only to you.
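
    Once you have the text of robots.txt, the check can be automated. The
    following is only a minimal sketch, assuming Python and its standard
    urllib.robotparser module; the helper name blocked_crawlers and the
    short list of crawler names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Crawler names to test; an illustrative list, not an exhaustive one.
CRAWLERS = ["Googlebot", "msnbot", "Slurp"]


def blocked_crawlers(robots_txt, crawlers=CRAWLERS):
    """Return the crawler names that robots_txt bars from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in crawlers if not parser.can_fetch(name, "/")]


# The kind of file described above: every user-agent is disallowed everywhere.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
```

    Calling blocked_crawlers(BLOCK_ALL) reports every name in the list as
    blocked, while a file that only disallows a subdirectory reports none.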

    Paul
    , Aug 8, 2006
    #1

  2. Safalra Guest

    On 8 Aug 2006 07:16:01 -0700, wrote:
    > [snip hosts adding robots.txt file]
    > You should also look for this file when you ftp to your site in case
    > your web host places a sneaky server script to make robots.txt
    > invisible only to you.


    They could of course hide the file on the server (as most hosts do with
    server configuration files) - and much more reliably than doing so for HTTP
    requests.

    --
    Safalra (Stephen Morley)
    http://www.safalra.com/hypertext/
    Safalra, Aug 8, 2006
    #2

  3. Adrienne Boswell Guest

    Gazing into my crystal ball I observed writing in
    news::

    > Hi,
    >
    > I wanted to warn all website owners that some evil web hosts like
    > vistapages will periodically place a robots.txt file on your site that
    > disallows all search engines. It happened to me.
    >
    > Over the last several months I've noticed my web traffic dropped to
    > nearly zero. A few days ago I noticed a new file, robots.txt. As most
    > of you know, if your site has a robots.txt in your websites home
    > directory then all search engines will look at it for possible
    > instructions. The robots.txt file tells search engines what to do or
    > what not to do. In my case, it had simple instructions to disallow all
    > user-agents; i.e., telling all sites they cannot come here.
    >
    > How to check:
    > If you web site is called www.mywebsite.com then you want to check the
    > following web page:
    > www.mywebsite.com/robots.txt
    >
    > You should also look for this file when you ftp to your site in case
    > your web host places a sneaky server script to make robots.txt
    > invisible only to you.
    >
    > Paul
    >
    >


    I really doubt that this was done with evil intent, probably a misguided
    system administrator who got tired of seeing 404 errors, but was too lazy
    to look up the robots.txt protocol and get it right.

    It's perfectly okay to have a blank file; that way the bots are happy, and
    the system admins are happy, too.
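
    To illustrate the difference, here is a small sketch, assuming Python's
    standard urllib.robotparser as the interpreter of the file (real
    crawlers may differ in details). The helper name allows_root is
    hypothetical. It compares a blank file, an explicit allow-everything
    file, and the block-everything file from the original post.

```python
from urllib.robotparser import RobotFileParser


def allows_root(robots_txt, agent="Googlebot"):
    """True if the given robots.txt text lets `agent` fetch the home page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "/")


BLANK = ""                                  # an empty file
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # explicit "nothing is off limits"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # the whole site is off limits
```

    Under this parser the blank file and the empty Disallow line both let
    the crawler in, and only "Disallow: /" shuts it out.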


    --
    Adrienne Boswell at Home
    Arbpen Web Site Design Services
    http://www.cavalcade-of-coding.info
    Please respond to the group so others can share
    Adrienne Boswell, Aug 8, 2006
    #3
  4. Andy Dingley Guest

    wrote:

    > I wanted to warn all website owners that some evil web hosts like
    > vistapages will periodically place a robots.txt file on your site that
    > disallows all search engines. It happened to me.


    OK, so that's pretty evil. Not quite sharks with frickin' laser beams
    on their heads, but it's more evil than you want from people you're
    giving money to.

    What did they say about this? How abject was their grovelling apology?

    You're not still _with_ these people are you?!


    > You should also look for this file when you ftp to your site in case
    > your web host places a sneaky server script to make robots.txt
    > invisible only to you.


    If I were evil (Mwwaa ha ha ha) I wouldn't place a robots.txt in
    anyone's web root, I'd use some config to serve a standard robots.txt
    for HTTP requests for it, without you even having a file to see. As
    easy for the evil admin to do, and less obvious.
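
    A trick like that would show up as a mismatch between what FTP lists
    and what the web server returns. A rough sketch of such a comparison,
    assuming Python's standard ftplib and urllib; the function names, the
    directory argument, and the wording of the messages are all invented
    for illustration, and some FTP servers may list files differently.

```python
import ftplib
import urllib.error
import urllib.request


def http_status(url):
    """Return the HTTP status code the server gives for url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def ftp_has_file(host, user, password, directory, name="robots.txt"):
    """True if `name` shows up in the FTP listing of `directory`."""
    with ftplib.FTP(host, timeout=10) as ftp:
        ftp.login(user, password)
        listing = ftp.nlst(directory)
    # Some servers return full paths, others bare names; compare the last part.
    return any(entry.rsplit("/", 1)[-1] == name for entry in listing)


def diagnose(in_ftp_listing, status):
    """Compare what FTP shows with what the web server serves."""
    served = status == 200
    if served and not in_ftp_listing:
        return "served over HTTP but no file visible by FTP"
    if in_ftp_listing and not served:
        return "file visible by FTP but not served over HTTP"
    return "FTP and HTTP agree"
```

    Feeding the results of ftp_has_file and http_status into diagnose
    flags either kind of disagreement.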
    Andy Dingley, Aug 8, 2006
    #4
  5. Guest

    Andy Dingley wrote:
    > wrote:
    >
    > > I wanted to warn all website owners that some evil web hosts like
    > > vistapages will periodically place a robots.txt file on your site that
    > > disallows all search engines. It happened to me.

    >
    > OK, so that's pretty evil. Not quite sharks with frickin' laser beams
    > on their heads, but it's more evil than you want from people you're
    > giving money to.
    >
    > What did they say about this? How abject was their grovelling apology?
    >
    > You're not still _with_ these people are you?!
    >
    >
    > > You should also look for this file when you ftp to your site in case
    > > your web host places a sneaky server script to make robots.txt
    > > invisible only to you.

    >
    > If I were evil (Mwwaa ha ha ha) I wouldn't place a robots.txt in
    > anyone's web root, I'd use some config to serve a standard robots.txt
    > for HTTP requests for it, without you even having a file to see. As
    > easy for the evil admin to do, and less obvious.




    Sometimes I wish some ex-virus creator who turned good would write a
    good virus. A virus that actually did some good by destroying other
    viruses, removing evil disallows in robots.txt from your host's server.
    ;-) I know, I know, two wrongs don't make a right ... don't sink to
    the level of evil, lol.

    Is this such a bad idea? If the government agencies caught a good
    virus maker, would they be prosecuted or given the Nobel Prize?

    just food for thought is all.
    Paul
    , Aug 8, 2006
    #5
  6. Guest

    Adrienne Boswell wrote:
    > Gazing into my crystal ball I observed writing in
    > news::
    >
    > > Hi,
    > >
    > > I wanted to warn all website owners that some evil web hosts like
    > > vistapages will periodically place a robots.txt file on your site that
    > > disallows all search engines. It happened to me.
    > >
    > > Over the last several months I've noticed my web traffic dropped to
    > > nearly zero. A few days ago I noticed a new file, robots.txt. As most
    > > of you know, if your site has a robots.txt in your websites home
    > > directory then all search engines will look at it for possible
    > > instructions. The robots.txt file tells search engines what to do or
    > > what not to do. In my case, it had simple instructions to disallow all
    > > user-agents; i.e., telling all sites they cannot come here.
    > >
    > > How to check:
    > > If you web site is called www.mywebsite.com then you want to check the
    > > following web page:
    > > www.mywebsite.com/robots.txt
    > >
    > > You should also look for this file when you ftp to your site in case
    > > your web host places a sneaky server script to make robots.txt
    > > invisible only to you.
    > >
    > > Paul
    > >
    > >

    >
    > I really doubt that this was done with evil intent, probably a misguided
    > system administrator who got tired of seeing 404 errors, but was too lazy
    > to look up the robots.txt protocol and get it right.
    >
    > It's perfectly okay to have blank file, that way the bots are happy, and
    > the system admins are happy, too.



    I'd say that was a misguided SA alright, lol. IMHO that's when it's
    time to call it quits, start looking for another web host because
    that's like dropping a nuke on a site.

    Paul
    , Aug 8, 2006
    #6
  7. SpaceGirl Guest

    wrote:
    > Andy Dingley wrote:
    > > wrote:
    > >
    > > > I wanted to warn all website owners that some evil web hosts like
    > > > vistapages will periodically place a robots.txt file on your site that
    > > > disallows all search engines. It happened to me.

    > >
    > > OK, so that's pretty evil. Not quite sharks with frickin' laser beams
    > > on their heads, but it's more evil than you want from people you're
    > > giving money to.
    > >
    > > What did they say about this? How abject was their grovelling apology?
    > >
    > > You're not still _with_ these people are you?!
    > >
    > >
    > > > You should also look for this file when you ftp to your site in case
    > > > your web host places a sneaky server script to make robots.txt
    > > > invisible only to you.

    > >
    > > If I were evil (Mwwaa ha ha ha) I wouldn't place a robots.txt in
    > > anyone's web root, I'd use some config to serve a standard robots.txt
    > > for HTTP requests for it, without you even having a file to see. As
    > > easy for the evil admin to do, and less obvious.

    >
    >
    >
    > Sometimes I wished some x-virus creator who turned good would write a
    > god virus. A virus that actually did some good by destroying other
    > viruses, removing evil disallows in robots.txt from your hosts server.
    > ;-) I know, I know, two wrongs don't make a right ... don't sink to
    > the level of evil, lol.
    >
    > Is this such a bad idea? If the government agencies caught a good
    > virus maker would they be prosecuted or given the nobel price.
    >
    > just food for thought is all.
    > Paul


    Given that even commercial antivirus occasionally mis-detects
    legitimate software as a virus, imagine if say, by some mistake,
    "photoshop.exe" is accidentally labelled as a virus. With your "virus
    killing virus", you could do vastly more damage than a regular wild
    virus would ever do. Really Really Bad Idea.
    SpaceGirl, Aug 8, 2006
    #7
  8. Andy Dingley Guest

    Adrienne Boswell wrote:

    > I really doubt that this was done with evil intent, probably a misguided
    > system administrator who got tired of seeing 404 errors,


    I'm cynical enough to suspect that it was evil intent, because hosting
    companies can reduce costs by reducing traffic to small sites on
    flat-fee hosting plans.
    No robots, no search hits, no traffic.
    Andy Dingley, Aug 8, 2006
    #8
  9. JDS Guest

    On Tue, 08 Aug 2006 07:56:46 -0700, softwarelabus wrote:

    > Sometimes I wished some x-virus creator who turned good would write a god
    > virus. A virus that actually did some good by destroying other viruses,
    > removing evil disallows in robots.txt from your hosts server. ;-) I know,
    > I know, two wrongs don't make a right ... don't sink to the level of evil,
    > lol.


    There was an example of this a couple of years ago that, due to a poorly
    written anti-virus virus, actually caused more harm than good. Well, to
    be precise, it caused very little good, and very little harm.

    --
    JDS
    JDS, Aug 8, 2006
    #9
  10. David Cary Hart Guest

    On 8 Aug 2006 07:16:01 -0700, opined:
    > Hi,
    >
    > I wanted to warn all website owners that some evil web hosts like
    > vistapages will periodically place a robots.txt file on your site
    > that disallows all search engines. It happened to me.


    That's putting a bandage on a gunshot wound. The real issue is how
    a third party obtained write privileges.
    --
    "Black Hole": The economic effect of administering a DNSBL
    Our DNSBL - Eliminate Spam at the Source: http://www.TQMcube.com
    Don't Subsidize Criminals: http://boulderpledge.org
    David Cary Hart, Aug 8, 2006
    #10
  11. easygoin Guest

    wrote:
    > Hi,
    >
    > I wanted to warn all website owners that some evil web hosts like
    > vistapages will periodically place a robots.txt file on your site that
    > disallows all search engines. It happened to me.
    >


    Just as no one has mentioned this - I wouldn't assume it's your ISP
    unless you have confirmation from them. Sometimes hosts (being one
    myself) have set their servers up to add certain files / folders by
    default - usually an .htaccess file - and this might be where it came
    from.

    But as a default precaution, change your FTP password to something
    secure using different cases and numbers, and also change any hosting
    passwords if you have a dedicated / reseller / managed etc. server -
    just in case some malicious personage has decided to "secretly"
    sabotage your site, as this would indeed be a very good way to do
    it... as most of us have backups (don't we) of our online sites ;).

    Just a thought - Dimitri
    easygoin, Aug 8, 2006
    #11
  12. Big Bill Guest

    On 8 Aug 2006 07:16:01 -0700, wrote:

    >Hi,
    >
    >I wanted to warn all website owners that some evil web hosts like
    >vistapages will periodically place a robots.txt file on your site that
    >disallows all search engines. It happened to me.
    >
    >Over the last several months I've noticed my web traffic dropped to
    >nearly zero. A few days ago I noticed a new file, robots.txt. As most
    >of you know, if your site has a robots.txt in your websites home
    >directory then all search engines will look at it for possible
    >instructions. The robots.txt file tells search engines what to do or
    >what not to do. In my case, it had simple instructions to disallow all
    >user-agents; i.e., telling all sites they cannot come here.
    >
    >How to check:
    >If you web site is called www.mywebsite.com then you want to check the
    >following web page:
    >www.mywebsite.com/robots.txt
    >
    >You should also look for this file when you ftp to your site in case
    >your web host places a sneaky server script to make robots.txt
    >invisible only to you.
    >
    >Paul


    I guess your web hosts don't like you, do they? Did you ask them about
    this?

    BB
    --
    http://www.here-be-posters.co.uk/marilyn-monroe-pictures.htm
    http://www.kruse.co.uk/seo-maintenance.htm
    http://www.crystal-liaison.com/artis-orbis/amici-della-luna-glass.html
    Big Bill, Aug 8, 2006
    #12
  13. Big Bill Guest

    On Tue, 08 Aug 2006 14:29:51 GMT, Adrienne Boswell <>
    wrote:

    >
    >I really doubt that this was done with evil intent, probably a misguided
    >system administrator who got tired of seeing 404 errors, but was too lazy
    >to look up the robots.txt protocol and get it right.
    >
    >It's perfectly okay to have blank file, that way the bots are happy, and
    >the system admins are happy, too.


    Adrienne! Not even a bit dead, I see, just absent, eh?

    BB
    --
    http://www.here-be-posters.co.uk/marilyn-monroe-pictures.htm
    http://www.kruse.co.uk/seo-maintenance.htm
    http://www.crystal-liaison.com/artis-orbis/amici-della-luna-glass.html
    Big Bill, Aug 8, 2006
    #13
  14. David Guest

    David Cary Hart wrote:
    > On 8 Aug 2006 07:16:01 -0700, opined:
    >> Hi,
    >>
    >> I wanted to warn all website owners that some evil web hosts like
    >> vistapages will periodically place a robots.txt file on your site
    >> that disallows all search engines. It happened to me.

    >
    > That's putting a bandage on gunshot wound. The real issue is how
    > a third party obtained write privileges.


    If it was the sysadmin, obtaining write permissions is not even a
    question.

    If it was a misguided sysadmin who put the said robots.txt file there,
    he was just plain wrong. He should instead have contacted the owner of
    the site, telling the owner why it is needed and how to go about doing
    it. There is no reason that a sysadmin should be screwing with or
    adding files to my site, unless something I am doing is causing major
    problems. Sorry, a 404 error is not good cause. If neither the sysadmin
    nor the owner of the site placed the robots.txt file there, then there
    are possibly other issues that need to be looked at. If the sysadmin
    did place said file on the site, then the owner should by all means
    change hosting providers. No ifs, ands, or buts.
    David, Aug 8, 2006
    #14
  15. David Guest

    easygoin wrote:


    <snip>

    >
    > Just as no one has mentioned this - I wouldn't assume its your ISP
    > unless you have confirmation from them and sometimes hosts (being one
    > myself) have set their servers up to add certain files / folders by
    > default - usually an .htaccess file and this might be where it came from.


    Can you please show me one web hosting provider that by default places
    a robots.txt file that disallows search engines, seeing that you are
    "in the business"? I have yet to come across a web provider that places
    such a restriction as that. And yes, I do know that as a default some
    providers do add the .htaccess file, but I know of none that go into a
    customer's site and then add or remove information. If I did find out
    that a sysadmin did or was doing that without my knowledge I would run
    fast to find a different provider......

    >
    > But rather as default - change your FTP password to something secure
    > using different cases and numbers, and also change any hosting passwords
    > if you have a dedicated / reseller / managed etc server - just in case
    > some malicious personage has decided to "secretly" sabotage your site
    > as this would indeed be a very good way to do this...


    It is well too late to think about changing your passwords after
    "someone" has gotten into your system. Who knows, by the time you found
    out they were there, what they had changed or done. The only way to
    make sure that they do no further damage is to wipe out and reinstall
    your stuff. But then again, a reinstall isn't a 100% deal: if one was
    making backups regularly, they might have backed up infected files and
    at that point would just be copying them back.


    > as most have
    > backups (don't we) of our online sites ;).


    The odds are no.......

    >
    > Just a thought - Dimitri
    David, Aug 8, 2006
    #15
  16. DJ Guest

    <> wrote in message
    news:...
    > Andy Dingley wrote:
    >> wrote:
    >>
    >> > I wanted to warn all website owners that some evil web hosts like
    >> > vistapages will periodically place a robots.txt file on your site that
    >> > disallows all search engines. It happened to me.

    >>
    >> OK, so that's pretty evil. Not quite sharks with frickin' laser beams
    >> on their heads, but it's more evil than you want from people you're
    >> giving money to.
    >>
    >> What did they say about this? How abject was their grovelling apology?
    >>
    >> You're not still _with_ these people are you?!
    >>
    >>
    >> > You should also look for this file when you ftp to your site in case
    >> > your web host places a sneaky server script to make robots.txt
    >> > invisible only to you.

    >>
    >> If I were evil (Mwwaa ha ha ha) I wouldn't place a robots.txt in
    >> anyone's web root, I'd use some config to serve a standard robots.txt
    >> for HTTP requests for it, without you even having a file to see. As
    >> easy for the evil admin to do, and less obvious.

    >
    >
    >
    > Sometimes I wished some x-virus creator who turned good would write a
    > god virus. A virus that actually did some good by destroying other
    > viruses, removing evil disallows in robots.txt from your hosts server.
    > ;-) I know, I know, two wrongs don't make a right ... don't sink to
    > the level of evil, lol.
    >
    > Is this such a bad idea? If the government agencies caught a good
    > virus maker would they be prosecuted or given the nobel price.
    >
    > just food for thought is all.
    > Paul
    >

    Did anyone else get affected or just you? If they did it to all of you
    perhaps you should get together and sue them. It would also be fairly strong
    proof that it was the hosting company that did this, as it is unlikely
    someone would break the passwords on several accounts.
    DJ, Aug 8, 2006
    #16
  17. Guest

    In uk.net.web.authoring David <> wrote:

    > It is well too late to think about changing your passwords after
    > "someone" has gotten into your system. Who knows by the time you found
    > out they were there, what they had changed or have done. The only way to
    > make sure that they do not further damage is to wipe out and reinstall
    > your stuff. But than again a reinstall isn't a 100% deal as if one was
    > making a backup regularly they might have backed up infected files and
    > at that point would be just copying them back.


    > as most have
    >> backups (don't we) of our online sites ;).


    > The odds are no.......


    Rather than relying on back-ups of a site, a far better policy is to
    maintain a development version of the site on your own machines and
    refresh the production site from the development site as and when
    required.

    Of course if the site makes changes in a database, then the database
    will need to be backed up separately.

    Axel
    , Aug 8, 2006
    #17
  18. Guest

    easygoin wrote:
    > wrote:
    > > Hi,
    > >
    > > I wanted to warn all website owners that some evil web hosts like
    > > vistapages will periodically place a robots.txt file on your site that
    > > disallows all search engines. It happened to me.
    > >

    >
    > Just as no one has mentioned this - I wouldn't assume its your ISP
    > unless you have confirmation from them and sometimes hosts (being one
    > myself) have set their servers up to add certain files / folders by
    > default - usually an .htaccess file and this might be where it came from.
    >
    > But rather as default - change your FTP password to something secure
    > using different cases and numbers, and also change any hosting passwords
    > if you have a dedicated / reseller / managed etc server - just in case
    > some malicious personage has decided to "secretly" sabotage your site
    > as this would indeed be a very good way to do this... as most have
    > backups (don't we) of our online sites ;).
    >
    > Just a thought - Dimitri



    I wouldn't put it past vistapages given their horrible customer review
    record. One day their system admin deleted all my perl scripts just to
    see who was bombarding the server. Well, it wasn't my scripts, but he
    didn't even bother to put my scripts back, lol.

    Also I've told the SA many times about their security risks, but he
    doesn't bother fixing them. Not that long ago my entire account was
    wiped. He simply told me to make sure I periodically change my cPanel
    password. Well, it turned out the hacker even deleted all _their_ site
    backups, so they couldn't restore my site. You can't just log into
    cPanel and start deleting the entire server files because you don't
    have access. It was obviously a site hack. Oh well. I guess we can
    look back at such things and laugh.

    Just beware of web hosts that promise huge amounts of bandwidth, like
    100GB, for practically nothing, like $5/month. If web hosts want to, and
    they lack a little common compassion, they have countless methods of
    getting rid of you and even destroying your traffic while blaming it
    all on you, lol.

    Paul
    , Aug 8, 2006
    #18
  19. Guest

    Big Bill wrote:
    > On 8 Aug 2006 07:16:01 -0700, wrote:
    >
    > >Hi,
    > >
    > >I wanted to warn all website owners that some evil web hosts like
    > >vistapages will periodically place a robots.txt file on your site that
    > >disallows all search engines. It happened to me.
    > >
    > >Over the last several months I've noticed my web traffic dropped to
    > >nearly zero. A few days ago I noticed a new file, robots.txt. As most
    > >of you know, if your site has a robots.txt in your websites home
    > >directory then all search engines will look at it for possible
    > >instructions. The robots.txt file tells search engines what to do or
    > >what not to do. In my case, it had simple instructions to disallow all
    > >user-agents; i.e., telling all sites they cannot come here.
    > >
    > >How to check:
    > >If you web site is called www.mywebsite.com then you want to check the
    > >following web page:
    > >www.mywebsite.com/robots.txt
    > >
    > >You should also look for this file when you ftp to your site in case
    > >your web host places a sneaky server script to make robots.txt
    > >invisible only to you.
    > >
    > >Paul

    >
    > I guess your web hosts don't like you, do they? Did you ask them about
    > this?
    >
    > BB



    BB, I had several domains with vistapages, but have only switched one
    domain so far. It was a very important domain-- business related.

    As far as contacting vistapages, I can't bear yet another conversation
    with them. And call me cheap for not yet switching all my domains to
    another host because they still have my money and laugh whenever I ask
    for a partial refund.

    BTW, so far I've had good luck with bethehost.com-- knock on wood.
    Except they've been under severe attacks from a hacker this week and
    had to take the entire server down for about 4 hours. I still give
    them an A grade. They always email everyone with server updates, etc.
    As for vistapages, I can't recall a single informative email from them.

    Anyhow, I would appreciate other web host recommendations.

    Paul
    , Aug 8, 2006
    #19
  20. Guest

    Here's how to verify googlebot can access your site!

    > Can you please show me one web hosting provider that places by default a
    > robots.txt file that disallows search engines. Seeing that you are "in
    > the business". I have yet come across a web provider that places such a
    > restriction as that. And yes I do know that as a default some providers
    > do add the .htaccess file, but I know none that go into a customers site
    > and than adds or removes information. If I did find out that a sysadmin
    > did or was doing that without my knowledge I would run fast to find a
    > different provider......



    Robots.txt, .htaccess, etc.? System admins could use a lot of methods
    to block search engines without us knowing. Best to directly verify
    that Googlebot can access your site. Unless the system admin is
    checking for actual Google IPs, which would be crazy, I think you
    could test it by going to your Windows command prompt. Go to Windows
    Start, then Run, and type cmd. Once the black command prompt window
    comes up, type telnet www.yourwebsite.com 80

    When the connection opens, paste the following (make sure you
    replace www.yourwebsite.com with your actual domain) and then press
    Enter on an empty line to finish the request:

    GET /robots.txt HTTP/1.1
    Host: www.yourwebsite.com
    User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    Accept: */*
    Connection: Keep-alive
    From: googlebot(at)googlebot.com

    You could also check any web page. Here's how to check
    www.yourwebsite.com/realestate/washington/bills.html

    GET /realestate/washington/bills.html HTTP/1.1
    Host: www.yourwebsite.com
    User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    Accept: */*
    Connection: Keep-alive
    From: googlebot(at)googlebot.com
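
    The same test can be scripted instead of typed into telnet. This is a
    sketch only, assuming Python and a plain socket on port 80; the
    function names build_request and fetch_as_crawler are hypothetical,
    the optional From header is left out, and Connection: close is used in
    place of Keep-alive so the script knows when the reply has ended. A
    server that checks the caller's IP address will not be fooled by this.

```python
import socket

# User-agent string taken from the request above.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(host, path="/robots.txt", user_agent=GOOGLEBOT_UA):
    """Build the raw HTTP/1.1 request text that would be pasted into telnet."""
    if not path.startswith("/"):
        path = "/" + path  # the request target must start with a slash
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        # "close" instead of Keep-alive so the server ends the connection
        # after one response and the read loop below terminates.
        "Connection: close",
    ]
    # Each header line ends with CRLF; a blank line ends the request.
    return "\r\n".join(lines) + "\r\n\r\n"


def fetch_as_crawler(host, path="/robots.txt", user_agent=GOOGLEBOT_UA):
    """Send the request to port 80 and return the server's full reply."""
    request = build_request(host, path, user_agent).encode("ascii")
    chunks = []
    with socket.create_connection((host, 80), timeout=10) as conn:
        conn.sendall(request)
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```

    Printing fetch_as_crawler("www.yourwebsite.com") shows the status line,
    headers, and body the server sends to something claiming to be
    Googlebot; passing a different user_agent repeats the test for another
    crawler.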

    What do you think? I don't know any other user-agents. I think it's a
    good idea to check for the main search engines such as MSN, Yahoo, and
    Google. Are there any Windows programs that perform such checks? I'm
    a computer programmer, so if there are no programs that do the above
    checks for the top search engines then I could write one and provide
    the source code ... as long as I don't make any web host enemies
    <<<G>>>

    Thanks fellow site owners,
    Paul
    , Aug 8, 2006
    #20