shapper said:
On my web site I have a CMS area which urls are:
cms/products/*
cms/brands/*
We have no way of knowing what you have there.
Shouldn't I block these URLs in robots.txt?
It depends on what you have there.
As a rule, a robot won't try to visit a resource unless there is a link to
it somewhere. So any internal information won't normally be found if you
don't link to it and nobody else links to it either. To be on the safe side,
robots exclusion could be used, though, to protect against a case where some
link is magically created somewhere.
See "robots exclusion standard".
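
As a rough illustration of what such an exclusion could look like, here is a
small Python sketch that feeds a robots.txt body to the standard library's
parser and asks it what a well-behaved robot would be allowed to fetch. The
host www.example.com, the robot name ExampleBot and the assumption that the
cms/ paths sit directly under the site root are all made up for the example;
adjust them to whatever you really have there.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; assumes the CMS area lives at /cms/ under
# the site root. The file itself would be served as /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cms/products/
Disallow: /cms/brands/
"""


def build_parser(text):
    """Parse a robots.txt body the way a compliant robot would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and www.example.com are placeholders, not real names.
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/cms/products/42")
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
print("CMS page may be fetched:", blocked)
print("Front page may be fetched:", allowed)
```

Note that this only describes what a compliant robot agrees not to fetch. It
is not access control, and the file itself is readable by anyone.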
And what other urls should I block?
It depends on what you have there.
All the ones that require authentication?
It's perhaps a friendly move to tell robots not to visit them, as this may
save a little of their time. But they won't get anything from them anyway, as
the server responds by requesting credentials and the robot gives up. Well,
there might be naughty robots that try to crack their way in, but it won't
help to use robots exclusion against _them_ (rather the opposite, since the
exclusion file tells them where to look).
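
To see what "requesting credentials" means in practice, here is a minimal
sketch, not a real setup: a throwaway local server that answers every request
with 401 and a Basic challenge, and a client that, like a typical robot, has
no credentials to offer. The handler class, the realm name "cms" and the
path are invented for the example.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ProtectedHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: every page demands HTTP Basic credentials."""

    def do_GET(self):
        self.send_response(401)
        self.send_header("WWW-Authenticate", 'Basic realm="cms"')
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_status_without_credentials(path="/cms/products/"):
    """Request a protected page with no credentials; return the status code."""
    server = HTTPServer(("127.0.0.1", 0), ProtectedHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Empty ProxyHandler so a configured proxy is not used for the local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://127.0.0.1:{port}{path}") as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server refused and asked for credentials
    finally:
        server.shutdown()
        server.server_close()


status = fetch_status_without_credentials()
print("Status seen by a client without credentials:", status)
```

A robot that gets this answer has nothing to index, whether or not the URL
is also listed in robots.txt.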
Rules of thumb:
1) If it's really secret, don't put it on a web server.
2) If it's just secret and you want to put it on a web server, set up access
control.
3) If it's not secret but nobody benefits from its being findable via
search engines, use robots exclusion.