"strange subdirectories"

  • Thread starter Luigi Donatello Asero

Luigi Donatello Asero

It seems that Google's robots visit subdirectories which I do not see in my
www directory, for example:

crawl-66-249-65-44.googlebot.com - - [05/Mar/2005:01:03:48 +0100] "GET ///it/lamediazionemerci2.html HTTP/1.1" 200 4396 "-" "Mozilla

That is, the requested paths have several "/" before my normal subdirectories
(for example "it").
There are very many of these "strange subdirectories"!
Is this normal?
What can I do?
 

Roy Schestowitz

Luigi said:
It seems that Google's robots visit subdirectories which I do not see in my
www directory, for example:

crawl-66-249-65-44.googlebot.com - - [05/Mar/2005:01:03:48 +0100] "GET ///it/lamediazionemerci2.html HTTP/1.1" 200 4396 "-" "Mozilla

That is, the requested paths have several "/" before my normal subdirectories
(for example "it").
There are very many of these "strange subdirectories"!
Is this normal?
What can I do?

I am not sure about Google, but Yahoo makes a great many mistakes. One such
mistake is mixing up structures and files from other sites. If you see
strange filenames, that will be the explanation. Another one that I suffer
from all the time is Yahoo's failure to traverse directories without an
index, that is, directories which invoke the default Apache file
listing. Yahoo descends to a lower level, which is incorrect, and this
triggers many distracting errors.

Google might be making similar mistakes. I have noticed that it continually
fails to deal with frames that come from different domains. Sometimes it
looks for .tex files when a .pdf is found. All in all, I do not totally
trust it.

Roy
 

Toby Inkster

Luigi said:
crawl-66-249-65-44.googlebot.com - - [05/Mar/2005:01:03:48 +0100] "GET ///it/lamediazionemerci2.html HTTP/1.1" 200 4396 "-" "Mozilla

That is, the requested paths have several "/" before my normal subdirectories
(for example "it").

Most servers will (by default[1]) consider the following URLs to be
equivalent:

///foo//bar
//foo//bar
/foo//bar
/foo/bar

However, most clients won't consider them the same. (Nor should they![1])

It may well be that some stupid robots, when they are at the main index
page ("/"), see a link like "/it/lamediazionemerci2.html" and wrongly
invent a URL like "//it/lamediazionemerci2.html".

Hence the weird requests.
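
To make the equivalence concrete, here is a small Python sketch of the kind of slash-collapsing a server might apply before mapping a URL to a file, plus a helper that pulls the requested path out of an access-log line so such requests can be spotted. The function names and the regular expression are my own illustration, not anything Apache or Googlebot actually uses.

```python
import re

def collapse_slashes(path):
    """Replace every run of two or more slashes with a single slash."""
    return re.sub(r"/{2,}", "/", path)

# Matches the request part of an access-log line, e.g. "GET /x HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')

def requested_path(log_line):
    """Return the path from a log line, or None if no request is found."""
    match = REQUEST_RE.search(log_line)
    return match.group(1) if match else None

def has_repeated_slashes(path):
    """True if the path contains '//' anywhere."""
    return "//" in path

variants = ["///foo//bar", "//foo//bar", "/foo//bar", "/foo/bar"]
normalized = {collapse_slashes(p) for p in variants}

log_line = ('crawl-66-249-65-44.googlebot.com - - [05/Mar/2005:01:03:48 +0100] '
            '"GET ///it/lamediazionemerci2.html HTTP/1.1" 200 4396 "-" "Mozilla')
path = requested_path(log_line)
print(normalized)
print(path, has_repeated_slashes(path), collapse_slashes(path))
```

All four variants collapse to the same path, which is why the server answered the "///it/..." request with a 200 and the ordinary file.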

____
[1] An easy way to make the URLs "/" and "//" act differently would be:

1. In an .htaccess in your document root, turn on MultiViews;
2. Then create a file "somedir.php" in the document root:

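<?php
// PATH_INFO is whatever follows "/somedir" in the requested URL
$p = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
// Print "Foo" if the path contains a doubled slash, otherwise "Bar"
echo (strpos($p, '//') !== false) ? 'Foo' : 'Bar';
?>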

3. Now visit:

http://www.yourdomain.com/somedir/hello/world/
and
http://www.yourdomain.com/somedir/hello//world/

and note the difference.
 

Luigi Donatello Asero

Toby Inkster said:
Luigi said:
crawl-66-249-65-44.googlebot.com - - [05/Mar/2005:01:03:48 +0100] "GET ///it/lamediazionemerci2.html HTTP/1.1" 200 4396 "-" "Mozilla

That is, the requested paths have several "/" before my normal subdirectories
(for example "it").

Most servers will (by default[1]) consider the following URLs to be
equivalent:

///foo//bar
//foo//bar
/foo//bar
/foo/bar



What does "foo" stand for?

However, most clients won't consider them the same. (Nor should they![1])

It may well be that some stupid robots, when they are at the main index
page ("/"), see a link like "/it/lamediazionemerci2.html" and wrongly
invent a URL like "//it/lamediazionemerci2.html".

Hence the weird requests.

____
[1] An easy way to make the URLs "/" and "//" act differently would be:

1. In an .htaccess in your document root, turn on MultiViews;

How do I turn on MultiViews?
And what is MultiViews, anyway?

2. Then create a file "somedir.php" in the document root:

<?php
$p = $_SERVER['PATH_INFO'];
echo strstr($p,'//')?'Foo':'Bar';
?>


So, where you write "foo", should I write, for example,
"it" or
https://www.scaiecat-spa-gigi.com/it/ ?

And should "bar" be "lamediazionemerci2.html"
in the above-mentioned example?
 

Toby Inkster

Luigi said:
What does "foo" stand for?

Paradoxically, "foo" stands for "fucked up", but that's not important.

"foo" and "bar" are simply placeholder words that can be inserted into any
example when you can't think of a better word to use.

[1] An easy way to make the URLs "/" and "//" act differently would be:
1. In an .htaccess in your document root, turn on MultiViews;

How do I turn on MultiViews?
And what is MultiViews, anyway?

Google for it.
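
For reference, assuming the option meant in step 1 is Apache's MultiViews (content negotiation, which lets a request for "/somedir/..." be answered by "somedir.php"), a minimal .htaccess sketch might look like the following. It only takes effect if the main server configuration allows .htaccess files to override Options, which is an assumption about your host.

```apache
# .htaccess in the document root
# Turn on content negotiation so "/somedir" can resolve to "somedir.php"
Options +MultiViews
```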

Note: I'm not saying that you *should* do any of those steps. I am
merely pointing out that it is possible to make /foo//bar be interpreted
differently from /foo/bar. In general, this is probably a bad idea, as
it's counter-intuitive.
 

Ken

Hi Toby -

Most servers will (by default[1]) consider the following URLs to be
equivalent:

///foo//bar
//foo//bar
/foo//bar
/foo/bar

I have my server configured to treat // anyplace in the URL as a
security violation.
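
As a rough illustration of that policy (not my actual server configuration), here is a Python sketch of a WSGI wrapper that answers 403 whenever the request path contains "//". The names reject_double_slashes, hello_app and status_for are made up for the example, and some servers normalize the path before an application ever sees it, so treat this as a sketch under those assumptions.

```python
def reject_double_slashes(app):
    """Wrap a WSGI app so any request path containing '//' gets a 403."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if "//" in path:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden: repeated slashes in URL\n"]
        # Path looks ordinary, so hand the request to the wrapped app
        return app(environ, start_response)
    return middleware

def hello_app(environ, start_response):
    """Stand-in application that always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]

def status_for(app, path):
    """Call a WSGI app with a minimal fake environ and return its status line."""
    seen = []
    def start_response(status, headers):
        seen.append(status)
    body = app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response)
    b"".join(body)  # consume the response body
    return seen[0]

guarded = reject_double_slashes(hello_app)
print(status_for(guarded, "/it/lamediazionemerci2.html"))
print(status_for(guarded, "///it/lamediazionemerci2.html"))
```

With a rule like this, the Googlebot request quoted at the top of the thread would be refused instead of being served as a duplicate of the normal page.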
 
