Regex help for URL

Ben Morrow

Seb said:
I am trying to find the right regular expression which would only
validate a URL with a given number of folders.

Example:

http://www.abc.com/folder/page.htm --> Valid (4 slashes)

http://www.abc.com/folder/subfolder/ --> not valid (5 slashes)

Basically, any URL not made of 4 slashes would be invalid.
However, the URL:

http://www.abc.com/folder/subfolder --> would also be invalid

Please explain how anyone is supposed to tell from looking at that URL
that the last /subfolder is a directory ('folder') and not a file.

Please explain how anyone is supposed to tell by any means whatever,
given that http (setting aside WebDAV) has no concept of 'directory';
other than by guessing that if retrieving it gives a redirect to
http://.../subfolder/ then the URL refers to some directory in some
filesystem somewhere.

A regex to match a string with exactly four slashes in it is

m{ ^ (?: [^/]* / ){4} [^/]* $ }x

, but I do not think that will help you here.
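To make the behaviour of that pattern concrete, here is a minimal sketch that runs it against the three URLs from the question. The helper name has_four_slashes is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;

# The pattern from above: four groups of "non-slashes followed by a slash",
# then only non-slash characters up to the end of the string.
my $four_slashes = qr{ ^ (?: [^/]* / ){4} [^/]* $ }x;

# Hypothetical helper (name is an assumption): returns 1 if the string
# contains exactly four slashes, 0 otherwise.
sub has_four_slashes {
    my ($url) = @_;
    return $url =~ $four_slashes ? 1 : 0;
}

for my $url (
    'http://www.abc.com/folder/page.htm',
    'http://www.abc.com/folder/subfolder/',
    'http://www.abc.com/folder/subfolder',
) {
    printf "%-40s %s\n", $url, has_four_slashes($url) ? 'match' : 'no match';
}
```

The first and third URLs match and the second does not, so the pattern cannot reject http://www.abc.com/folder/subfolder: it only counts slashes. Without the /x flag the same pattern reads ^(?:[^/]*/){4}[^/]*$ ; whether a given search engine accepts that syntax is something to check in its own documentation.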

Ben
 
anno4000

Seb said:
Hi,

I am trying to find the right regular expression which would only
validate a URL with a given number of folders.

Example:

http://www.abc.com/folder/page.htm --> Valid (4 slashes)

http://www.abc.com/folder/subfolder/ --> not valid (5 slashes)

Basically, any URL not made of 4 slashes would be invalid.
However, the URL:

http://www.abc.com/folder/subfolder --> would also be invalid

Any ideas?

What have you tried?

We can help you fix your code, but we rarely deliver solutions
to specification.

Anno
 
rcia

Seb said:

perldoc -q count
How can I count the number of occurrences of a substring within a
string?

That's arbitrary. The client has no way to tell if "subfolder" is a
directory (where a default DirectoryIndex document presumably exists)
or a document/file named "subfolder". Only the server knows for sure.
 
Tad McClellan

Seb said:
I am trying to find the right regular expression which would only
validate a URL with a given number of folders.

Basically, any URL not made of 4 slashes would be invalid.


A regular expression is not the Right Tool for counting characters;
tr/// is better for that (tr does NOT use any regular expressions).

print "$url is invalid\n" unless $url =~ tr#/## == 4; # untested
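As a rough, self-contained sketch of the same tr/// idea, the snippet below counts the slashes in the URLs from the question. The helper name count_slashes is an assumption made for illustration, not something from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption). With an empty replacement
# list and no /d modifier, tr/// leaves the string unchanged and returns
# the number of characters it matched, here the number of slashes.
sub count_slashes {
    my ($url) = @_;
    return $url =~ tr{/}{};
}

for my $url (
    'http://www.abc.com/folder/page.htm',
    'http://www.abc.com/folder/subfolder/',
    'http://www.abc.com/folder/subfolder',
) {
    my $n = count_slashes($url);
    print "$url has $n slashes", ($n == 4 ? '' : ' (invalid)'), "\n";
}
```

Only the second URL is flagged, since it is the only one whose slash count differs from 4.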
 
Seb

Thanks. I do need to use a regular expression since it is the only
mechanism I can use in the search engine I am using to enter a valid
exclusion.
 
Tad McClellan

Seb said:
Thanks. I do need to use a regular expression since it is the only
mechanism I can use in the search engine I am using to enter a valid
exclusion.


Ask a question in the Perl newsgroup, get an answer in the Perl language.


[ snip TOFU ]
 
