Regular expression question

cerr · Oct 26, 2011

Hi There,

First thing, I'm a regular expression newbie... somewhat, anyway.
I would like to recognize the difference between this URL:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that URL:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/) with "\b\/[a-z]+\/". How do I get
the second one only?
Thanks!
Ron

Mike Duffy · Oct 26, 2011

I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html

Since you say you are a beginner, it might be easier to first strip away
the leading "http://quaaoutlodge.com" and the trailing ".html".

Now your problem is recognizing the difference between:

"/site/the-lodge/our-history" and "/site/the-lodge". Your task has been
reduced simply to counting "/"s.
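A minimal sketch of that idea, assuming the host prefix and the ".html" suffix are fixed as in the two example URLs. The function name `describeUrl` and the shape of its return value are hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper: strips the fixed prefix and suffix, then counts "/"s.
function describeUrl(url) {
  var prefix = "http://quaaoutlodge.com"; // assumed fixed host
  var suffix = ".html";                   // assumed fixed extension
  var path = url;
  if (path.indexOf(prefix) === 0) {
    path = path.substring(prefix.length);
  }
  if (path.length >= suffix.length &&
      path.lastIndexOf(suffix) === path.length - suffix.length) {
    path = path.substring(0, path.length - suffix.length);
  }
  // "/site/the-lodge/our-history" splits into ["", "site", "the-lodge", "our-history"]
  var parts = path.split("/");
  var slashCount = parts.length - 1;
  return {
    slashCount: slashCount,
    documentName: parts[parts.length - 1],
    // Only a path with three slashes has a directory below "site".
    directoryName: slashCount >= 3 ? parts[parts.length - 2] : null
  };
}

var deep = describeUrl("http://quaaoutlodge.com/site/the-lodge/our-history.html");
var shallow = describeUrl("http://quaaoutlodge.com/site/the-lodge.html");
console.log(deep);    // slashCount 3, documentName "our-history", directoryName "the-lodge"
console.log(shallow); // slashCount 2, documentName "the-lodge", directoryName null
```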

Denis McMahon · Oct 26, 2011

Hi There,

First thing, I'm a regular expression newbie.... somewhat anyways... I
would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html and that url:
http://quaaoutlodge.com/site/the-lodge.html and at the same time extract
the document name (our-history or the-lodge) and the directory name if
present (the-lodge). I got stuck on how to recognize the second
directory instead of the first (the-lodge/ instead of site/) with
"\b\/[a-z]+\/". How do I get the second one only?

First of all, it seems that your structure is to have a "lodge-file" for
every lodge in the "site" directory. It would make more sense to use the
per-lodge file as the index file in the lodge directory:

eg:

http://quaaoutlodge.com/site/the-lodge.html

becomes

http://quaaoutlodge.com/site/the-lodge/index.html

Now, in your "site" directory, you only need a single "index.htm[l]" file
that has a list with elements something like:

<li><a href='http://quaaoutlodge.com/site/the-lodge/'>the-lodge</a></li>

Now instead of having the files for each lodge spread across two
directories, all the files for a single lodge are in a single directory.

If you made this change, it might make your regex problem easier, because
for any lodge file in any directory, the url will always be:

http://quaaoutlodge.com/site/the-lodge/[filename]

And now you can find the filename and the dir (lodge) without having to
use any regex:

// window.location is a Location object, not a string, so split its pathname.
var url = window.location.pathname;
var parts = url.split("/");
var fileName = parts[parts.length-1];
var lodgeDir = parts[parts.length-2];

See http://www.sined.co.uk/tmp/pathinfo.htm for an implementation.

Rgds

Denis McMahon

Lasse Reichstein Nielsen · Oct 26, 2011

cerr said:
First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/) with "\b\/[a-z]+\/". How do I get
the second one only?

When you think a RegExp might solve your problem, stop for a moment
and think whether there is also a simpler solution.

In this case, I'd just do:

function name(url) {
var name_end = url.lastIndexOf(".");
var name_start = url.lastIndexOf("/", name_end) + 1;
return url.substr(name_start, name_end);
}

If your URLs aren't always that simple, you'd need to adapt a RegExp too.
/L

Thomas 'PointedEars' Lahn · Oct 26, 2011

Lasse said:
cerr said:

First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/) with "\b\/[a-z]+\/". How do I get
the second one only?

When you think a RegExp might solve your problem - stop for a moment
and think whether there is also a simpler solution

I cannot think of anything that is simpler than

var matches = url.match(/(.*)\/([^\/]+)$/);

and then have a look at matches[1] ("directory") and matches[2] ("document
name"). But that's me.
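As a rough illustration of what that expression captures, here is a sketch applied to the OP's first URL. The wrapper name `splitLastComponent` is hypothetical. Note that matches[1] is everything before the last slash (scheme and host included), not just "the-lodge", and that `match` returns null when the string ends in a slash:

```javascript
// Hypothetical wrapper around the regular expression quoted above.
function splitLastComponent(url) {
  var matches = url.match(/(.*)\/([^\/]+)$/);
  if (matches === null) {
    return null; // e.g. the URL ends in "/", so there is no last component
  }
  return { directory: matches[1], documentName: matches[2] };
}

var result = splitLastComponent("http://quaaoutlodge.com/site/the-lodge/our-history.html");
console.log(result.directory);    // "http://quaaoutlodge.com/site/the-lodge"
console.log(result.documentName); // "our-history.html"
console.log(splitLastComponent("http://quaaoutlodge.com/site/the-lodge/")); // null
```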

In this case, I'd just do:

function name(url) {

That is a poor function identifier.

var name_end = url.lastIndexOf(".");
var name_start = url.lastIndexOf("/", name_end) + 1;

Paths may contain dots. Resource names do not need to.

return url.substr(name_start, name_end);

You meant

return url.substring(name_start, name_end);

String.prototype.substr(), OTOH, is proprietary -- which is why it should not
be used -- and has different semantics:

| B.2.3 String.prototype.substr (start, length)
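The difference can be seen by running both calls on the OP's first URL. This is only a sketch; the variable names are made up for the example. The second argument of substring() is an end index, while the second argument of substr() is a length:

```javascript
var url = "http://quaaoutlodge.com/site/the-lodge/our-history.html";
var nameEnd = url.lastIndexOf(".");                // index of the "." in ".html"
var nameStart = url.lastIndexOf("/", nameEnd) + 1; // first character after the last "/"

// substring(start, end): characters from start up to, but not including, end.
var viaSubstring = url.substring(nameStart, nameEnd);

// substr(start, length): up to `length` characters from start, so the
// extension is not cut off here because nameEnd is far larger than the name.
var viaSubstr = url.substr(nameStart, nameEnd);

console.log(viaSubstring); // "our-history"
console.log(viaSubstr);    // "our-history.html"
```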

}

If your URLs aren't always that simple, you'd need to adapt a RegExp too.

The general solution to this problem is so simple that you really could have
posted it (BTDT). OTOH, that is also why the OP could have found it by
STFW.

PointedEars

Dr J R Stockton · Oct 27, 2011

In comp.lang.javascript message
<a9fa509f-5f3c-4926-abc6-c77a21427d8f@j36g2000prh.googlegroups.com>,
Tue, 25 Oct 2011 21:43:35, cerr wrote:

First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/) with "\b\/[a-z]+\/". How do I get
the second one only?

The easiest way, teaching nothing about RegExps, should be to use the
string method 'split' with an argument "/", and contemplate the result
and its length.
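For the two URLs in the question, splitting the full string gives arrays whose lengths differ by one. A small sketch (the variable names are made up for the example):

```javascript
var longUrl = "http://quaaoutlodge.com/site/the-lodge/our-history.html";
var shortUrl = "http://quaaoutlodge.com/site/the-lodge.html";

// The "//" after "http:" produces one empty element in each array.
var longParts = longUrl.split("/");
var shortParts = shortUrl.split("/");

console.log(longParts.length, longParts);
// 6 ["http:", "", "quaaoutlodge.com", "site", "the-lodge", "our-history.html"]
console.log(shortParts.length, shortParts);
// 5 ["http:", "", "quaaoutlodge.com", "site", "the-lodge.html"]
```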

Also, see <http://www.merlyn.demon.co.uk/js-valid.htm> generally.

That's assuming that your datum starts as a string.

If you are writing quaaoutlodge, and use include files, then you might
be including a location.href evaluation in your pages, in order that a
page can tell which it is. In that case, look up the other properties
of location.

Antony Scriven · Oct 28, 2011

Lasse said:
Lasse said:

cerr said:

First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/) with "\b\/[a-z]+\/". How do I get
the second one only?

When you think a RegExp might solve your problem - stop for a moment
and think whether there is also a simpler solution

I cannot think of anything that is simpler than

var matches = url.match(/(.*)\/([^\/]+)$/);

var matches = url.match(/(.*)\/(.*)/);

--Antony

Antony Scriven · Oct 28, 2011

Lasse said:
Lasse said:

First thing, I'm a regular expression newbie....
somewhat anyways... I would like to recognize the
difference between this url:

http://quaaoutlodge.com/site/the-lodge/our-history.html

and that url:

http://quaaoutlodge.com/site/the-lodge.html

and at the same time extract the document name
(our-history or the-lodge) and the directory name
if present (the-lodge). I got stuck on how to
recognize the second directory instead of the first
(the-lodge/ instead of site/) with "\b\/[a-z]+\/".
How do I get the second one only?

When you think a RegExp might solve your problem
- stop for a moment and think whether there is also
a simpler solution

I cannot think of anything that is simpler than

var matches = url.match(/(.*)\/([^\/]+)$/);

var matches = url.match(/(.*)\/(.*)/);

And the reason you didn't spot that is also the reason why
Lasse's solution (using String.prototype.lastIndexOf) is
preferable IMHO. --Antony

P.S. Sorry about the mangled quoting earlier.

Thomas 'PointedEars' Lahn · Oct 28, 2011

Antony said:
I cannot think of anything that is simpler than
var matches = url.match(/(.*)\/([^\/]+)$/);

var matches = url.match(/(.*)\/(.*)/);

And the reason you didn't spot that

Spot what? That your way is _not_ better?

is also the reason why Lasse's solution (using
String.prototype.lastIndexOf) is preferable IMHO. --Antony

http://foo.example/bar

P.S. Sorry about the mangled quoting earlier.

Don't be sorry about *that*.

PointedEars

Thomas 'PointedEars' Lahn · Oct 28, 2011

Dr said:
First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/) with "\b\/[a-z]+\/". How do I get
the second one only?

The easiest way, teaching nothing about RegExps, should be to use the
string method 'split' with an argument "/", and contemplate the result
and its length.

By contrast, that requires accessing the `length' property of the resulting
array, too, and is inflexible with regard to potential query and fragment
parts.
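To illustrate the query and fragment concern, here is a sketch that cuts the string at the first "?" or "#" before splitting. The function name `lastPathComponent` and the "?ref=/home#top" suffix are hypothetical, added only for the example:

```javascript
// Hypothetical helper: drop the query and fragment, then take the last path part.
function lastPathComponent(url) {
  var cut = url.search(/[?#]/); // index of the first "?" or "#", or -1
  var withoutQuery = cut === -1 ? url : url.substring(0, cut);
  var parts = withoutQuery.split("/");
  return parts[parts.length - 1];
}

var decorated = "http://quaaoutlodge.com/site/the-lodge/our-history.html?ref=/home#top";

// A plain split is misled by the "/" inside the query part.
var naiveParts = decorated.split("/");
var naive = naiveParts[naiveParts.length - 1];

console.log(naive);                        // "home#top"
console.log(lastPathComponent(decorated)); // "our-history.html"
```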

PointedEars

Antony Scriven · Oct 28, 2011

Antony said:
Antony said:

On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:
I cannot think of anything that is simpler than
var matches = url.match(/(.*)\/([^\/]+)$/);
var matches = url.match(/(.*)\/(.*)/);

And the reason you didn't spot that

Spot what? That your way is _not_ better?

How so? And, really, url.match(/site\/(.*\/)?(.*)/) is much
closer to what the OP actually asked for. And if the
complexity of the URLs increases at all, so does that of the
regexp. Regexps are a great way to hide bugs. --Antony
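For reference, a sketch of what that expression yields for the OP's two URLs (the variable names are made up for the example). The optional group comes back as undefined when there is no directory below "site", the captured directory keeps its trailing slash, and the document name keeps its ".html" extension:

```javascript
var pattern = /site\/(.*\/)?(.*)/;

var deepMatch = "http://quaaoutlodge.com/site/the-lodge/our-history.html".match(pattern);
var shallowMatch = "http://quaaoutlodge.com/site/the-lodge.html".match(pattern);

console.log(deepMatch[1], deepMatch[2]);       // "the-lodge/" "our-history.html"
console.log(shallowMatch[1], shallowMatch[2]); // undefined "the-lodge.html"
```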

Lasse Reichstein Nielsen · Oct 28, 2011

Thomas 'PointedEars' Lahn said:
Antony Scriven wrote:

http://foo.example/bar

My "solution" was very hardcoded to the format that the OP used, i.e.,
ending in "/somename.html".
Since that was all the examples he gave, and no real textual explanation,
it's impossible to generalize further.

Maybe I should have said that

/L

Thomas 'PointedEars' Lahn · Oct 28, 2011

Lasse said:
My "solution" was very hardcoded to the format that the OP used, i.e.,
ending in "/somename.html".
Since that was all the examples he gave, and no real textual explanation,

It was clear enough to me that they wanted to know the last path component
of a URI.

it's impossible to generalize further.

Well, it wasn't.

Maybe I should have said that

It was clear to me that your code was limited, however I saw and still see
no good reason for doing that when the general solution -- the one using
RegExp, which was being asked for -- is so obvious.

PointedEars

Antony Scriven · Oct 28, 2011

It was clear enough to me that they wanted to know the
last path component of a URI.

Well, it wasn't.
Cough.

Unless you have Asperger's or some other similar condition,
I don't think there's any difficulty in understanding what
Lasse wrote, and its implications.

It was clear to me that your code was limited, however
I saw and still see no good reason for doing that when
the general solution -- the one using RegExp, which
was being asked for -- is so obvious.

Well, I already showed that that isn't so. And if an expert
such as yourself can't make an obvious regexp match the
specification, then I think there is a lesson to be learnt
there. Regexps can be powerful, terse, and convenient, but
they can be very tricky things to get right, even the simple
ones. --Antony

P.S. Having said what I've said, I think it's a good thing
that its regexps are somewhat limited compared to some other
implementations.

Antony Scriven · Oct 28, 2011

[...]

P.S. Having said what I've said, I think it's a good thing
that its regexps are somewhat limited compared to some other
implementations.

s/its/JS's/
