Regular expression question

cerr

Hi There,

First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/). With "\b\/[a-z]+\/", how do I get
the second one only?
Thanks!
Ron
 
Mike Duffy

Denis McMahon

Hi There,

First thing, I'm a regular expression newbie.... somewhat anyways... I
would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html and that url:
http://quaaoutlodge.com/site/the-lodge.html and at the same time extract
the document name (our-history or the-lodge) and the directory name if
present (the-lodge). I got stuck on how to recognize the second
directory instead of the first (the-lodge/ instead of site/). With
"\b\/[a-z]+\/", how do I get the second one only?

First of all, it seems that your structure is to have a "lodge-file" for
every lodge in the "site" directory. It would make more sense to use the
per-lodge file as the index file in the lodge directory:

e.g.:

http://quaaoutlodge.com/site/the-lodge.html

becomes

http://quaaoutlodge.com/site/the-lodge/index.html

Now, in your "site" directory, you only need a single "index.htm[l]" file
that has a list with elements something like:

<li><a href='http://quaaoutlodge.com/site/the-lodge/'>the-lodge</a></li>

Now instead of having the files for each lodge spread across two
directories, all the files for a single lodge are in a single directory.

If you made this change, it might make your regex problem easier, because
for any lodge file in any directory, the url will always be:

http://quaaoutlodge.com/site/the-lodge/[filename]

And now you can find the filename and the dir (lodge) without having to
use any regex:

// window.location is a Location object, not a string, so split the pathname.
var url = window.location.pathname;
var parts = url.split("/");
var fileName = parts[parts.length-1];
var lodgeDir = parts[parts.length-2];

See http://www.sined.co.uk/tmp/pathinfo.htm for an implementation.
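
As a rough illustration, the split approach can also cope with the two URL shapes from the original question without restructuring the site. This is only a sketch: the helper name `describePath` is made up here, and it assumes the path always begins with a single leading directory such as `site`.

```javascript
// Hypothetical helper (not from the thread): reports the document name and,
// if there is one, the directory that sits between "site" and the document.
function describePath(pathname) {
  // Drop empty segments produced by the leading slash.
  var parts = pathname.split("/").filter(function (p) { return p !== ""; });
  var file = parts[parts.length - 1] || "";
  // Strip the extension (everything from the last dot), if there is one.
  var dot = file.lastIndexOf(".");
  var documentName = dot > 0 ? file.substring(0, dot) : file;
  // Assumption: the first segment is the "site" directory, so a lodge
  // directory exists only when there are more than two segments.
  var directory = parts.length > 2 ? parts[parts.length - 2] : null;
  return { documentName: documentName, directory: directory };
}

console.log(describePath("/site/the-lodge/our-history.html"));
console.log(describePath("/site/the-lodge.html"));
```

The first call yields the document `our-history` inside directory `the-lodge`; the second yields the document `the-lodge` with no directory.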

Rgds

Denis McMahon
 
Lasse Reichstein Nielsen

cerr said:
First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/). With "\b\/[a-z]+\/", how do I get
the second one only?

When you think a RegExp might solve your problem - stop for a moment
and think whether there is also a simpler solution :)

In this case, I'd just do:

function name(url) {
  var name_end = url.lastIndexOf(".");
  var name_start = url.lastIndexOf("/", name_end) + 1;
  return url.substr(name_start, name_end);
}

If your URLs aren't always that simple, you'd need to adapt a RegExp too.
/L
 
Thomas 'PointedEars' Lahn

Lasse said:
cerr said:
First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/). With "\b\/[a-z]+\/", how do I get
the second one only?

When you think a RegExp might solve your problem - stop for a moment
and think whether there is also a simpler solution :)

I cannot think of anything that is simpler than

var matches = url.match(/(.*)\/([^\/]+)$/);

and then have a look at matches[1] ("directory") and matches[2] ("document
name"). But that's me.
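
To make the behaviour of that pattern concrete, here is a small sketch that runs it against the two URLs from the original question. Nothing is assumed beyond the pattern and the URLs already quoted in the thread.

```javascript
// Same pattern as in the post: greedily take everything up to the last
// slash, then capture the remaining non-slash characters up to the end.
var pattern = /(.*)\/([^\/]+)$/;

var m1 = "http://quaaoutlodge.com/site/the-lodge/our-history.html".match(pattern);
var m2 = "http://quaaoutlodge.com/site/the-lodge.html".match(pattern);

console.log(m1[1], m1[2]);
console.log(m2[1], m2[2]);
```

Note that the first group is the whole prefix up to the last slash (for example `http://quaaoutlodge.com/site/the-lodge`), not just the name of the last directory, and the second group still carries the `.html` extension.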
In this case, I'd just do:

function name(url) {

That is a poor function identifier.

var name_end = url.lastIndexOf(".");
var name_start = url.lastIndexOf("/", name_end) + 1;

Paths may contain dots. Resource names do not need to.

return url.substr(name_start, name_end);

You meant

return url.substring(name_start, name_end);

String.prototype.substr(), OTOH, is proprietary (which is why it should not
be used) and has different semantics:

| B.2.3 String.prototype.substr (start, length)

}

If your URLs aren't always that simple, you'd need to adapt a RegExp too.

The general solution to this problem is so simple that you really could have
posted it (BTDT). OTOH, that is also why the OP could have found it by
STFW.
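
The two objections raised in this post (substr versus substring, and dots appearing outside the file name) can be checked with a short sketch. The function name `documentName` is made up for illustration; the logic is the quoted lastIndexOf approach with `substring` swapped in.

```javascript
// Hypothetical corrected helper: same idea as the quoted function,
// but using substring(start, end) instead of substr(start, length).
function documentName(url) {
  var nameEnd = url.lastIndexOf(".");
  var nameStart = url.lastIndexOf("/", nameEnd) + 1;
  return url.substring(nameStart, nameEnd);
}

var url = "http://quaaoutlodge.com/site/the-lodge/our-history.html";
var nameEnd = url.lastIndexOf(".");
var nameStart = url.lastIndexOf("/", nameEnd) + 1;

// substring: the second argument is an end index (exclusive).
var viaSubstring = url.substring(nameStart, nameEnd);
// substr: the second argument is a length, so the extension is kept here.
var viaSubstr = url.substr(nameStart, nameEnd);

// A URL whose last dot is in the host name rather than the file name.
var surprising = documentName("http://foo.example/bar");

console.log(viaSubstring, viaSubstr, surprising);
```

With `substring` the result is `our-history`; with `substr` it is `our-history.html`. For `http://foo.example/bar` even the corrected helper returns `foo`, because the last dot belongs to the host name.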


PointedEars
 
Dr J R Stockton

In comp.lang.javascript message <a9fa509f-5f3c-4926-abc6-c77a21427d8f@j36g2000prh.googlegroups.com>,
Tue, 25 Oct 2011 21:43:35, cerr said:
First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/). With "\b\/[a-z]+\/", how do I get
the second one only?

The easiest way, teaching nothing about RegExps, should be to use the
string method 'split' with an argument "/", and contemplate the result
and its length.

Also, see <http://www.merlyn.demon.co.uk/js-valid.htm> generally.

That's assuming that your datum starts as a string.

If you are writing quaaoutlodge, and use include files, then you might
be including a location.href evaluation in your pages, in order that a
page can tell which it is. In that case, look up the other properties
of location.
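
For reference, a sketch of the kind of properties meant here. So that it also runs outside a browser, it uses the standard `URL` constructor as a stand-in for `window.location` (an assumption of this example, not something the thread mentions); both expose `hostname`, `pathname`, `search` and `hash`. The query string and fragment in the sample URL are invented for illustration.

```javascript
// Stand-in for window.location; in a page you would read these
// properties directly from window.location instead.
var loc = new URL("http://quaaoutlodge.com/site/the-lodge/our-history.html?x=1#top");

console.log(loc.hostname); // host name only
console.log(loc.pathname); // path without query or fragment
console.log(loc.search);   // query string, including the leading "?"
console.log(loc.hash);     // fragment, including the leading "#"
```

Working from `pathname` means the query and fragment never reach the splitting or matching code.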
 
Antony Scriven

Lasse said:
cerr said:
First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/). With "\b\/[a-z]+\/", how do I get
the second one only?
When you think a RegExp might solve your problem - stop for a moment
and think whether there is also a simpler solution :)

I cannot think of anything that is simpler than

var matches = url.match(/(.*)\/([^\/]+)$/);

var matches = url.match(/(.*)\/(.*)/);

--Antony
 
Antony Scriven

Lasse said:
First thing, I'm a regular expression newbie....
somewhat anyways... I would like to recognize the
difference between this url:

http://quaaoutlodge.com/site/the-lodge/our-history.html

and that url:

http://quaaoutlodge.com/site/the-lodge.html

and at the same time extract the document name
(our-history or the-lodge) and the directory name
if present (the-lodge). I got stuck on how to
recognize the second directory instead of the first
(the-lodge/ instead of site/). With "\b\/[a-z]+\/",
how do I get the second one only?
When you think a RegExp might solve your problem
- stop for a moment and think whether there is also
a simpler solution :)

I cannot think of anything that is simpler than

var matches = url.match(/(.*)\/([^\/]+)$/);

var matches = url.match(/(.*)\/(.*)/);

And the reason you didn't spot that is also the reason why
Lasse's solution (using String.prototype.lastIndexOf) is
preferable IMHO. --Antony
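
A quick sketch comparing the two patterns, for anyone following along. The variable names `strict` and `loose` are labels chosen for this example only, and the directory-style URL with a trailing slash is an added test case.

```javascript
var strict = /(.*)\/([^\/]+)$/; // the longer pattern
var loose = /(.*)\/(.*)/;       // the shorter pattern

var fileUrl = "http://quaaoutlodge.com/site/the-lodge/our-history.html";
var a = fileUrl.match(strict);
var b = fileUrl.match(loose);
// Because (.*) is greedy, both split at the last slash for this URL.
console.log(a[1] === b[1], a[2] === b[2]);

// The patterns differ when the URL ends in a slash.
var dirUrl = "http://quaaoutlodge.com/site/the-lodge/";
var c = dirUrl.match(strict); // no match: nothing follows the last slash
var d = dirUrl.match(loose);  // matches, with an empty second group
console.log(c, d[2]);
```

So for the URLs in the question the two are interchangeable; they diverge only when nothing follows the final slash.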

P.S. Sorry about the mangled quoting earlier.
 
Thomas 'PointedEars' Lahn

Antony said:
I cannot think of anything that is simpler than
var matches = url.match(/(.*)\/([^\/]+)$/);
var matches = url.match(/(.*)\/(.*)/);

And the reason you didn't spot that

Spot what? That your way is _not_ better?

is also the reason why Lasse's solution (using
String.prototype.lastIndexOf) is preferable IMHO. --Antony

http://foo.example/bar

P.S. Sorry about the mangled quoting earlier.

Don't be sorry about *that*.


PointedEars
 
Thomas 'PointedEars' Lahn

Dr said:
First thing, I'm a regular expression newbie.... somewhat anyways...
I would like to recognize the difference between this url:
http://quaaoutlodge.com/site/the-lodge/our-history.html
and that url:
http://quaaoutlodge.com/site/the-lodge.html
and at the same time extract the document name (our-history or
the-lodge) and the directory name if present (the-lodge).
I got stuck on how to recognize the second directory instead of the
first (the-lodge/ instead of site/). With "\b\/[a-z]+\/", how do I get
the second one only?

The easiest way, teaching nothing about RegExps, should be to use the
string method 'split' with an argument "/", and contemplate the result
and its length.

By contrast, that requires accessing the `length' property of the resulting
array, too, and is inflexible with regard to potential query and fragment
parts.
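
The point about query and fragment parts can be seen in a small sketch. The sample URL, with a slash inside its query string, is invented for illustration, and cutting at the first `?` or `#` is just one possible workaround.

```javascript
var url = "http://quaaoutlodge.com/site/the-lodge/our-history.html?back=/site/index.html#rooms";

// Naive split: the slash inside the query string shifts the last element.
var parts = url.split("/");
var naive = parts[parts.length - 1];

// Workaround: discard everything from the first "?" or "#" before splitting.
var pathOnly = url.split(/[?#]/)[0];
var pathParts = pathOnly.split("/");
var fileName = pathParts[pathParts.length - 1];

console.log(naive);    // picks up part of the query and the fragment
console.log(fileName); // the actual document
```

Here `naive` ends up as `index.html#rooms`, while `fileName` is `our-history.html`.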


PointedEars
 
Antony Scriven

Antony said:
On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:
I cannot think of anything that is simpler than
var matches = url.match(/(.*)\/([^\/]+)$/);
var matches = url.match(/(.*)\/(.*)/);
And the reason you didn't spot that

Spot what? That your way is _not_ better?

How so? And, really, url.match(/site\/(.*\/)?(.*)/) is much
closer to what the OP actually asked for. And if the
complexity of the URLs increases at all, so does that of the
regexp. Regexps are a great way to hide bugs. --Antony
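
For illustration, here is that pattern applied to the two URLs from the original question. The variable names are chosen for this sketch only.

```javascript
// Optional directory (with its trailing slash) after "site/", then the rest.
var pattern = /site\/(.*\/)?(.*)/;

var withDir = "http://quaaoutlodge.com/site/the-lodge/our-history.html".match(pattern);
var withoutDir = "http://quaaoutlodge.com/site/the-lodge.html".match(pattern);

console.log(withDir[1], withDir[2]);
console.log(withoutDir[1], withoutDir[2]);
```

When the directory is present, group 1 is `the-lodge/` and group 2 is `our-history.html`. When it is absent, group 1 is `undefined` and group 2 is `the-lodge.html`, so testing group 1 distinguishes the two cases.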
 
Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:

My "solution" was very hardcoded to the format that the OP used, i.e.,
ending in "/somename.html".
Since that was all the examples he gave, and no real textual explanation,
it's impossible to generalize further.

Maybe I should have said that :)

/L
 
Thomas 'PointedEars' Lahn

Lasse said:
My "solution" was very hardcoded to the format that the OP used, i.e.,
ending in "/somename.html".
Since that was all the examples he gave, and no real textual explanation,

It was clear enough to me that they wanted to know the last path component
of a URI.

it's impossible to generalize further.

Well, it wasn't.

Maybe I should have said that :)

It was clear to me that your code was limited, however I saw and still see
no good reason for doing that when the general solution – the one using
RegExp, which was being asked for – is so obvious.


PointedEars
 
Antony Scriven

It was clear enough to me that they wanted to know the
last path component of a URI.


Well, it wasn't.

Cough.

Unless you have Asperger's or some other similar condition,
I don't think there's any difficulty in understanding what
Lasse wrote, and its implications.

It was clear to me that your code was limited, however
I saw and still see no good reason for doing that when
the general solution -- the one using RegExp, which
was being asked for -- is so obvious.

Well, I already showed that that isn't so. And if an expert
such as yourself can't make an obvious regexp match the
specification, then I think there is a lesson to be learnt
there. Regexps can be powerful, terse, and convenient, but
they can be very tricky things to get right, even the simple
ones. --Antony

P.S. Having said what I've said, I think it's a good thing
that its regexps are somewhat limited compared to some other
implementations.
 
Antony Scriven

[...]

P.S. Having said what I've said, I think it's a good thing
that its regexps are somewhat limited compared to some other
implementations.

s/its/JS's/
 
