Regular expression question

Discussion in 'Javascript' started by cerr, Oct 26, 2011.

  1. cerr

    cerr Guest

    Hi There,

    First thing, I'm a regular expression newbie.... somewhat anyways...
    I would like to recognize the difference between this url:
    http://quaaoutlodge.com/site/the-lodge/our-history.html
    and that url:
    http://quaaoutlodge.com/site/the-lodge.html
    and at the same time extract the document name (our-history or
    the-lodge) and the directory name if present (the-lodge).
    I got stuck at how to recognize the second directory instead of the
    first (the-lodge/ instead of site/) with "\b\/[a-z]+\/". How do I get
    the second one only?
    Thanks!
    Ron
     
    cerr, Oct 26, 2011
    #1

  2. Mike Duffy

    Mike Duffy Guest

    cerr <> wrote in news:a9fa509f-5f3c-4926-abc6-
    :

    > I would like to recognize the difference between this url:
    > http://quaaoutlodge.com/site/the-lodge/our-history.html
    > and that url:
    > http://quaaoutlodge.com/site/the-lodge.html


    Since you say you are a beginner, it might be easier to first strip away
    the leading "http://quaaoutlodge.com" and the trailing ".html".

    Now your problem is recognizing the difference between:

    "/site/the-lodge/our-history" and "/site/the-lodge". Your task has been
    reduced simply to counting "/"s.
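
A minimal sketch of the slash-counting idea, assuming the host is always "http://quaaoutlodge.com" and every page ends in ".html". The function name countSlashes is made up for illustration and is not from the thread.

```javascript
// Hypothetical helper: strip the known prefix and the ".html" suffix,
// then count the "/" characters in what is left.
function countSlashes(url) {
  var prefix = "http://quaaoutlodge.com"; // assumption: fixed host
  var path = url;
  if (path.indexOf(prefix) === 0) {
    path = path.substring(prefix.length);
  }
  path = path.replace(/\.html$/, ""); // drop the trailing ".html" if present
  // Splitting on "/" yields one more piece than there are slashes.
  return path.split("/").length - 1;
}

var deep = countSlashes("http://quaaoutlodge.com/site/the-lodge/our-history.html");
var shallow = countSlashes("http://quaaoutlodge.com/site/the-lodge.html");
// deep is 3 ("/site/the-lodge/our-history"), shallow is 2 ("/site/the-lodge")
```

A count of 3 would then mean "document inside a lodge directory", and 2 would mean "document directly under site".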

    --
    http://pages.videotron.ca/duffym/index.htm#
     
    Mike Duffy, Oct 26, 2011
    #2

  3. On Tue, 25 Oct 2011 21:43:35 -0700, cerr wrote:

    > Hi There,
    >
    > First thing, I'm a regular expression newbie.... somewhat anyways... I
    > would like to recognize the difference between this url:
    > http://quaaoutlodge.com/site/the-lodge/our-history.html and that url:
    > http://quaaoutlodge.com/site/the-lodge.html and at the same time extract
    > the document name (our-history or the- lodge) and the directory name if
    > present (the-lodge). I got stuck at how rto rcognize the second
    > directory instead of the first (the-lodge/ instead of site/) with
    > "\b\/[a-z]+\/" how do i get the second one only?


    First of all, it seems that your structure is to have a "lodge-file" for
    every lodge in the "site" directory. It would make more sense to use the
    per-lodge file as the index file in the lodge directory:

    eg:

    http://quaaoutlodge.com/site/the-lodge.html

    becomes

    http://quaaoutlodge.com/site/the-lodge/index.html

    Now, in your "site" directory, you only need a single "index.htm[l]" file
    that has a list with elements something like:

    <li><a href='http://quaaoutlodge.com/site/the-lodge/'>the-lodge</a></li>

    Now instead of having the files for each lodge spread across two
    directories, all the files for a single lodge are in a single directory.

    If you made this change, it might make your regex problem easier, because
    for any lodge file in any directory, the url will always be:

    http://quaaoutlodge.com/site/the-lodge/[filename]

    And now you can find the filename and the dir (lodge) without having to
    use any regex:

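    // window.location is a Location object, not a string, so use .href
    var url = window.location.href;
    var parts = url.split("/");
    var fileName = parts[parts.length-1]; // last path component
    var lodgeDir = parts[parts.length-2]; // directory just above it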

    See http://www.sined.co.uk/tmp/pathinfo.htm for an implementation.

    Rgds

    Denis McMahon
     
    Denis McMahon, Oct 26, 2011
    #3
  4. cerr <> writes:

    > First thing, I'm a regular expression newbie.... somewhat anyways...
    > I would like to recognize the difference between this url:
    > http://quaaoutlodge.com/site/the-lodge/our-history.html
    > and that url:
    > http://quaaoutlodge.com/site/the-lodge.html
    > and at the same time extract the document name (our-history or the-
    > lodge) and the directory name if present (the-lodge).
    > I got stuck at how rto rcognize the second directory instead of the
    > first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do i get
    > the second one only?


    When you think a RegExp might solve your problem - stop for a moment
    and think whether there is also a simpler solution :)

    In this case, I'd just do:

    function name(url) {
    var name_end = url.lastIndexOf(".");
    var name_start = url.lastIndexOf("/", name_end) + 1;
    return url.substr(name_start, name_end);
    }

    If your URLs aren't always that simple, you'd need to adapt a RegExp too.
    /L
    --
    Lasse Reichstein Holst Nielsen
    'Javascript frameworks is a disruptive technology'
     
    Lasse Reichstein Nielsen, Oct 26, 2011
    #4
  5. Lasse Reichstein Nielsen wrote:

    > cerr <> writes:
    >> First thing, I'm a regular expression newbie.... somewhat anyways...
    >> I would like to recognize the difference between this url:
    >> http://quaaoutlodge.com/site/the-lodge/our-history.html
    >> and that url:
    >> http://quaaoutlodge.com/site/the-lodge.html
    >> and at the same time extract the document name (our-history or the-
    >> lodge) and the directory name if present (the-lodge).
    >> I got stuck at how rto rcognize the second directory instead of the
    >> first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do i get
    >> the second one only?

    >
    > When you think a RegExp might solve your problem - stop for a moment
    > and think whether there is also a simpler solution :)


    I cannot think of anything that is simpler than

    var matches = url.match(/(.*)\/([^\/]+)$/);

    and then have a look at matches[1] ("directory") and matches[2] ("document
    name"). But that's me.
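
For readers new to capture groups, here is a small sketch of what that match returns for the two URLs from the original question. The wrapper name splitUrl and the property names are illustrative only.

```javascript
// (.*) greedily takes everything up to the last "/";
// ([^\/]+)$ takes the slash-free remainder up to the end of the string.
var pattern = /(.*)\/([^\/]+)$/;

// Hypothetical wrapper around the match shown in the post.
function splitUrl(url) {
  var matches = url.match(pattern);
  if (!matches) {
    return null; // e.g. a URL that ends in "/"
  }
  return { directory: matches[1], documentName: matches[2] };
}

var a = splitUrl("http://quaaoutlodge.com/site/the-lodge/our-history.html");
// a.directory    is "http://quaaoutlodge.com/site/the-lodge"
// a.documentName is "our-history.html"

var b = splitUrl("http://quaaoutlodge.com/site/the-lodge.html");
// b.directory    is "http://quaaoutlodge.com/site"
// b.documentName is "the-lodge.html"
```

Note that "directory" here is the whole prefix including scheme and host, and the document name still carries its ".html" extension.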

    > In this case, I'd just do:
    >
    > function name(url) {


    That is a poor function identifier.

    > var name_end = url.lastIndexOf(".");
    > var name_start = url.lastIndexOf("/", name_end) + 1;


    Paths may contain dots. Resource names do not need to.

    > return url.substr(name_start, name_end);


    You meant

    return url.substring(name_start, name_end);

    String.prototype.substr(), OTOH, is proprietary – which is why it should not
    be used – and has different semantics:

    | B.2.3 String.prototype.substr (start, length)
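
To make the difference concrete, a short sketch on one of the URLs from the question: substring() takes (start, end), while substr() takes (start, length). The variable names are illustrative.

```javascript
var url = "http://quaaoutlodge.com/site/the-lodge.html";
var nameEnd = url.lastIndexOf(".");                // index of the last "."
var nameStart = url.lastIndexOf("/", nameEnd) + 1; // just after the last "/"

// substring(start, end): stops before index nameEnd.
var viaSubstring = url.substring(nameStart, nameEnd);

// substr(start, length): treats nameEnd as a length, so it overshoots
// and simply runs to the end of the string.
var viaSubstr = url.substr(nameStart, nameEnd);

// substr() only gives the same result when passed an actual length.
var viaSubstrWithLength = url.substr(nameStart, nameEnd - nameStart);

// viaSubstring        is "the-lodge"
// viaSubstr           is "the-lodge.html"
// viaSubstrWithLength is "the-lodge"
```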

    > }
    >
    > If your URLs aren't always that simple, you'd need to adapt a RegExp too.


    The general solution to this problem is so simple that you really could have
    posted it (BTDT). OTOH, that is also why the OP could have found it by
    STFW.


    PointedEars
    --
    Prototype.js was written by people who don't know javascript for people
    who don't know javascript. People who don't know javascript are not
    the best source of advice on designing systems that use javascript.
    -- Richard Cornford, cljs, <f806at$ail$1$>
     
    Thomas 'PointedEars' Lahn, Oct 26, 2011
    #5
  6. In comp.lang.javascript message <a9fa509f-5f3c-4926-abc6-c77a21427d8f@j3
    6g2000prh.googlegroups.com>, Tue, 25 Oct 2011 21:43:35, cerr
    <> posted:

    >First thing, I'm a regular expression newbie.... somewhat anyways...
    >I would like to recognize the difference between this url:
    >http://quaaoutlodge.com/site/the-lodge/our-history.html
    >and that url:
    >http://quaaoutlodge.com/site/the-lodge.html
    >and at the same time extract the document name (our-history or the-
    >lodge) and the directory name if present (the-lodge).
    >I got stuck at how rto rcognize the second directory instead of the
    >first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do i get
    >the second one only?


    The easiest way, teaching nothing about RegExps, should be to use the
    string method 'split' with an argument "/", and contemplate the result
    and its length.
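
A sketch of what "contemplating the result and its length" might look like. It assumes the URL always has the shape scheme://host/site/[directory/]document; the function name describe and the threshold of 5 pieces follow from that assumption and are not from the thread.

```javascript
// Hypothetical helper built on String.prototype.split.
function describe(url) {
  var parts = url.split("/");
  // "http://host/site/doc.html"     -> ["http:", "", host, "site", doc]      (5 pieces)
  // "http://host/site/dir/doc.html" -> ["http:", "", host, "site", dir, doc] (6 pieces)
  var documentName = parts[parts.length - 1];
  var directory = parts.length > 5 ? parts[parts.length - 2] : null;
  return { pieces: parts.length, directory: directory, documentName: documentName };
}

var withDir = describe("http://quaaoutlodge.com/site/the-lodge/our-history.html");
// withDir.pieces is 6, withDir.directory is "the-lodge",
// withDir.documentName is "our-history.html"

var withoutDir = describe("http://quaaoutlodge.com/site/the-lodge.html");
// withoutDir.pieces is 5, withoutDir.directory is null,
// withoutDir.documentName is "the-lodge.html"
```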

    Also, see <http://www.merlyn.demon.co.uk/js-valid.htm> generally.

    That's assuming that your datum starts as a string.

    If you are writing quaaoutlodge, and use include files, then you might
    be including a location.href evaluation in your pages, in order that a
    page can tell which it is. In that case, look up the other properties
    of location.

    --
    (c) John Stockton, nr London, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
    Web <http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms and links;
    Astro stuff via astron-1.htm, gravity0.htm ; quotings.htm, pascal.htm, etc.
    No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.
     
    Dr J R Stockton, Oct 27, 2011
    #6
  7. On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:

    > Lasse Reichstein Nielsen wrote:
    > > cerr <> writes:
    > > > First thing, I'm a regular expression newbie.... somewhat

    anyways...
    > > > I would like to recognize the difference between this url:
    > > >http://quaaoutlodge.com/site/the-lodge/our-history.html
    > > > and that url:
    > > >http://quaaoutlodge.com/site/the-lodge.html
    > > > and at the same time extract the document name (our-history or

    the-
    > > > lodge) and the directory name if present (the-lodge).
    > > > I got stuck at how rto rcognize the second directory instead of

    the
    > > > first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do

    i get
    > > > the second one only?

    >
    > > When you think a RegExp might solve your problem - stop for a

    moment
    > > and think whether there is also a simpler solution :)

    >
    > I cannot think of anything that is simpler than
    >
    > var matches = url.match(/(.*)\/([^\/]+)$/);


    var matches = url.match(/(.*)\/(.*)/);

    --Antony
     
    Antony Scriven, Oct 28, 2011
    #7
  8. On Oct 28, 1:30 am, Antony Scriven wrote:

    > On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:
    >
    > > Lasse Reichstein Nielsen wrote:
    > > > cerr <> writes:
    > > > > First thing, I'm a regular expression newbie....
    > > > > somewhat anyways... I would like to recognize the
    > > > > difference between this url:
    > > > >
    > > > > http://quaaoutlodge.com/site/the-lodge/our-history.html
    > > > >
    > > > > and that url:
    > > > >
    > > > > http://quaaoutlodge.com/site/the-lodge.html
    > > > >
    > > > > and at the same time extract the document name
    > > > > (our-history or the- lodge) and the directory name
    > > > > if present (the-lodge). I got stuck at how rto
    > > > > rcognize the second directory instead of the first
    > > > > (the-lodge/ instead of site/) with "\b\/[a-z]+\/"
    > > > > how do i get the second one only?

    > >
    > > > When you think a RegExp might solve your problem
    > > > - stop for a moment and think whether there is also
    > > > a simpler solution :)

    > >
    > > I cannot think of anything that is simpler than
    > >
    > > var matches = url.match(/(.*)\/([^\/]+)$/);

    >
    > var matches = url.match(/(.*)\/(.*)/);


    And the reason you didn't spot that is also the reason why
    Lasse's solution (using String.prototype.lastIndexOf) is
    preferable IMHO. --Antony

    P.S. Sorry about the mangled quoting earlier.
     
    Antony Scriven, Oct 28, 2011
    #8
  9. Antony Scriven wrote:

    > On Oct 28, 1:30 am, Antony Scriven wrote:
    >> On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:
    >> > I cannot think of anything that is simpler than
    >> > var matches = url.match(/(.*)\/([^\/]+)$/);

    >> var matches = url.match(/(.*)\/(.*)/);

    >
    > And the reason you didn't spot that


    Spot what? That your way is _not_ better?

    > is also the reason why Lasse's solution (using
    > String.prototype.lastIndexOf) is preferable IMHO. --Antony


    http://foo.example/bar
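
One way to read this counterexample, sketched under the assumption that the lastIndexOf function is used with the substring() correction from earlier in the thread. The function name nameByLastIndexOf is illustrative.

```javascript
// The lastIndexOf approach, with substring() in place of substr().
function nameByLastIndexOf(url) {
  var nameEnd = url.lastIndexOf(".");
  var nameStart = url.lastIndexOf("/", nameEnd) + 1;
  return url.substring(nameStart, nameEnd);
}

// Works when the last path component has an extension:
var ok = nameByLastIndexOf("http://quaaoutlodge.com/site/the-lodge.html");
// ok is "the-lodge"

// With no "." in the last component, the last "." is the one in the
// host name, so the function returns part of the host instead.
var wrong = nameByLastIndexOf("http://foo.example/bar");
// wrong is "foo", not "bar"
```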

    > P.S. Sorry about the mangled quoting earlier.


    Don't be sorry about *that*.


    PointedEars
    --
    When all you know is jQuery, every problem looks $olvable.
     
    Thomas 'PointedEars' Lahn, Oct 28, 2011
    #9
  10. Dr J R Stockton wrote:

    > <> posted:
    >> First thing, I'm a regular expression newbie.... somewhat anyways...
    >> I would like to recognize the difference between this url:
    >> http://quaaoutlodge.com/site/the-lodge/our-history.html
    >> and that url:
    >> http://quaaoutlodge.com/site/the-lodge.html
    >> and at the same time extract the document name (our-history or the-
    >> lodge) and the directory name if present (the-lodge).
    >> I got stuck at how rto rcognize the second directory instead of the
    >> first (the-lodge/ instead of site/) with "\b\/[a-z]+\/" how do i get
    >> the second one only?

    >
    > The easiest way, teaching nothing about RegExps, should be to use the
    > string method 'split' with an argument "/", and contemplate the result
    > and its length.


    By contrast, that requires accessing the `length' property of the resulting
    array, too, and is inflexible with regard to potential query and fragment
    parts.
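
A sketch of the query and fragment problem, and one possible way around it: cut everything from the first "?" or "#" before matching. The function name lastPathComponent and the sample query string are assumptions for illustration.

```javascript
var url = "http://quaaoutlodge.com/site/the-lodge/our-history.html?lang=en#top";

// Plain split keeps the query and fragment glued to the last piece.
var parts = url.split("/");
var naive = parts[parts.length - 1];
// naive is "our-history.html?lang=en#top"

// Hypothetical helper: drop query and fragment first, then take the
// last slash-free run before the end of the string.
function lastPathComponent(input) {
  var pathOnly = input.replace(/[?#].*$/, "");
  var matches = pathOnly.match(/(.*)\/([^\/]+)$/);
  return matches ? matches[2] : null;
}

var cleaned = lastPathComponent(url);
// cleaned is "our-history.html"
```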


    PointedEars
    --
    var bugRiddenCrashPronePieceOfJunk = (
    navigator.userAgent.indexOf('MSIE 5') != -1
    && navigator.userAgent.indexOf('Mac') != -1
    ) // Plone, register_function.js:16
     
    Thomas 'PointedEars' Lahn, Oct 28, 2011
    #10
  11. On Oct 28, 2:50 am, Thomas 'PointedEars' Lahn wrote:

    > Antony Scriven wrote:
    > > On Oct 28, 1:30 am, Antony Scriven wrote:
    > > > On Oct 26, 10:40 pm, Thomas 'PointedEars' Lahn wrote:
    > > > > I cannot think of anything that is simpler than
    > > > > var matches = url.match(/(.*)\/([^\/]+)$/);
    > > > var matches = url.match(/(.*)\/(.*)/);

    >
    > > And the reason you didn't spot that

    >
    > Spot what? That your way is _not_ better?


    How so? And, really, url.match(/site\/(.*\/)?(.*)/) is much
    closer to what the OP actually asked for. And if the
    complexity of the URLs increases at all, so does that of the
    regexp. Regexps are a great way to hide bugs. --Antony
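
For reference, a sketch of what that pattern yields for the two URLs in the original question. The optional group is the directory (with its trailing slash), and it is undefined when no directory is present.

```javascript
var pattern = /site\/(.*\/)?(.*)/;

var deep = "http://quaaoutlodge.com/site/the-lodge/our-history.html".match(pattern);
// deep[1] is "the-lodge/"
// deep[2] is "our-history.html"

var shallow = "http://quaaoutlodge.com/site/the-lodge.html".match(pattern);
// shallow[1] is undefined (the optional group did not take part in the match)
// shallow[2] is "the-lodge.html"

// Telling the two URLs apart is then a test on group 1.
var deepHasDirectory = deep[1] !== undefined;
var shallowHasDirectory = shallow[1] !== undefined;
```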
     
    Antony Scriven, Oct 28, 2011
    #11
  12. Thomas 'PointedEars' Lahn <> writes:

    > Antony Scriven wrote:


    >> is also the reason why Lasse's solution (using
    >> String.prototype.lastIndexOf) is preferable IMHO. --Antony

    >
    > http://foo.example/bar


    My "solution" was very hardcoded to the format that the OP used, i.e.,
    ending in "/somename.html".
    Since that was all the examples he gave, and no real textual explanation,
    it's impossible to generalize further.

    Maybe I should have said that :)

    /L
    --
    Lasse Reichstein Holst Nielsen
    'Javascript frameworks is a disruptive technology'
     
    Lasse Reichstein Nielsen, Oct 28, 2011
    #12
  13. Lasse Reichstein Nielsen wrote:

    > Thomas 'PointedEars' Lahn <> writes:
    >> Antony Scriven wrote:
    >>> is also the reason why Lasse's solution (using
    >>> String.prototype.lastIndexOf) is preferable IMHO. --Antony

    >> http://foo.example/bar

    >
    > My "solution" was very hardcoded to the format that the OP used, i.e.,
    > ending in "/somename.html".
    > Since that was all the examples he gave, and no real textual explanation,


    It was clear enough to me that they wanted to know the last path component
    of a URI.

    > it's impossible to generalize further.


    Well, it wasn't.

    > Maybe I should have said that :)


    It was clear to me that your code was limited, however I saw and still see
    no good reason for doing that when the general solution – the one using
    RegExp, which was being asked for – is so obvious.


    PointedEars
    --
    Danny Goodman's books are out of date and teach practices that are
    positively harmful for cross-browser scripting.
    -- Richard Cornford, cljs, <cife6q$253$1$> (2004)
     
    Thomas 'PointedEars' Lahn, Oct 28, 2011
    #13
  14. On Oct 28, 6:50 pm, Thomas 'PointedEars' Lahn wrote:

    > Lasse Reichstein Nielsen wrote:
    > > Thomas 'PointedEars' Lahn <> writes:
    > > > Antony Scriven wrote:
    > > > > is also the reason why Lasse's solution (using
    > > > > String.prototype.lastIndexOf) is preferable IMHO. --Antony
    > > > http://foo.example/bar

    >
    > > My "solution" was very hardcoded to the format that the
    > > OP used, i.e., ending in "/somename.html". Since that
    > > was all the examples he gave, and no real textual
    > > explanation,

    >
    > It was clear enough to me that they wanted to know the
    > last path component of a URI.
    >
    > > it's impossible to generalize further.

    >
    > Well, it wasn't.


    Cough.

    > > Maybe I should have said that :)


    Unless you have Asperger's or some other similar condition,
    I don't think there's any difficulty in understanding what
    Lasse wrote, and its implications.

    > It was clear to me that your code was limited, however
    > I saw and still see no good reason for doing that when
    > the general solution -- the one using RegExp, which
    > was being asked for -- is so obvious.


    Well, I already showed that that isn't so. And if an expert
    such as yourself can't make an obvious regexp match the
    specification, then I think there is a lesson to be learnt
    there. Regexps can be powerful, terse, and convenient, but
    they can be very tricky things to get right, even the simple
    ones. --Antony

    P.S. Having said what I've said, I think it's a good thing
    that its regexps are somewhat limited compared to some other
    implementations.
     
    Antony Scriven, Oct 28, 2011
    #14
  15. On Oct 28, 7:41 pm, Antony Scriven wrote:

    > [...]
    >
    > P.S. Having said what I've said, I think it's a good thing
    > that its regexps are somewhat limited compared to some other
    > implementations.


    s/its/JS's/
     
    Antony Scriven, Oct 28, 2011
    #15