finding a substring from left back of a string

Discussion in 'Java' started by johndesp, Aug 18, 2004.

  1. johndesp

    johndesp Guest

    I have a string representation of a URL such as

    http://www.whatever.com/whatever/whenever/whoever.asp?why=25

    in the most efficient manner I would like to obtain the substring

    http://www.whatever.com/whatever/whenever/

    I think I can do this using a combination of StringTokenizer and the
    substring method of String. Is there an easier and/or more efficient
    way?

    I am looking for a method that I could pass the "/" as a character and
    have it return back to me everything left back of the last instance of
    "/".



    Thanks
    johndesp, Aug 18, 2004
    #1

  2. johndesp

    Lee Weiner Guest

    In article <>, (johndesp) wrote:
    >I have string representation of a url such as
    >
    >http://www.whatever.com/whatever/whenever/whoever.asp?why=25
    >
    >in the most efficient manner I would like to obtain the substring
    >
    >http://www.whatever.com/whatever/whenever/
    >
    >I think I can do this using a combination of StringTokenizer and the
    >substring method of String. Is there an easier and/or more efficient
    >way?
    >
    >I am looking for a method that I could pass the "/" as a character and
    >have it return back to me everything left back of the last instance of
    >"/".


    You need <String>.lastIndexOf.

    String url = "http://www.whatever.com/whatever/whenever/whoever.asp?why=25";
    int pos = url.lastIndexOf( '/' );
    String newUrl = url.substring( 0, pos + 1 );
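
The three lines above can be wrapped in a small helper. This is only a sketch: the names `UrlPrefix` and `upToLastSlash` are made up for illustration, and cutting off the query string first is an added assumption, so that a '/' inside something like `?next=/a/b` is not mistaken for a path separator.

```java
final class UrlPrefix {
    private UrlPrefix() {}

    /** Returns everything up to and including the last '/' before any query string. */
    static String upToLastSlash(String url) {
        // Assumption (not in the original post): ignore everything from '?' onward.
        int query = url.indexOf('?');
        String head = (query >= 0) ? url.substring(0, query) : url;

        // lastIndexOf returns -1 when there is no '/', so pos + 1 is 0
        // and the method returns the empty string in that case.
        int pos = head.lastIndexOf('/');
        return head.substring(0, pos + 1);
    }

    public static void main(String[] args) {
        System.out.println(upToLastSlash(
            "http://www.whatever.com/whatever/whenever/whoever.asp?why=25"));
    }
}
```

No tokenizing is involved: one scan for '?', one backward scan for '/', and one substring.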

    Lee Weiner
    lee AT leeweiner DOT org
    Lee Weiner, Aug 18, 2004
    #2

  3. johndesp

    KC Wong Guest

    > I have string representation of a url such as
    > http://www.whatever.com/whatever/whenever/whoever.asp?why=25
    >
    > in the most efficient manner I would like to obtain the substring
    > http://www.whatever.com/whatever/whenever/
    >
    > I think I can do this using a combination of StringTokenizer and the
    > substring method of String. Is there an easier and/or more efficient
    > way?
    >
    > I am looking for a method that I could pass the "/" as a character and
    > have it return back to me everything left back of the last instance of
    > "/".


    You should check the API docs for that. Browse the methods of class
    java.lang.String and find one that does the job.

    Alternatively, look at the java.net.URL class. One of its methods will make
    this task very easy.
    KC Wong, Aug 18, 2004
    #3
  4. johndesp

    Paul Lutus Guest

    johndesp wrote:

    > I have string representation of a url such as
    >
    > http://www.whatever.com/whatever/whenever/whoever.asp?why=25
    >
    > in the most efficient manner I would like to obtain the substring
    >
    > http://www.whatever.com/whatever/whenever/
    >
    > I think I can do this using a combination of StringTokenizer and the
    > substring method of String. Is there an easier and/or more efficient
    > way?


    Don't use StringTokenizer, it is a disaster area masquerading as a java
    class.

    > I am looking for a method that I could pass the "/" as a character and
    > have it return back to me everything left back of the last instance of
    > "/".


    Why not read up on the String class and select an appropriate way to (big
    hint) find the last index of "/", then take the substring from the start to
    that character?

    --
    Paul Lutus
    http://www.arachnoid.com
    Paul Lutus, Aug 18, 2004
    #4
  5. johndesp

    zoopy Guest

    On 18-8-2004 4:01, johndesp wrote:

    > I have string representation of a url such as
    >
    > http://www.whatever.com/whatever/whenever/whoever.asp?why=25
    >
    > in the most efficient manner I would like to obtain the substring


    Define efficient...

    >
    > http://www.whatever.com/whatever/whenever/
    >
    > I think I can do this using a combination of StringTokenizer and the
    > substring method of String. Is there an easier and/or more efficient
    > way?
    >
    > I am looking for a method that I could pass the "/" as a character and
    > have it return back to me everything left back of the last instance of
    > "/".
    >
    >
    >
    > Thanks


    If you mean by efficient 'the least programming to do by yourself', then the constructors of
    java.net.URL provide what you want:

    URL base = new URL("http://www.whatever.com/whatever/whenever/whoever.asp?why=25");
    // -> http://www.whatever.com/whatever/whenever/whoever.asp?why=25

    URL current = new URL(base, ".");
    // -> http://www.whatever.com/whatever/whenever/

    URL parent = new URL(base, "..");
    // -> http://www.whatever.com/whatever/

    URL root = new URL(base, "/");
    // -> http://www.whatever.com/

    URL here = new URL(base, "here.html");
    // -> http://www.whatever.com/whatever/whenever/here.html

    URL there = new URL(base, "/there.html");
    // -> http://www.whatever.com/there.html

    URL everywhere = new URL(base, "../everywhere.html");
    // -> http://www.whatever.com/whatever/everywhere.html

    [... and use URL.toString() to convert it back to a string]
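
The same relative-resolution idea can also be sketched with java.net.URI.resolve instead of the URL constructors shown above. That is a substitution made here, not something from this post, and the class name `BaseOf` and its method names are hypothetical. URI.create throws an unchecked IllegalArgumentException on malformed input, so the sketch needs no try/catch.

```java
import java.net.URI;

final class BaseOf {
    private BaseOf() {}

    /** Resolves "." against the URL, which yields its directory and drops the query. */
    static String directory(String url) {
        return URI.create(url).resolve(".").toString();
    }

    /** Resolves ".." against the URL, which yields the directory one level up. */
    static String parent(String url) {
        return URI.create(url).resolve("..").toString();
    }

    public static void main(String[] args) {
        String url = "http://www.whatever.com/whatever/whenever/whoever.asp?why=25";
        System.out.println(directory(url)); // http://www.whatever.com/whatever/whenever/
        System.out.println(parent(url));    // http://www.whatever.com/whatever/
    }
}
```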

    --
    Regards,
    Z.
    zoopy, Aug 18, 2004
    #5
  6. johndesp

    zoopy Guest

    zoopy, Aug 18, 2004
    #6
  7. Paul Lutus wrote:
    > Don't use StringTokenizer, it is a disaster area masquerading as a java
    > class.


    I think that's rather strong. StringTokenizer does a fine job on those
    things it is documented to do. As with any class, people tend to have
    problems with StringTokenizer when they expect it to do things
    differently than it in fact does, which does not usually happen to
    people who have read its documentation prior to using it. The most
    common issue tends to be with the way the class defines a token,
    which excludes the possibility of empty tokens.
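
For anyone who has not run into the empty-token behavior, a minimal demonstration may help. The class name `EmptyTokenDemo` and the sample record are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class EmptyTokenDemo {
    private EmptyTokenDemo() {}

    /** Collects every token StringTokenizer reports for the given record. */
    static List<String> tokens(String record, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(record, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Three fields on paper, the middle one empty.
        // Adjacent delimiters are treated as one separator, so only two tokens come back.
        System.out.println(tokens("a,,c", ",")); // prints [a, c]
    }
}
```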

    A stronger argument can be made for StreamTokenizer being problematic.
    The same comments about reading documentation still apply, but
    StreamTokenizer does exhibit some (documented) behaviors that make it
    difficult to use in a variety of circumstances.

    With all that said, let's be clear that StringTokenizer will
    nevertheless not serve as the best basis for the task that the OP wants
    to perform.


    John Bollinger
    John C. Bollinger, Aug 19, 2004
    #7
  8. johndesp

    Paul Lutus Guest

    John C. Bollinger wrote:

    > Paul Lutus wrote:
    >> Don't use StringTokenizer, it is a disaster area masquerading as a java
    >> class.

    >
    > I think that's rather strong.


    Not really, especially if you have tried to use it in the kind of vanilla
    parsing tasks for which it was originally intended.

    > StringTokenizer does a fine job on those
    > things it is documented to do.


    Sadly, not true. The various defects are not clearly documented except in
    newsgroups, where complaints about this class have the status of legend.

    > As with any class, people tend to have
    > problems with StringTokenizer when they expect it to do things
    > differently than it in fact does,


    A malady most often brought on by reading the documentation.

    > which does not usually happen to
    > people who have read its documentation prior to using it.


    No, this is not correct. The documentation doesn't accurately reflect the
    behavior of the class.

    > The most
    > common issue tends to be with the way the class defines a token,
    > which excludes the possibility of empty tokens.


    And this is not clearly documented, and it is not expected, and it is
    inexcusable. To see exactly how inexcusable, one need only write a method
    to parse a string on specified tokens and produce consistent results. It
    just isn't that difficult.

    If the documentation were honestly written, it would warn people not to use
    the class at all and advise that it is present in the language only because
    applications have already been written using it.

    It is one thing to deprecate a method in a class, it is quite another to
    deprecate an entire class, which must be why this has not happened ... yet.

    But this is sort of academic since regular expressions have been added to
    Java. In all but the most speed-critical applications, that is now the
    preferred approach. For speed-critical cases in which, for example, a
    record needs to be parsed into fields, people are reduced to writing a
    replacement for StringTokenizer in order that each record have the correct
    number of fields, including empty ones.

    > With all that said, let's be clear that StringTokenizer will
    > nevertheless not serve as the best basis for the task that the OP wants
    > to perform.


    Concur.

    --
    Paul Lutus
    http://www.arachnoid.com
    Paul Lutus, Aug 19, 2004
    #8
  9. Paul Lutus wrote:

    > John C. Bollinger wrote:
    >
    >
    >>Paul Lutus wrote:
    >>
    >>>Don't use StringTokenizer, it is a disaster area masquerading as a java
    >>>class.

    >>
    >>I think that's rather strong.

    >
    >
    > Not really, especially if you have tried to use it in the kind of vanilla
    > parsing tasks for which it was originally intended.


    I use it all over the place for vanilla parsing tasks. I don't think
    I've ever had a problem with it.

    >>StringTokenizer does a fine job on those
    >>things it is documented to do.

    >
    >
    > Sadly, not true. The various defects are not clearly documented except in
    > newsgroups, where complaints about this class have the status of legend.


    I'm sure I haven't been participating here as long as you have, but in
    my recollection (and my Google search) that just doesn't seem to be the
    case. There is one notorious event in StringTokenizer history: the
    change in behavior of StringTokenizer.nextToken(String) at some point in
    the Java 1.3 series. That did generate more than one thread around that
    time, at least one of them quite long, so perhaps that particular
    complaint is legendary. The issue of null tokens certainly has the
    status of a FAQ; if that's what you mean then I already stipulated so.

    >>which does not usually happen to
    >>people who have read its documentation prior to using it.

    >
    >
    > No, this is not correct. The documentation doesn't accurately reflect the
    > behavior of the class.


    I'm sorry, but I guess I'm too dense or blind. In what way is the
    documentation inaccurate?

    >>The most
    >>common issue tends to be with the way the class defines a token,
    >>which excludes the possibility of empty tokens.

    >
    >
    > And this is not clearly documented, and it is not expected, and it is
    > inexcusable. To see exactly how inexcusable, one need only write a method
    > to parse a string on specified tokens and produce consistent results. It
    > just isn't that difficult.


    OK, I'll give you that the class docs don't make it clear that
    delimiters are formed of sequences of delimiter characters, not
    strictly by individual delimiter characters. As for whether or not
    that's expected, I'd say it must depend heavily on the person whose
    expectations are in question. I certainly wouldn't call the behavior
    "inexcusable", however, as frequently it is exactly the behavior I want,
    and I'm sure I'm not such an odd bird as to be the only one who ever
    wants it.

    > If the documentation were honestly written, it would warn people not to use
    > the class at all and advise that it is present in the language only because
    > applications have already been written using it.


    "Honest"? I don't see where honesty comes into it. But as a matter of
    fact: "StringTokenizer is a legacy class that is retained for
    compatibility reasons although its use is discouraged in new code. It is
    recommended that anyone seeking this functionality use the split method
    of String or the java.util.regex package instead."



    Wherefrom comes such animosity, anyway?


    John Bollinger
    John C. Bollinger, Aug 20, 2004
    #9
  10. johndesp

    Paul Lutus Guest

    John C. Bollinger wrote:

    > Paul Lutus wrote:
    >
    >> John C. Bollinger wrote:
    >>
    >>
    >>>Paul Lutus wrote:
    >>>
    >>>>Don't use StringTokenizer, it is a disaster area masquerading as a java
    >>>>class.
    >>>
    >>>I think that's rather strong.

    >>
    >>
    >> Not really, especially if you have tried to use it in the kind of vanilla
    >> parsing tasks for which it was originally intended.

    >
    > I use it all over the place for vanilla parsing tasks. I don't think
    > I've ever had a problem with it.


    To see the real perverse behavior of this class, the basis for its
    notoriety, try parsing database records or comma- or tab-separated records
    that have occasional empty fields. Students typically create a record
    parser using StringTokenizer and only much later see behavior they cannot
    readily explain.

    --
    Paul Lutus
    http://www.arachnoid.com
    Paul Lutus, Aug 20, 2004
    #10
  11. johndesp

    Alan Moore Guest

    On Thu, 19 Aug 2004 21:07:52 -0700, Paul Lutus <>
    wrote:

    >To see the real perverse behavior of this class, the basis for its
    >notoriety, try parsing database records or comma- or tab-separated records
    >that have occasional empty fields. Students typically create a record
    >parser using StringTokenizer and only much later see behavior they cannot
    >readily explain.


    The biggest problem with StringTokenizer is that people expect to be
    able to do certain things with it, only to learn either that they can't
    do what they want (e.g., use multi-character delimiters), or that it's
    a lot harder than it should be (e.g., parse colon-delimited data,
    allowing for empty fields). Of course, this will be true to some
    extent for any class, no matter how well-designed its API is, but
    StringTokenizer's behavior is particularly perverse, and its
    documentation does nothing to offset that.
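
The multi-character delimiter point can be illustrated with a short sketch. The class name `DelimiterSetDemo`, its method names, and the sample data are assumptions made for this example. StringTokenizer treats its delimiter string as a set of individual characters, so passing "::" is the same as passing ":".

```java
import java.util.StringTokenizer;
import java.util.regex.Pattern;

final class DelimiterSetDemo {
    private DelimiterSetDemo() {}

    /** Number of tokens StringTokenizer finds; delims is a set of characters. */
    static int tokenizerCount(String data, String delims) {
        return new StringTokenizer(data, delims).countTokens();
    }

    /** Splits on the delimiter taken literally, keeping empty fields. */
    static String[] literalSplit(String data, String delim) {
        return data.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        String data = "a::b:c";
        // Every ':' counts as a delimiter, so the tokens are a, b and c.
        System.out.println(tokenizerCount(data, "::"));    // prints 3
        // Splitting on the literal two-character string leaves "b:c" intact.
        System.out.println(literalSplit(data, "::").length); // prints 2
    }
}
```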

    The split() method is supposed to be StringTokenizer's replacement,
    but it's no easier for newbies to grok. If you're already familiar
    with regexes and the split function from other languages, you're fine;
    otherwise, you might as well be standing at the bottom of a sheer
    cliff, looking up. And when it comes to parsing CSV data, split() is
    just as tantalizingly useless as StringTokenizer.
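
A few of the split() surprises can be shown in a handful of lines. The class name `SplitPitfalls` and the sample records are invented here, and this is an illustration of the pitfalls rather than a CSV parser.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitPitfalls {
    private SplitPitfalls() {}

    public static void main(String[] args) {
        // 1. Trailing empty fields are dropped unless a negative limit is passed.
        System.out.println("a,b,,".split(",").length);     // prints 2
        System.out.println("a,b,,".split(",", -1).length); // prints 4

        // 2. The argument is a regular expression, so a delimiter such as '|'
        //    has to be quoted before it is used literally.
        System.out.println(Arrays.toString("a|b|c".split(Pattern.quote("|")))); // prints [a, b, c]

        // 3. split() knows nothing about CSV quoting: this record has two
        //    fields, but the comma inside the quotes produces a third piece.
        String csv = "\"Smith, John\",42";
        System.out.println(csv.split(",", -1).length);     // prints 3
    }
}
```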

    I wouldn't have phrased it as strongly as Paul did, but I agree that
    StringTokenizer should never have been included in the JDK; it's like
    a sore that never heals.
    Alan Moore, Aug 20, 2004
    #11
  12. johndesp

    Paul Lutus Guest

    Alan Moore wrote:

    > On Thu, 19 Aug 2004 21:07:52 -0700, Paul Lutus <>
    > wrote:
    >
    >>To see the real perverse behavior of this class, the basis for its
    >>notoriety, try parsing database records or comma- or tab-separated records
    >>that have occasional empty fields. Students typically create a record
    >>parser using StringTokenizer and only much later see behavior they cannot
    >>readily explain.

    >
    > The biggest problem with StringTokenizer is that people expect to be
    > able to do certain things with it, only to learn either that they can't
    > do what they want (e.g., use multi-character delimiters), or that it's
    > a lot harder than it should be (e.g., parse colon-delimited data,
    > allowing for empty fields). Of course, this will be true to some
    > extent for any class, no matter how well-designed its API is,


    Actually, it is very easy to create a tokenizer that accepts multi-character
    tokens and always produces the right number of fields, but such a method is
    not particularly fast compared to one that only accepts single character
    tokens:

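    // Requires: import java.util.ArrayList; import java.util.List;
    static String[] split(String data, String token)
    {
        if (token.isEmpty()) {
            // an empty token would match at every index and the loop would never advance
            throw new IllegalArgumentException("token must not be empty");
        }
        List<String> fields = new ArrayList<String>();
        int a = 0, b;
        int tlen = token.length();
        // copy the text between successive occurrences of token
        while ((b = data.indexOf(token, a)) != -1) {
            fields.add(data.substring(a, b));
            a = b + tlen;
        }
        // whatever follows the last token (possibly empty) is the final field
        fields.add(data.substring(a));
        return fields.toArray(new String[fields.size()]);
    }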

    > but
    > StringTokenizer's behavior is particularly perverse, and its
    > documentation does nothing to offset that.
    >
    > The split() method is supposed to be StringTokenizer's replacement,
    > but it's no easier for newbies to grok. If you're already familiar
    > with regexes and the split function from other languages, you're fine;
    > otherwise, you might as well be standing at the bottom of a sheer
    > cliff, looking up. And when it comes to parsing CSV data, split() is
    > just as tantalizingly useless ia StringTokenizer.


    Yes, ironically enough, which is why I find myself applying the above method
    with great regularity. As I said, it is slower than a carefully designed
    method that accepts only one-character tokens, but it competes well with
    the regex methods.
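
For comparison, a single-character variant might look like the sketch below. It is not the method referred to in this post, only one plausible shape for it. The class name `CharSplit` is hypothetical, and no benchmark is claimed.

```java
import java.util.ArrayList;
import java.util.List;

final class CharSplit {
    private CharSplit() {}

    /** Splits data on a single delimiter character, keeping empty fields. */
    static List<String> split(String data, char delim) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int end;
        // indexOf(char, fromIndex) scans for one character at a time.
        while ((end = data.indexOf(delim, start)) != -1) {
            fields.add(data.substring(start, end));
            start = end + 1;
        }
        // The remainder after the last delimiter is always a field, even if empty.
        fields.add(data.substring(start));
        return fields;
    }

    public static void main(String[] args) {
        // Two delimiters in a row and a trailing delimiter both yield empty fields.
        System.out.println(split("a::b:", ':').size()); // prints 4
    }
}
```

A record with n delimiters always yields n + 1 fields, which is the property the record-parsing cases discussed in this thread depend on.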

    >
    > I wouldn't have phrased it as strongly as Paul did, but I agree that
    > StringTokenizer should never have been included in the JDK; it's like
    > a sore that never heals.


    I don't think the original programmers understood what StringTokenizer
    actually needed to be able to do.

    --
    Paul Lutus
    http://www.arachnoid.com
    Paul Lutus, Aug 20, 2004
    #12
  13. Paul Lutus wrote:

    > To see the real perverse behavior of this class, the basis for its
    > notoriety, try parsing database records or comma- or tab-separated records
    > that have occasional empty fields. Students typically create a record
    > parser using StringTokenizer and only much later see behavior they cannot
    > readily explain.


    I already know that that doesn't work -- we have discussed the fact in
    this thread. I don't consider the behavior "perverse" in any way,
    however, on which point I suppose we'll just have to disagree. The
    simple fact that StringTokenizer's behavior is often exactly what I want
    is all the basis I need for my dissent. I don't see how StringTokenizer
    being the wrong class for some purposes makes its behavior perverse.

    I'm sure you're quite right that students sometimes stumble over the
    behavior. On the other hand, when they do there is an opportunity to
    teach them something about reading specifications (what do the docs
    actually say, and what did you read into them that isn't really there?)
    and about testing. It sounds like that won't sway you, though. It
    wouldn't sway me either if it were the only justification.


    John Bollinger
    John C. Bollinger, Aug 20, 2004
    #13
  14. johndesp

    Paul Lutus Guest

    John C. Bollinger wrote:

    > Paul Lutus wrote:
    >
    >> To see the real perverse behavior of this class, the basis for its
    >> notoriety, try parsing database records or comma- or tab-separated
    >> records that have occasional empty fields. Students typically create a
    >> record parser using StringTokenizer and only much later see behavior they
    >> cannot readily explain.

    >
    > I already know that that doesn't work -- we have discussed the fact in
    > this thread. I don't consider the behavior "perverse" in any way,
    > however, on which point I suppose we'll just have to disagree. The
    > simple fact that StringTokenizer's behavior is often exactly what I want
    > is all the basis I need for my dissent. I don't see how StringTokenizer
    > being the wrong class for some purposes makes its behavior perverse.


    It is a question of the most common use of this class, and its apparent
    suitability for this particular, very common, task.

    > I'm sure you're quite right that students sometimes stumble over the
    > behavior.


    In particular because the example given in the StringTokenizer
    documentation strongly hints at its primary purpose, and no mention is made
    of its primary flaw.

    > On the other hand, when they do there is an opportunity to
    > teach them something about reading specifications (what do the docs
    > actually say, and what did you read into them that isn't really there?)


    I just read the entire document for StringTokenizer and it very simply does
    not say that the wrong number of tokens will be returned if there are empty
    fields. The problem lies with the class and its documentation, the user in
    this case is quite blameless.

    > and about testing.


    Yes, unfortunately it is not that common for someone to exhaustively test a
    class' correspondence with its published documentation. That raises
    cynicism to an art form.

    > It sounds like that won't sway you, though.


    It really won't, especially now that I have read the documentation once
    again and noted the absence of mention of this serious shortcoming.

    --
    Paul Lutus
    http://www.arachnoid.com
    Paul Lutus, Aug 20, 2004
    #14