finding a substring from left back of a string

johndesp

I have string representation of a url such as

http://www.whatever.com/whatever/whenever/whoever.asp?why=25

in the most efficient manner I would like to obtain the substring

http://www.whatever.com/whatever/whenever/

I think I can do this using a combination of StringTokenizer and the
substring method of String. Is there an easier and/or more efficient
way?

I am looking for a method that I could pass the "/" as a character and
have it return back to me everything left back of the last instance of
"/".



Thanks
 
Lee Weiner

I have string representation of a url such as

http://www.whatever.com/whatever/whenever/whoever.asp?why=25

in the most efficient manner I would like to obtain the substring

http://www.whatever.com/whatever/whenever/

I think I can do this using a combination of StringTokenizer and the
substring method of String. Is there an easier and/or more efficient
way?

I am looking for a method that I could pass the "/" as a character and
have it return back to me everything left back of the last instance of
"/".

You need <String>.lastIndexOf.

String url = "http://www.whatever.com/whatever/whenever/whoever.asp?why=25";
int pos = url.lastIndexOf( '/' );
String newUrl = url.substring( 0, pos + 1 );
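A slightly more defensive sketch of the same idea, wrapped in a helper so it can be reused. The names `UrlPrefix` and `upToLast` are made up for illustration, and returning the input unchanged when the delimiter is absent is an assumption about what the caller wants:

```java
class UrlPrefix {
    // Returns everything up to and including the last occurrence of delim.
    // Assumption: if delim does not occur at all, the input is returned unchanged.
    static String upToLast(String s, char delim) {
        int pos = s.lastIndexOf(delim);   // -1 when delim is absent
        return pos < 0 ? s : s.substring(0, pos + 1);
    }

    public static void main(String[] args) {
        String url = "http://www.whatever.com/whatever/whenever/whoever.asp?why=25";
        System.out.println(upToLast(url, '/'));
    }
}
```

One caveat: lastIndexOf looks at the whole string, so a "/" inside the query part (after the "?") would be taken as the last slash.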

Lee Weiner
lee AT leeweiner DOT org
 
KC Wong

I have string representation of a url such as
http://www.whatever.com/whatever/whenever/whoever.asp?why=25

in the most efficient manner I would like to obtain the substring
http://www.whatever.com/whatever/whenever/

I think I can do this using a combination of StringTokenizer and the
substring method of String. Is there an easier and/or more efficient
way?

I am looking for a method that I could pass the "/" as a character and
have it return back to me everything left back of the last instance of
"/".

You should check the API docs for that. Browse the methods of class
java.lang.String and find one that does the job.

Alternatively, look at the java.net.URL class. One of its methods will make this
task very easy.
 
Paul Lutus

johndesp said:
I have string representation of a url such as

http://www.whatever.com/whatever/whenever/whoever.asp?why=25

in the most efficient manner I would like to obtain the substring

http://www.whatever.com/whatever/whenever/

I think I can do this using a combination of StringTokenizer and the
substring method of String. Is there an easier and/or more efficient
way?

Don't use StringTokenizer, it is a disaster area masquerading as a java
class.
I am looking for a method that I could pass the "/" as a character and
have it return back to me everything left back of the last instance of
"/".

Why not read up on the String class and select an appropriate way to (big
hint) find the last index of "/", then take the substring from the start to
that character?
 
zoopy

I have string representation of a url such as

http://www.whatever.com/whatever/whenever/whoever.asp?why=25

in the most efficient manner I would like to obtain the substring

Define efficient...
http://www.whatever.com/whatever/whenever/

I think I can do this using a combination of StringTokenizer and the
substring method of String. Is there an easier and/or more efficient
way?

I am looking for a method that I could pass the "/" as a character and
have it return back to me everything left back of the last instance of
"/".



Thanks

If you mean by efficient 'the least programming to do by yourself', then the constructors of
java.net.URL provide what you want:

URL base = new URL("http://www.whatever.com/whatever/whenever/whoever.asp?why=25");
// -> http://www.whatever.com/whatever/whenever/whoever.asp?why=25

URL current = new URL(base, ".");
// -> http://www.whatever.com/whatever/whenever/

URL parent = new URL(base, "..");
// -> http://www.whatever.com/whatever/

URL root = new URL(base, "/");
// -> http://www.whatever.com/

URL here = new URL(base, "here.html");
// -> http://www.whatever.com/whatever/whenever/here.html

URL there = new URL(base, "/there.html");
// -> http://www.whatever.com/there.html

URL everywhere = new URL(base, "../everywhere.html");
// -> http://www.whatever.com/whatever/everywhere.html

[... and use URL.toString() to convert it back to a string]
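The same kind of relative resolution can also be sketched with java.net.URI, which throws no checked exception from URI.create and resolve. This swaps in a different class from the URL constructors shown above; the names `ResolveDemo` and `resolve` are hypothetical:

```java
import java.net.URI;

class ResolveDemo {
    // Resolves a relative reference against a base URL given as a string.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.whatever.com/whatever/whenever/whoever.asp?why=25";
        System.out.println(resolve(base, "."));           // directory of the base
        System.out.println(resolve(base, "here.html"));   // sibling document
        System.out.println(resolve(base, "/there.html")); // document at the server root
    }
}
```

Because the URL is parsed rather than scanned as text, the query string plays no part in deciding where the directory ends.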
 
John C. Bollinger

Paul said:
Don't use StringTokenizer, it is a disaster area masquerading as a java
class.

I think that's rather strong. StringTokenizer does a fine job on those
things it is documented to do. As with any class, people tend to have
problems with StringTokenizer when they expect it to do things
differently than it in fact does, which does not usually happen to
people who have read its documentation prior to using it. The most
common issue tends to be with the way the class defines a token,
which excludes the possibility of empty tokens.
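A minimal sketch of what "no empty tokens" means in practice, using a made-up three-field record whose middle field is empty. The names `EmptyTokenDemo` and `countWithTokenizer` are hypothetical:

```java
import java.util.StringTokenizer;

class EmptyTokenDemo {
    // Counts the tokens StringTokenizer reports for a record.
    static int countWithTokenizer(String record, String delims) {
        return new StringTokenizer(record, delims).countTokens();
    }

    public static void main(String[] args) {
        String record = "a,,c";   // three fields, the middle one empty
        // Adjacent delimiters are treated as one separator, so the empty field vanishes.
        System.out.println(countWithTokenizer(record, ","));   // prints 2
        // String.split keeps the interior empty field.
        System.out.println(record.split(",").length);          // prints 3
    }
}
```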

A stronger argument can be made for StreamTokenizer being problematic.
The same comments about reading documentation still apply, but
StreamTokenizer does exhibit some (documented) behaviors that make it
difficult to use in a variety of circumstances.

With all that said, let's be clear that StringTokenizer will
nevertheless not serve as the best basis for the task that the OP wants
to perform.


John Bollinger
(e-mail address removed)
 
Paul Lutus

John said:
I think that's rather strong.

Not really, especially if you have tried to use it in the kind of vanilla
parsing tasks for which it was originally intended.
StringTokenizer does a fine job on those
things it is documented to do.

Sadly, not true. The various defects are not clearly documented except in
newsgroups, where complaints about this class have the status of legend.
As with any class, people tend to have
problems with StringTokenizer when they expect it to do things
differently than it in fact does,

A malady most often brought on by reading the documentation.
which does not usually happen to
people who have read its documentation prior to using it.

No, this is not correct. The documentation doesn't accurately reflect the
behavior of the class.
The most
common issue tends to be with the way the class defines a token,
which excludes the possibility of empty tokens.

And this is not clearly documented, and it is not expected, and it is
inexcusable. To see exactly how inexcusable, one need only write a method
to parse a string on specified tokens and produce consistent results. It
just isn't that difficult.

If the documentation were honestly written, it would warn people not to use
the class at all and advise that it is present in the language only because
applications have already been written using it.

It is one thing to deprecate a method in a class, it is quite another to
deprecate an entire class, which must be why this has not happened ... yet.

But this is sort of academic since regular expressions have been added to
Java. In all but the most speed-critical applications, that is now the
preferred approach. For speed-critical cases in which, for example, a
record needs to be parsed into fields, people are reduced to writing a
replacement for StringTokenizer in order that each record have the correct
number of fields, including empty ones.
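As a rough illustration of the regular-expression route for record parsing, the sketch below compiles the delimiter pattern once and splits with a negative limit so that empty fields, including trailing ones, are kept. The tab delimiter and the names `RecordSplitter` and `fields` are assumptions made for the example:

```java
import java.util.regex.Pattern;

class RecordSplitter {
    // Compiled once and reused for every record.
    private static final Pattern TAB = Pattern.compile("\t");

    // A negative limit tells split to keep trailing empty fields.
    static String[] fields(String record) {
        return TAB.split(record, -1);
    }

    public static void main(String[] args) {
        String record = "a\t\tc\t";   // four fields: "a", "", "c", ""
        System.out.println(fields(record).length);   // prints 4
    }
}
```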
With all that said, let's be clear that StringTokenizer will
nevertheless not serve as the best basis for the task that the OP wants
to perform.

Concur.
 
John C. Bollinger

Paul said:
John C. Bollinger wrote:




Not really, especially if you have tried to use it in the kind of vanilla
parsing tasks for which it was originally intended.

I use it all over the place for vanilla parsing tasks. I don't think
I've ever had a problem with it.
Sadly, not true. The various defects are not clearly documented except in
newsgroups, where complaints about this class have the status of legend.

I'm sure I haven't been participating here as long as you have, but in
my recollection (and my Google search) that just doesn't seem to be the
case. There is one notorious event in StringTokenizer history: the
change in behavior of StringTokenizer.nextToken(String) at some point in
the Java 1.3 series. That did generate more than one thread around that
time, at least one of them quite long, so perhaps that particular
complaint is legendary. The issue of null tokens certainly has the
status of a FAQ; if that's what you mean then I already stipulated so.
No, this is not correct. The documentation doesn't accurately reflect the
behavior of the class.

I'm sorry, but I guess I'm too dense or blind. In what way is the
documentation inaccurate?
And this is not clearly documented, and it is not expected, and it is
inexcusable. To see exactly how inexcusable, one need only write a method
to parse a string on specified tokens and produce consistent results. It
just isn't that difficult.

OK, I'll give you that the class docs don't make it clear
that delimiters are formed of sequences of delimiter characters, not
strictly by individual delimiter characters. As for whether or not
that's expected, I'd say it must depend heavily on the person whose
expectations are in question. I certainly wouldn't call the behavior
"inexcusable", however, as frequently it is exactly the behavior I want,
and I'm sure I'm not such an odd bird as to be the only one who ever
wants it.
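One case where collapsing runs of delimiters is the wanted behavior is pulling words out of loosely formatted text. A small sketch, with the hypothetical names `WordDemo` and `words`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class WordDemo {
    // Splits a line into words; the one-argument constructor uses the
    // default delimiters (space, tab, newline, carriage return, form feed).
    static List<String> words(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Leading, trailing, and repeated whitespace produce no empty words.
        System.out.println(words("  the   quick\tfox  "));   // prints [the, quick, fox]
    }
}
```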
If the documentation were honestly written, it would warn people not to use
the class at all and advise that it is present in the language only because
applications have already been written using it.

"Honest"? I don't see where honesty comes into it. But as a matter of
fact: "StringTokenizer is a legacy class that is retained for
compatibility reasons although its use is discouraged in new code. It is
recommended that anyone seeking this functionality use the split method
of String or the java.util.regex package instead."



Wherefrom comes such animosity, anyway?


John Bollinger
(e-mail address removed)
 
Paul Lutus

John said:
I use it all over the place for vanilla parsing tasks. I don't think
I've ever had a problem with it.

To see the real perverse behavior of this class, the basis for its
notoriety, try parsing database records or comma- or tab-separated records
that have occasional empty fields. Students typically create a record
parser using StringTokenizer and only much later see behavior they cannot
readily explain.
 
Alan Moore

To see the real perverse behavior of this class, the basis for its
notoriety, try parsing database records or comma- or tab-separated records
that have occasional empty fields. Students typically create a record
parser using StringTokenizer and only much later see behavior they cannot
readily explain.

The biggest problem with StringTokenizer is that people expect to be
able to do certain things with it, only to learn either that they can't
do what they want (e.g., use multi-character delimiters), or that it's
a lot harder than it should be (e.g., parse colon-delimited data,
allowing for empty fields). Of course, this will be true to some
extent for any class, no matter how well-designed its API is, but
StringTokenizer's behavior is particularly perverse, and its
documentation does nothing to offset that.

The split() method is supposed to be StringTokenizer's replacement,
but it's no easier for newbies to grok. If you're already familiar
with regexes and the split function from other languages, you're fine;
otherwise, you might as well be standing at the bottom of a sheer
cliff, looking up. And when it comes to parsing CSV data, split() is
just as tantalizingly useless as StringTokenizer.

I wouldn't have phrased it as strongly as Paul did, but I agree that
StringTokenizer should never have been included in the JDK; it's like
a sore that never heals.
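Two of the split() surprises that tend to catch people out can be sketched as follows: the delimiter is a regular expression, so "." matches every character, and trailing empty fields are dropped unless a negative limit is passed. The helper name `splitLiteral` is hypothetical, and this is only one way to work around both issues. It does not handle quoted CSV fields:

```java
import java.util.regex.Pattern;

class SplitPitfalls {
    // Treats delim as literal text and keeps every empty field.
    static String[] splitLiteral(String data, String delim) {
        return data.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        // "." is a regex metacharacter: every character is a delimiter,
        // all pieces are empty, and trailing empties are discarded.
        System.out.println("a.b.c".split(".").length);            // prints 0
        // Trailing empty fields are silently dropped by default.
        System.out.println("a,b,,".split(",").length);            // prints 2
        System.out.println(splitLiteral("a.b.c", ".").length);    // prints 3
        System.out.println(splitLiteral("a,b,,", ",").length);    // prints 4
    }
}
```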
 
Paul Lutus

Alan said:
The biggest problem with StringTokenizer is that people expect to be
able to do certain things with it, only to learn either that they can't
do what they want (e.g., use multi-character delimiters), or that it's
a lot harder than it should be (e.g., parse colon-delimited data,
allowing for empty fields). Of course, this will be true to some
extent for any class, no matter how well-designed its API is,

Actually, it is very easy to create a tokenizer that accepts multi-character
tokens and always produces the right number of fields, but such a method is
not particularly fast compared to one that only accepts single character
tokens:

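// Needs: import java.util.ArrayList; import java.util.List;
// Splits data on every occurrence of token, keeping empty fields.
String[] split(String data, String token)
{
    if (token.length() == 0) {
        // an empty token would never advance the search position
        throw new IllegalArgumentException("token must not be empty");
    }
    List<String> fields = new ArrayList<String>();
    int a = 0, b;
    int tlen = token.length();
    while ((b = data.indexOf(token, a)) != -1) {
        fields.add(data.substring(a, b));   // field before this token
        a = b + tlen;                       // resume just past the token
    }
    fields.add(data.substring(a));          // last field, possibly empty
    return fields.toArray(new String[fields.size()]);
}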
but
StringTokenizer's behavior is particularly perverse, and its
documentation does nothing to offset that.

The split() method is supposed to be StringTokenizer's replacement,
but it's no easier for newbies to grok. If you're already familiar
with regexes and the split function from other languages, you're fine;
otherwise, you might as well be standing at the bottom of a sheer
cliff, looking up. And when it comes to parsing CSV data, split() is
just as tantalizingly useless as StringTokenizer.

Yes, ironically enough, which is why I find myself applying the above method
with great regularity. As I said, it is slower than a carefully designed
method that accepts only one-character tokens, but it competes well with
the regex methods.
I wouldn't have phrased it as strongly as Paul did, but I agree that
StringTokenizer should never have been included in the JDK; it's like
a sore that never heals.

I don't think the original programmers understood what StringTokenizer
actually needed to be able to do.
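To make the field-count claim concrete, here is a self-contained variant of the indexOf-based splitter together with a usage example. The class name `LiteralSplit`, the sample records, and the guard against an empty delimiter (which would otherwise loop forever) are additions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class LiteralSplit {
    // Splits data on every occurrence of the literal token, keeping empty fields.
    static String[] split(String data, String token) {
        if (token.isEmpty()) {
            throw new IllegalArgumentException("token must not be empty");
        }
        List<String> fields = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = data.indexOf(token, start)) != -1) {
            fields.add(data.substring(start, hit));   // field before this token
            start = hit + token.length();             // resume just past the token
        }
        fields.add(data.substring(start));            // last field, possibly empty
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Multi-character delimiter with one empty field in the middle.
        System.out.println(split("a::b::::c", "::").length);   // prints 4
        // Leading and trailing empty fields are kept as well.
        System.out.println(split(",a,", ",").length);          // prints 3
    }
}
```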
 
John C. Bollinger

Paul said:
To see the real perverse behavior of this class, the basis for its
notoriety, try parsing database records or comma- or tab-separated records
that have occasional empty fields. Students typically create a record
parser using StringTokenizer and only much later see behavior they cannot
readily explain.

I already know that that doesn't work -- we have discussed the fact in
this thread. I don't consider the behavior "perverse" in any way,
however, on which point I suppose we'll just have to disagree. The
simple fact that StringTokenizer's behavior is often exactly what I want
is all the basis I need for my dissent. I don't see how StringTokenizer
being the wrong class for some purposes makes its behavior perverse.

I'm sure you're quite right that students sometimes stumble over the
behavior. On the other hand, when they do there is an opportunity to
teach them something about reading specifications (what do the docs
actually say, and what did you read into them that isn't really there?)
and about testing. It sounds like that won't sway you, though. It
wouldn't sway me either if it were the only justification.


John Bollinger
(e-mail address removed)
 
Paul Lutus

John said:
I already know that that doesn't work -- we have discussed the fact in
this thread. I don't consider the behavior "perverse" in any way,
however, on which point I suppose we'll just have to disagree. The
simple fact that StringTokenizer's behavior is often exactly what I want
is all the basis I need for my dissent. I don't see how StringTokenizer
being the wrong class for some purposes makes its behavior perverse.

It is a question of the most common use of this class, and its apparent
suitability for this particular, very common, task.
I'm sure you're quite right that students sometimes stumble over the
behavior.

In particular because the example given in the StringTokenizer
documentation strongly hints at its primary purpose, and no mention is made
of its primary flaw.
On the other hand, when they do there is an opportunity to
teach them something about reading specifications (what do the docs
actually say, and what did you read into them that isn't really there?)

I just read the entire document for StringTokenizer and it very simply does
not say that the wrong number of tokens will be returned if there are empty
fields. The problem lies with the class and its documentation, the user in
this case is quite blameless.
and about testing.

Yes, unfortunately it is not that common for someone to exhaustively test a
class' correspondence with its published documentation. That raises
cynicism to an art form.
It sounds like that won't sway you, though.

It really won't, especially now that I have read the documentation once
again and noted the absence of mention of this serious shortcoming.
 
