StringTokenizer functionality in split

  • Thread starter El Guerrero del Interfaz

El Guerrero del Interfaz

Hi everybody,


First of all, sorry if that's a stupid question but I'm a newbie
with Java.

I'm writing a parser that needs the functionality of the
"StringTokenizer" method. That is, I want a method that not only
returns the tokens but also the delimiters found. But in the Java doc
I've seen that "StringTokenizer" is being phased out and that "split"
should be used instead. The problem is that in "split" I did not find
the possibility to return the delimiters together with the tokens as
"StringTokenizer" does. Is that so or am I just too dumb to find that
option? And if it is, is there a workaround or do I have to use the
old "StringTokenizer" to do what I want to do?


Thanx and bye.
 
Aaron Isotton

El said:
I'm writing a parser that needs the functionality of the
"StringTokenizer" method. That is, I want a method that not only
returns the tokens but also the delimiters found. But in the Java doc
I've seen that "StringTokenizer" is being phased out and that "split"
should be used instead. The problem is that in "split" I did not find
the possibility to return the delimiters together with the tokens as
"StringTokenizer" does.

String.split has no option for returning the delimiters. Since you want to use
StringTokenizer I suppose that your delimiters are simple (fixed
strings, not regular expressions). Thus you could use String.indexOf and
String.substring. It's not so elegant, but it works and is probably
faster than String.split or some other regex-based approach.
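
A minimal sketch of that indexOf/substring approach, assuming a single fixed (non-regex) delimiter string. The names IndexOfTokenizer and splitKeepingDelimiter are made up for illustration, and empty tokens between adjacent delimiters are skipped, which is an assumption modelled on what StringTokenizer does:

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfTokenizer {
    // Splits text on a fixed delimiter string and keeps every delimiter
    // occurrence as its own element of the result.
    static List<String> splitKeepingDelimiter(String text, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = text.indexOf(delimiter, start)) >= 0) {
            if (hit > start) {
                parts.add(text.substring(start, hit)); // token before the delimiter
            }
            parts.add(delimiter);                      // the delimiter itself
            start = hit + delimiter.length();
        }
        if (start < text.length()) {
            parts.add(text.substring(start));          // trailing token, if any
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [boo, :, and, :, foo]
        System.out.println(splitKeepingDelimiter("boo:and:foo", ":"));
    }
}
```

Handling several different delimiters this way means calling indexOf once per delimiter and taking the smallest hit, which is where it stops being elegant.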

Greetings,
Aaron
 
Dale King

El said:
I'm writing a parser that needs the functionality of the
"StringTokenizer" method. That is, I want a method that not only
returns the tokens but also the delimiters found. But in the Java doc
I've seen that "StringTokenizer" is being phased out and that "split"
should be used instead. The problem is that in "split" I did not find
the possibility to return the delimiters together with the tokens as
"StringTokenizer" does. Is that so or am I just too dumb to find that
option? And if it is, is there a workaround or do I have to use the
old "StringTokenizer" to do what I want to do?

And it's a good thing that they are finally phasing it out. They
basically rendered it useless since they changed the behavior between
different versions of Java. Therefore you could not count on it at all.

For your case you should look at the regular expression support in Pattern.
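
One possible reading of that suggestion, as a rough sketch rather than the only way to do it: compile the delimiter as a regex, walk the input with Matcher.find(), and collect both the text between matches and the matched delimiters. The names RegexTokenizer and tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {
    // Returns tokens and delimiter matches in the order they appear in text.
    static List<String> tokenize(String text, String delimiterRegex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(delimiterRegex).matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() == m.end()) {
                continue;                                    // ignore empty matches
            }
            if (m.start() > start) {
                parts.add(text.substring(start, m.start())); // token before the match
            }
            parts.add(m.group());                            // the delimiter as matched
            start = m.end();
        }
        if (start < text.length()) {
            parts.add(text.substring(start));                // trailing token, if any
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [boo, ::, and, ;, foo]
        System.out.println(tokenize("boo::and;foo", "[:;]+"));
    }
}
```

Because the delimiter is matched directly instead of through lookbehind, this also copes with delimiters of unbounded length.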
 
Alan Moore

El said:
First of all, sorry if that's a stupid question but I'm a newbie
with Java.

I'm writing a parser that needs the functionality of the
"StringTokenizer" method. That is, I want a method that not only
returns the tokens but also the delimiters found. But in the Java doc
I've seen that "StringTokenizer" is being phased out and that "split"
should be used instead. The problem is that in "split" I did not find
the possibility to return the delimiters together with the tokens as
"StringTokenizer" does. Is that so or am I just too dumb to find that
option? And if it is, is there a workaround or do I have to use the
old "StringTokenizer" to do what I want to do?

Java's split feature, like its regex flavor, is based on Perl's, but
you've stumbled on a major difference. In Perl, you could just wrap
the split expression in parentheses, and the delimiters would be
returned along with the tokens, but that doesn't work in Java. Java's
split doesn't provide a way to return the delimiters, but you can fake
it in many circumstances by using the regex lookahead and lookbehind
features. Here's an example:

public class Test
{
    public static void main(String[] args)
    {
        String whole = "boo:and:foo";

        // match any position that is followed or preceded by ':'
        String[] parts = whole.split("(?=:)|(?<=:)");

        // print each piece on its own line: boo, :, and, :, foo
        for (int i = 0; i < parts.length; i++)
        {
            System.out.println(parts[i]);
        }
    }
}

The trick is that the regex doesn't match any characters, it matches a
position where either the next character or the previous one is a
colon. If you want to treat multiple consecutive delimiter characters
as a single delimiter, as StringTokenizer does, you would normally
just add a plus sign to the regex (i.e., split(":+")). But to capture
the delimiters, you have to get even trickier:

String whole = "boo::and:foo";

// match any position that is
// (1) followed by ':' but not preceded by one, or
// (2) preceded by ':' but not followed by one
String[] parts = whole.split("(?=:)(?<!:)|(?<=:)(?!:)");
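
To make the difference concrete, here is a small runnable comparison of the two patterns on the same input. The class and method names (ConsecutiveDelimiters, splitEach, splitRuns) are just illustrative.

```java
import java.util.Arrays;

class ConsecutiveDelimiters {
    // Every ':' comes back as its own element.
    static String[] splitEach(String whole) {
        return whole.split("(?=:)|(?<=:)");
    }

    // A run of consecutive ':' characters comes back as one element.
    static String[] splitRuns(String whole) {
        return whole.split("(?=:)(?<!:)|(?<=:)(?!:)");
    }

    public static void main(String[] args) {
        String whole = "boo::and:foo";
        System.out.println(Arrays.toString(splitEach(whole))); // [boo, :, :, and, :, foo]
        System.out.println(Arrays.toString(splitRuns(whole))); // [boo, ::, and, :, foo]
    }
}
```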

If you're using multiple, single-character delimiters, just plug in a
suitable character class. For example, if you want to split on either
colons or semicolons, you would do this:

String whole = "boo;and:foo";

// match any position that is followed or preceded by ':' or ';'
String[] parts = whole.split("(?=[:;])|(?<=[:;])");
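
As a sanity check, the sketch below runs that split next to a StringTokenizer created with returnDelims set to true, so the two results can be compared. CompareWithTokenizer, viaTokenizer and viaSplit are hypothetical names, and the comparison is only shown for this one input, not claimed for every string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class CompareWithTokenizer {
    // Old style: the third constructor argument asks for delimiters as tokens.
    static List<String> viaTokenizer(String text, String delimiterChars) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiterChars, true);
        while (st.hasMoreTokens()) {
            parts.add(st.nextToken());
        }
        return parts;
    }

    // Lookaround style: split before and after every ':' or ';'.
    static List<String> viaSplit(String text) {
        return Arrays.asList(text.split("(?=[:;])|(?<=[:;])"));
    }

    public static void main(String[] args) {
        String whole = "boo;and:foo";
        System.out.println(viaTokenizer(whole, ":;")); // [boo, ;, and, :, foo]
        System.out.println(viaSplit(whole));           // [boo, ;, and, :, foo]
    }
}
```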

The trick also works with delimiters consisting of more than one
character, like "foo" or "<br>", and even with somewhat indeterminate
regexes like "<br/?>" (i.e., a BR tag with or without the XML-style
trailing slash). But it won't work with a regex like "<.*?>", where
it's impossible to determine the maximum possible length of the
delimiter (lookbehinds require that).
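
For the bounded case, a short sketch with the "<br/?>" delimiter plugged into the same lookahead/lookbehind pattern. The class name MultiCharDelimiter and the method splitOnBr are made up for the example.

```java
import java.util.Arrays;

class MultiCharDelimiter {
    // Splits before and after each BR tag, with or without the trailing slash.
    // The lookbehind is accepted because "<br/?>" is at most five characters long.
    static String[] splitOnBr(String html) {
        return html.split("(?=<br/?>)|(?<=<br/?>)");
    }

    public static void main(String[] args) {
        // prints [one, <br>, two, <br/>, three]
        System.out.println(Arrays.toString(splitOnBr("one<br>two<br/>three")));
    }
}
```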
 
