StringTokenizer troubles

gajo · Jun 23, 2004

Hi!
I'm interested if there is a simple way of doing the following: I have a
structure of data, let's say String A, String B and String C. The user can
fill out all three with some data, but he doesn't have to, so in that case
that string will remain empty (""). Now after the user clicks on "Save", the
data is saved to a file, delimited by a string of signs, let's say *#$. So,
a usual line would look like this:
tree*#$mountain*#$apple    (A=tree, B=mountain, C=apple)
lion*#$*#$monkey           (A=lion, B="", C=monkey)
*#$*#$                     (A = B = C = "")

Then I try to read this back from the file, using a StringTokenizer to
split each line into tokens. The trouble is that the StringTokenizer doesn't
return the empty strings, but instead jumps over them. So for example, where
I had lion*#$*#$monkey (A=lion, B="", C=monkey), after reading it back I
get A=lion, B=monkey, C=<an exception occurs, no more tokens>.
Is there a simple way to make the StringTokenizer return the empty strings?
I could of course split the string with
StringTokenizer(someString, "*#$", true), but then I have to check whether a
token is *, # or $, or something else, how much more I have to read, whether
it is the end of the line etc. In other words, I might as well not use the
StringTokenizer at all!

Gajo

Steve Horsley · Jun 23, 2004

You are right - string tokenizer omits blank fields (unless you tell it to
also return the separators). Another thing - the separator that you give
to the constructor (StringTokenizer(myString, "*#$")) provides a LIST of
ALTERNATIVE separator CHARACTERS, not a multiple-character separator String.
So "*#$" says any one of those three characters could be a separator.
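
To make both points concrete, here is a small sketch. The class name
TokenizerDemo and the helper tokens() are made up for this illustration;
only StringTokenizer itself comes from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Collects every token that StringTokenizer hands back for the input.
    static List<String> tokens(String input, String delims, boolean returnDelims) {
        StringTokenizer st = new StringTokenizer(input, delims, returnDelims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // The empty field is skipped, so only two tokens come back.
        System.out.println(tokens("lion*#$*#$monkey", "*#$", false)); // [lion, monkey]
        // '*', '#' and '$' each act as a separator on their own.
        System.out.println(tokens("a*b#c$d", "*#$", false));          // [a, b, c, d]
        // With returnDelims=true every separator character is a token too.
        System.out.println(tokens("lion*#$*#$monkey", "*#$", true));
        // [lion, *, #, $, *, #, $, monkey]
    }
}
```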

So StringTokenizer won't do what you want.
Either roll your own string splitter (not hard) or use the regular
expression package that was introduced with (IIRC) Java 1.4.
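
A hand-rolled splitter could look roughly like the sketch below. It is only
one possible approach: the class name SimpleSplitter and the method split()
are invented here, and it assumes the delimiter never occurs inside a field.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleSplitter {
    // Splits s on every occurrence of the whole delimiter string and
    // keeps empty fields.
    static List<String> split(String s, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        int from = 0;
        int hit;
        while ((hit = s.indexOf(delimiter, from)) != -1) {
            fields.add(s.substring(from, hit));
            from = hit + delimiter.length(); // continue after the delimiter
        }
        fields.add(s.substring(from)); // text after the last delimiter
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("lion*#$*#$monkey", "*#$")); // [lion, , monkey]
    }
}
```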

Steve

Oscar kind · Jun 23, 2004

You'll run into problems with StringTokenizer, because it only uses single
characters as delimiters. See the API docs for more details.

A better way to write the strings is in CSV format. See:
http://www.csc.liv.ac.uk/~sphelps/jasa/

Specifically, check the API for these classes:
uk.ac.liv.util.io.CSVReader
uk.ac.liv.util.io.CSVWriter

mvg,
Oscar

V S Rawat · Jun 23, 2004

hope it helps.
-Rawat

import java.util.*;

class Token3 {
    public static void main(String args[]) {
        // Each line must start and end with the delimiter so that the
        // substring check below stays in bounds.
        String[] input = { "*#$tree*#$mountain*#$apple*#$",
                           "*#$lion*#$*#$monkey*#$" };
        String[][] the_tokens = new String[2][3];
        String token = "*#$";
        int len;
        for (int j = 0; j < 2; j++) {
            StringTokenizer t = new StringTokenizer(input[j], token);

            len = 0; // offset of the delimiter that precedes field i
            for (int i = 0; i < 3; i++) {
                // A delimiter directly followed by another delimiter
                // means the field between them is empty.
                if (input[j].substring(len + 3, len + 6).equals(token)) {
                    the_tokens[j][i] = "";
                }
                else {
                    the_tokens[j][i] = t.nextToken();
                }
                len = len + the_tokens[j][i].length() + 3;
                System.out.println(j + "," + i + ": " + the_tokens[j][i]);
            }
        }
    } // main
}

gajo · Jun 23, 2004

Hey, thanks for the effort, but unfortunately I took Steve's advice a few
hours ago and wrote my own parser, which works fine.

gajo · Jun 23, 2004

Steve Horsley said:
So StringTokenizer won't do what you want.
Either roll your own string splitter (not hard) or use the regular
expression package that was introduced with (IIRC) Java 1.4.

Steve

I know it's not hard, but I was wondering if there was a built-in function.
Anyway, I wrote my own class after.
What is this regular expression package you are talking about?

V S Rawat · Jun 23, 2004

gajo said:
Hey, thanks for the effort, unfortunately I took Steve's advice a few hours
ago and wrote my own parser, which works fine

Not fair.
When you post a query, you should give people some time to respond.

Never mind, just kidding.

Why not post that parser and show how you tackled it?
-Rawat

gajo · Jun 24, 2004

V S Rawat said:
why not post that parser how you tackled it.
-Rawat

I think I should remove the exception throwing from the constructor.
It gave me a lot of trouble because I had to put the whole code in a try
block, so my program ignored all errors and I had a hard time debugging it.
This is the code. You create a parser with new MyParser(String,delimiter),
where delimiter can be any non-empty string.

public class MyParser { // my forgery of StringTokenizer
    private String[] parts;
    private int numParts = 0;
    private int indeks = 0; // cursor: index of the part next() returns

    public MyParser(String s, String delimiter) throws Exception {
        if (s == null) {
            throw new Exception("String is null!");
        }
        if (delimiter == null || delimiter.length() == 0) {
            // an empty delimiter would make the loops below run forever
            throw new Exception("Delimiter is empty!");
        }
        // First pass: count the parts (number of delimiters + 1).
        int nbr = 1;
        int from = 0;
        int rez;
        while ((rez = s.indexOf(delimiter, from)) != -1) {
            nbr++;
            from = rez + delimiter.length(); // skip the whole delimiter
        }
        numParts = nbr;
        parts = new String[nbr];
        // Second pass: cut out the text between the delimiters.
        nbr = 0;
        from = 0;
        while ((rez = s.indexOf(delimiter, from)) != -1) {
            parts[nbr] = s.substring(from, rez);
            nbr++;
            from = rez + delimiter.length();
        }
        parts[nbr] = s.substring(from); // whatever follows the last delimiter
    }

    // Returns the part at the cursor and moves forward; once the end is
    // reached it keeps returning the last part.
    public String next() {
        if (indeks < parts.length) {
            return parts[indeks++];
        }
        return parts[parts.length - 1];
    }

    public String first() {
        indeks = 0;
        return parts[0];
    }

    // Moves the cursor back one step and returns the part there; at the
    // start it keeps returning the first part.
    public String prev() {
        if (indeks > 0) {
            indeks--;
        }
        return parts[indeks];
    }

    public String last() {
        indeks = parts.length - 1;
        return parts[indeks];
    }

    public int numParts() {
        return numParts;
    }

    public boolean hasMore() {
        return indeks < parts.length;
    }
} // end of class

Adam · Jun 24, 2004

gajo said:
I know it's not hard, but I was wondering if there was a built-in function.
Anyway, I wrote my own class after.
What is this regular expression package you are talking about?

@since java 1.4
java.util.regex

But for your needs it would be enough to look at the String.split(String
regex) method.

It lets you use a single line of code instead of the parser you had to
write. The same happened to me.
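
As a rough sketch of that one-liner: the class name SplitDemo and the helper
fields() are invented for this example, and Pattern.quote (available since
Java 5) is just one way to keep the regex engine from interpreting * and $.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    static final String DELIMITER = "*#$";

    static String[] fields(String line) {
        // Pattern.quote makes the regex engine treat "*#$" literally;
        // the limit -1 keeps trailing empty fields.
        return line.split(Pattern.quote(DELIMITER), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("lion*#$*#$monkey"))); // [lion, , monkey]
    }
}
```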

Adam

Guest · Jun 24, 2004

The following test method (using String.split)
is much simpler than your ad-hoc parser...

class TokReg {
    public static void test(String s) {
        // "*" and "$" are regex metacharacters, so they are escaped;
        // the limit -1 keeps trailing empty strings in the result.
        String[] result = s.split("\\*#\\$", -1);
        System.out.print("s=\"" + s + "\" =>");
        for (int i = 0; i < result.length; i++)
            System.out.print(" \"" + result[i] + "\"");
        System.out.println();
    }

    public static void main(String[] args) {
        test("tree*#$mountain*#$apple");
        test("tree*#$*#$apple");
        test("*#$*#$apple");
        test("tree*#$*#$");
        test("*#$*#$");
    }
}

Executing it:

s="tree*#$mountain*#$apple" => "tree" "mountain" "apple"
s="tree*#$*#$apple" => "tree" "" "apple"
s="*#$*#$apple" => "" "" "apple"
s="tree*#$*#$" => "tree" "" ""
s="*#$*#$" => "" "" ""
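
The -1 limit matters here. As a small sketch (the class name SplitLimitDemo
and the helper split() are made up for illustration), compare the default
limit 0, which is what split(regex) uses, with -1 on a line that ends in
empty fields:

```java
import java.util.Arrays;

class SplitLimitDemo {
    static final String REGEX = "\\*#\\$";

    // Thin wrapper so both limits can be tried on the same line.
    static String[] split(String line, int limit) {
        return line.split(REGEX, limit);
    }

    public static void main(String[] args) {
        String line = "tree*#$*#$";
        // Limit 0 drops trailing empty strings.
        System.out.println(Arrays.toString(split(line, 0)));  // [tree]
        // Limit -1 keeps them, so all three fields survive.
        System.out.println(Arrays.toString(split(line, -1))); // [tree, , ]
    }
}
```

So without the -1, a record whose last fields are empty would come back
with fewer than three parts.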

- Dario
