StringTokenizer troubles

gajo

Hi!
I'm wondering whether there is a simple way to do the following. I have a
data structure with, let's say, String A, String B and String C. The user can
fill in all three, but doesn't have to; any field left blank stays empty
(""). When the user clicks "Save", the data is written to a file with the
fields separated by a string of characters, let's say *#$. So typical lines
look like this:
  tree*#$mountain*#$apple    A=tree, B=mountain, C=apple
  lion*#$*#$monkey           A=lion, B="", C=monkey
  *#$*#$                     A = B = C = ""

Then I read this back from the file and split each line into tokens with
StringTokenizer. The trouble is that StringTokenizer doesn't return the empty
strings; it skips over them. For example, where I had lion*#$*#$monkey
(A=lion, B="", C=monkey), reading it back gives me A=lion, B=monkey, and for
C a NoSuchElementException, because there are no more tokens.
Is there a simple way to make StringTokenizer return the empty strings? Of
course I could use StringTokenizer(someString, "*#$", true) so that the
delimiters are returned as tokens too, but then I have to check whether each
token is *, # or $ or something else, how many more tokens to read, whether
I've reached the end of the line, and so on. In other words, I might as well
not use StringTokenizer at all!
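A minimal sketch that reproduces the problem (the class name SkipDemo and its tokens helper are hypothetical, written only for illustration). StringTokenizer hands back just the two non-empty fields, so a third nextToken() call would throw NoSuchElementException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class SkipDemo {
    // Collects every token that StringTokenizer returns for the given delimiters.
    static List<String> tokens(String line, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer t = new StringTokenizer(line, delims);
        while (t.hasMoreTokens()) {
            out.add(t.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Field B is empty, but the tokenizer silently skips it.
        System.out.println(tokens("lion*#$*#$monkey", "*#$")); // [lion, monkey]
        // A line with only delimiters yields no tokens at all.
        System.out.println(tokens("*#$*#$", "*#$"));           // []
    }
}
```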

Gajo
 
Steve Horsley

You are right: StringTokenizer omits blank fields (unless you tell it to
also return the separators). Another thing: the separator string that you
give to the constructor (StringTokenizer(myString, "*#$")) is a LIST of
ALTERNATIVE separator CHARACTERS, not a multi-character separator String.
So "*#$" says that any one of those three characters is a separator.
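To illustrate the point about separator characters, a hedged sketch (CharSetDemo is a hypothetical class name): because "*#$" is read as the set {'*', '#', '$'}, a field that merely contains '#', such as "C#", gets cut short, and a run of several delimiter characters counts as a single separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CharSetDemo {
    // Collects every token that StringTokenizer returns for the given delimiters.
    static List<String> tokens(String line, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer t = new StringTokenizer(line, delims);
        while (t.hasMoreTokens()) {
            out.add(t.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Intended fields: "C#" and "Java". The '#' inside "C#" is treated
        // as a delimiter too, and "#*#$" collapses into one separator.
        System.out.println(tokens("C#*#$Java", "*#$")); // [C, Java]
        // A lone '$' also separates fields.
        System.out.println(tokens("a$b", "*#$"));       // [a, b]
    }
}
```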

So StringTokenizer won't do what you want.
Either roll your own string splitter (not hard) or use the regular
expression package that was introduced with (IIRC) Java 1.4.
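A minimal sketch of such a hand-rolled splitter, assuming the delimiter is a non-empty literal string (the class Splitter and its split method are hypothetical names). It scans with String.indexOf, skips past the whole delimiter each time, and keeps every empty field, including leading and trailing ones.

```java
import java.util.ArrayList;
import java.util.List;

public class Splitter {
    // Splits s on every non-overlapping occurrence of the literal string
    // delim. Empty fields are kept, so n delimiters always give n + 1 fields.
    static List<String> split(String s, String delim) {
        if (delim.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = s.indexOf(delim, start)) != -1) {
            fields.add(s.substring(start, end));
            start = end + delim.length(); // skip the whole delimiter
        }
        fields.add(s.substring(start)); // text after the last delimiter
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("lion*#$*#$monkey", "*#$")); // [lion, , monkey]
        System.out.println(split("*#$*#$", "*#$"));           // [, , ]
    }
}
```

Because the field count depends only on the number of delimiters, a three-field record always comes back as exactly three strings, whichever of them are empty.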

Steve
 
Oscar kind


You'll run into problems with StringTokenizer, because it only uses single
characters as delimiters. See the API docs for more details.

A better way to write the strings is in CSV format. See:
http://www.csc.liv.ac.uk/~sphelps/jasa/

Specifically, check the API for these classes:
uk.ac.liv.util.io.CSVReader
uk.ac.liv.util.io.CSVWriter
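The JASA CSVReader/CSVWriter API is not reproduced here. As a library-free illustration of why CSV sidesteps the problem, here is a minimal sketch assuming RFC 4180-style quoting (every field wrapped in double quotes, embedded quotes doubled) and no newlines inside fields; CsvLine, write and read are hypothetical names. Empty fields and fields containing commas or quotes both survive the round trip.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLine {
    // Writes one line: each field quoted, embedded quotes doubled.
    static String write(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(fields.get(i).replace("\"", "\"\"")).append('"');
        }
        return sb.toString();
    }

    // Parses one line: handles quoted and unquoted fields, "" escapes and
    // empty fields. Assumes no embedded newlines.
    static List<String> read(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');
                        i++; // skip the second quote of the "" escape
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString()); // end of field (possibly empty)
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        List<String> row = List.of("lion", "", "say \"hi\", monkey");
        String line = write(row);
        System.out.println(line);       // "lion","","say ""hi"", monkey"
        System.out.println(read(line)); // [lion, , say "hi", monkey]
    }
}
```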


Regards,
Oscar
 
V S Rawat

hope it helps.
-Rawat

import java.util.StringTokenizer;

class Token3 {
    public static void main(String args[]) {
        // Each record starts and ends with the delimiter: *#$A*#$B*#$C*#$
        // (the fields themselves must not contain '*', '#' or '$').
        String[] input = { "*#$tree*#$mountain*#$apple*#$",
                           "*#$lion*#$*#$monkey*#$" };
        String[][] the_tokens = new String[2][3];
        String token = "*#$";
        int len;
        for (int j = 0; j < 2; j++) {
            // Only used to fetch the non-empty fields.
            StringTokenizer t = new StringTokenizer(input[j], token);

            len = 0; // position of the delimiter in front of field i
            for (int i = 0; i < 3; i++) {
                // A second delimiter right after this one means field i is empty.
                if (input[j].startsWith(token, len + token.length())) {
                    the_tokens[j][i] = "";
                }
                else {
                    the_tokens[j][i] = t.nextToken();
                }
                len = len + the_tokens[j][i].length() + token.length();
                System.out.println(j + "," + i + ": " +
                                   the_tokens[j][i]);
            }
        }
    } // main
}
 
gajo

Hey, thanks for the effort! Unfortunately, I took Steve's advice a few hours
ago and wrote my own parser, which works fine :)
 
gajo

Steve Horsley said:
So StringTokenizer won't do what you want.
Either roll your own string splitter (not hard) or use the regular
expression package that was introduced with (IIRC) Java 1.4.

Steve

I know it's not hard, but I was wondering whether there was a built-in function.
Anyway, I wrote my own class afterwards.
What is this regular expression package you are talking about?
 
V S Rawat

gajo said:
Hey, thanks for the effort, unfortunately I took Steve's advice a few hours
ago and wrote my own parser, which works fine :)

not fair.
when you post a query, you should give people
some time to respond.

never mind. just kidding. :)

why not post that parser, so we can see how you tackled it?
-Rawat
 
gajo

V S Rawat said:
why not post that parser how you tackled it.
-Rawat

I think I should drop the exception thrown by the constructor. It gave me a
lot of trouble, because I had to put the whole code in a try block; my
program then ignored all errors, and it was hard to debug.
This is the code. You create a parser with new MyParser(String, delimiter),
where the delimiter can be any non-empty string.

import java.util.NoSuchElementException;

public class MyParser { // my forgery of StringTokenizer
    private final String[] parts;
    private int indeks = 0; // index of the part next() will return

    // Splits s on every occurrence of delimiter, treated as a whole string
    // (not as a set of characters). Empty parts are kept, so
    // "lion*#$*#$monkey" yields "lion", "", "monkey".
    public MyParser(String s, String delimiter) {
        // Unchecked exception, so callers no longer need a try block.
        if (s == null || delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException(
                "string must not be null and delimiter must not be empty");
        }
        // First pass: count the parts (number of delimiters + 1).
        int count = 1;
        int rez = s.indexOf(delimiter);
        while (rez != -1) {
            count++;
            rez = s.indexOf(delimiter, rez + delimiter.length());
        }
        // Second pass: copy out the text between consecutive delimiters.
        parts = new String[count];
        int from = 0;
        for (int nbr = 0; nbr < count - 1; nbr++) {
            rez = s.indexOf(delimiter, from);
            parts[nbr] = s.substring(from, rez);
            from = rez + delimiter.length(); // skip the whole delimiter
        }
        parts[count - 1] = s.substring(from); // text after the last delimiter
    }

    // Returns the current part and advances; throws at the end,
    // like StringTokenizer.nextToken().
    public String next() {
        if (!hasMore()) {
            throw new NoSuchElementException();
        }
        return parts[indeks++];
    }
    public String first() {
        indeks = 0;
        return parts[0];
    }
    // Steps back one part and returns it, so next() followed by prev()
    // returns the same part twice.
    public String prev() {
        if (indeks == 0) {
            throw new NoSuchElementException();
        }
        return parts[--indeks];
    }
    public String last() {
        indeks = parts.length - 1;
        return parts[indeks];
    }

    public int numParts() {
        return parts.length;
    }

    public boolean hasMore() {
        return indeks < parts.length;
    }
} // end of class
 
Adam

gajo said:
I know it's not hard, but I was wondering if there was a built-in function.
Anyway, I wrote my own class after.
What is this regular expression package you are talking about?

@since Java 1.4
java.util.regex

But for your needs it would be enough to look at the String.split(String
regex) method.

You could have had a single line of code instead of the parser you had to
write; the same happened to me :)
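A hedged sketch of that one-liner (SplitDemo is a hypothetical class name). Two details matter for this data: '*' and '$' are regex metacharacters, so the delimiter is passed through Pattern.quote (available since Java 5), and the one-argument split(regex) discards trailing empty strings, so a negative limit is needed to keep them.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Quote the delimiter so it is matched literally, not as a regex.
    static final String DELIM = Pattern.quote("*#$");

    public static void main(String[] args) {
        String line = "tree*#$*#$"; // A=tree, B="", C=""
        // One-argument split drops the two trailing empty fields.
        System.out.println(Arrays.toString(line.split(DELIM)));     // [tree]
        // A negative limit keeps them: three fields, as written.
        System.out.println(Arrays.toString(line.split(DELIM, -1))); // [tree, , ]
    }
}
```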

Adam
 
Guest

The following test method (using String.split)
is much simpler than your ad-hoc parser...

class TokReg {
    public static void test(String s) {
        // The argument to split is a regex, so '*' and '$' are escaped.
        // The limit -1 keeps trailing empty strings, which the
        // one-argument split(regex) would drop.
        String[] result = s.split("\\*#\\$", -1);
        System.out.print("s=\"" + s + "\" =>");
        for (int i = 0; i < result.length; i++)
            System.out.print(" \"" + result[i] + "\"");
        System.out.println();
    }
    public static void main(String[] args) {
        test("tree*#$mountain*#$apple");
        test("tree*#$*#$apple");
        test("*#$*#$apple");
        test("tree*#$*#$");
        test("*#$*#$");
    }
}

Executing it:

s="tree*#$mountain*#$apple" => "tree" "mountain" "apple"
s="tree*#$*#$apple" => "tree" "" "apple"
s="*#$*#$apple" => "" "" "apple"
s="tree*#$*#$" => "tree" "" ""
s="*#$*#$" => "" "" ""

- Dario
 
