How to split a String with MS-JVM?

O.L.

Hello,

I'm writing an applet which has to split a big string (100 KB) into a
lot of small pieces (3-4 bytes each).
These parts are separated by a single space " ".

But MS-JVM doesn't support the String.split() method, so I wrote my
own, but it's very slow on big strings.

Does anyone have an idea how to do this quickly?

Thank you, and excuse my bad English,
Olivier

 
O.L.

Here is my code:

public static String[] explode(String sep, String str) {
    // First pass: count the separators to size the result array
    int nbr = substr_count(sep, str)+1;
    String[] result = new String[nbr];
    if(nbr==1) {
        result[0] = str;
        return result;
    }
    int lastPos = 0;
    int len = sep.length();
    String extract = "";
    // Second pass: cut out each piece between consecutive separators
    for(int i=0; i<nbr; i++) {
        int pos = str.indexOf(sep, lastPos+(i==0?0:len));
        if(i==nbr-1) pos = str.length();
        try {extract = str.substring(lastPos+(i==0?0:len), pos);}
        catch(Exception e) {
            System.exit(0);
        }
        result[i] = extract;
        lastPos = pos;
    }
    return result;
}
 
O.L.

Roedy Green suggested:
That is 10-year-old technology, over a century in computer years.

Bury it!

But a lot of "basic" computers have MS-JVM installed with WinXP ... :-s
So I'd like to make my applet compatible, if it's possible.
 
vj

Hello Olivier,

I have a few suggestions that *might* make your code run
faster. Looking at the statements below
int nbr = substr_count(sep, str)+1;
String[] result = new String[nbr];
I conclude that you are counting the number of separator characters in
your string to determine the length of the array to allocate. Then you
extract the individual tokens in a second pass over the string. The
problem is that, in all, you are passing over the entire string twice.
If we can reduce that to one pass, your code may be up to twice as fast.

I suggest that you use the StreamTokenizer class. It will greatly reduce
the complexity and length of your code, and it may also be faster, since
its methods may be written as native methods. Here is some sample code:

import java.io.*;
import java.util.*;

public class Main2 {
    public static Vector explode(String str) throws IOException {
        StringReader sr = new StringReader(str);
        StreamTokenizer stz = new StreamTokenizer(sr);
        // Vector to hold the tokens. Other containers cannot be used
        // because MS-JVM is based on Java 1.1.
        // The initial capacity is approximated from the number of chars
        // to improve speed; however, it might waste memory space.
        Vector vct = new Vector(str.length() / 4);

        stz.resetSyntax();
        stz.wordChars(0, 0xff);
        // Add your separator characters here;
        // I am assuming space is the only separator.
        stz.whitespaceChars(' ', ' ');

        // This loop iterates through the string, collecting only parsed tokens.
        // addElement is used because Vector.add does not exist in Java 1.1.
        while (stz.nextToken() != StreamTokenizer.TT_EOF)
            vct.addElement(stz.sval);
        // PS: if memory usage is to be minimized, call vct.trimToSize() here,
        // though it might take some time.
        return vct; // return the vector from the method
    }

    public static void main(String[] args) throws Exception {
        Vector v = explode(" \"Hello There\" ,how are you hi there");

        // MS-JVM provides no iterators, only enumerations
        Enumeration eu = v.elements(); // request the enumeration
        while (eu.hasMoreElements()) // iterate
            System.out.println((String) eu.nextElement()); // print tokens
    }
}
/*************************Code Ends******************************/
/**************************Output******************************/
"Hello
There"
,how
are
you
hi
there
/**************************Output Ends******************************/

In the above code I have used a Vector instead of an array because we
don't know how many words there are in the string. This ensures that
we pass over the entire string only once. However, we do know that
each word is approximately 3-4 characters wide (as you said). That's
why the Vector is given an approximate initial capacity, to improve
allocation speed. The rest of the code is self-explanatory.

I hope that it will work faster than your code. However, I have not
tested it yet. Please let me know if you find any bugs in this code.
 
Mickey Segal

O.L. said:
But a lot of "basic" computers have MS-JVM installed with WinXP ... :-s
So I'd like to make my applet compatible, if it's possible.

If you ask questions like "How to split String with Java 1.1" you will get
less flak since many people will understand that this supports a variety of
lagging-edge environments such as Java on Macintosh OS 9.

BTW, are there Windows JREs other than Sun's available free to users? I ask
this because of the "second supplier" issue. If Sun's JVM develops some
problem that can't be worked around such as a HotSpot crash can users get
the IBM JRE for free, or are their only choices the MS JVM or a much earlier
version of the Sun JRE? As the original poster indicates, many of us are
keeping our code at Java 1.1 in order to support the lagging-edge users and
any of these choices would be workable.

I find the MS JVM, antique though it is, still provides the best performance
in running our huge applet. However, since it exhibits some problems
running on Tablet PCs (can't paste into TextField using the pen and the TIP
issue described at www.segal.org/java/tablet_events/) it is becoming less
suitable to us as a "second supplier".

As a historical note, many see the "second supplier" issue as a major factor
in the success of Windows over the Macintosh. Many companies would not buy
Apple hardware because of corporate policies not to buy equipment for which
there was no competition.
 
Mark Thornton

Mickey said:
As a historical note, many see the "second supplier" issue as a major factor
in the success of Windows over the Macintosh. Many companies would not buy
Apple hardware because of corporate policies not to buy equipment for which
there was no competition.

How do such companies justify purchasing Windows over Unix? Windows is
every bit as proprietary as the Apple Mac.

Mark Thornton
 
Mickey Segal

Mark Thornton said:
How do such companies justify purchasing Windows over Unix? Windows is
every bit as proprietary as the Apple Mac.

The rules seem to apply to hardware more than to software, but as Linux
software offerings improve it would not be surprising to see the same rules
invoked for operating systems. However, with the costs of Windows and MS
Office being low compared to the training costs involved people are not in a
hurry to pay for a switch.
 
O.L.

Mark Thornton wrote on 11/12/2005:
How do such companies justify purchasing Windows over Unix? Windows is every
bit as proprietary as the Apple Mac.

Mark Thornton

About my question: I think java.util.StringTokenizer is what I'm
looking for. It's available with MS JVM and it's able to split strings.
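
For readers who want to try this, here is a minimal sketch of a StringTokenizer-based split that sticks to Java 1.1 APIs. The class and method names (TokenizerSplit, split) are made up for illustration. Note two differences from the explode method earlier in the thread: StringTokenizer treats the delimiter string as a set of single characters, and it silently skips empty tokens, so two consecutive spaces do not produce an empty piece.

```java
import java.util.StringTokenizer;

// Hypothetical helper class, not part of the original post.
final class TokenizerSplit {
    // Splits str on any character contained in delims (Java 1.1 compatible).
    static String[] split(String str, String delims) {
        StringTokenizer st = new StringTokenizer(str, delims);
        // countTokens() scans ahead without consuming anything,
        // so the string is still traversed twice here.
        String[] result = new String[st.countTokens()];
        for (int i = 0; i < result.length; i++) {
            result[i] = st.nextToken();
        }
        return result;
    }

    public static void main(String[] args) {
        // The double space yields no empty token: prints ab, cd, ef
        String[] parts = split("ab cd  ef", " ");
        for (int i = 0; i < parts.length; i++) {
            System.out.println(parts[i]);
        }
    }
}
```

If the second traversal done by countTokens() turns out to matter, the tokens could be collected in a Vector instead and copied into an array at the end.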

Bye
 
Roedy Green

try {extract = str.substring(lastPos+(i==0?0:len), pos);}
catch(Exception e) {
System.exit(0);
}

A string splitting method has absolutely no business calling
System.exit and especially System.exit(0) which says all terminated
ok.

You should simply allow the error to propagate up for the caller to
deal with. Perhaps he can limp by without completing this split.
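
A small sketch of that idea, with hypothetical names (PropagateDemo, piece): the helper does not catch anything, and the caller decides what to do when substring rejects a bad index.

```java
// Hypothetical demo class, not taken from the original code.
final class PropagateDemo {
    // Returns the text between two indices. A bad index is not caught here;
    // substring throws StringIndexOutOfBoundsException and the caller sees it.
    static String piece(String str, int from, int to) {
        return str.substring(from, to);
    }

    public static void main(String[] args) {
        try {
            System.out.println(piece("abc def", 4, 7));  // valid range
            System.out.println(piece("abc def", 4, 99)); // end index past the string
        } catch (StringIndexOutOfBoundsException e) {
            // The caller chooses how to recover instead of the JVM exiting.
            System.err.println("split failed: " + e.getMessage());
        }
    }
}
```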
 
Roedy Green

int nbr = substr_count(sep, str)+1;

You did not show the code for this, but this technique requires
scanning twice. If you used an ArrayList to accumulate your strings,
even if you used toArray when done, you could avoid the double scan.
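
A single-scan version might look like the sketch below. One assumption to flag: ArrayList arrived with the Java 1.2 collections, so on a Java 1.1 VM such as MS-JVM this sketch swaps in Vector with addElement and copyInto, which play the same roles as ArrayList.add and toArray. The class name SinglePassSplit is made up, and rejecting an empty separator is my own choice rather than something from the original code.

```java
import java.util.Vector;

// Hypothetical rewrite of explode(); uses only Java 1.1 APIs.
final class SinglePassSplit {
    // Splits str on every occurrence of sep, keeping empty pieces.
    static String[] explode(String sep, String str) {
        if (sep.length() == 0) {
            // Assumption: an empty separator is treated as a caller error.
            throw new IllegalArgumentException("separator must not be empty");
        }
        Vector parts = new Vector();
        int start = 0;
        int pos;
        // One scan: each indexOf call resumes where the previous piece ended.
        while ((pos = str.indexOf(sep, start)) >= 0) {
            parts.addElement(str.substring(start, pos));
            start = pos + sep.length();
        }
        parts.addElement(str.substring(start)); // trailing piece after the last separator
        String[] result = new String[parts.size()];
        parts.copyInto(result);
        return result;
    }

    public static void main(String[] args) {
        String[] pieces = explode(" ", "a b  c");
        System.out.println(pieces.length); // 4: "a", "b", "", "c"
    }
}
```

Copying the Vector into an array at the end touches only the references, not the characters of the big string, so it should be cheap compared with scanning the input a second time.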
 
Roedy Green

But a lot of "basic" computers have MS-JVM installed with WinXP ... :-s
So I'd like to make my applet compatible, if it's possible.

This is collaborating with the enemy.
 
andreaz

The development and support of the Microsoft Java Virtual Machine were
stopped some years ago. One of the main properties and philosophy of
Java is "Write once, run everywhere". So the question is: why not
use the Sun JVM, the IBM JVM or some other standards-compliant virtual
machine, rather than continue to use the MS-JVM trash and try to
reinvent the wheel?
 
Mickey Segal

andreaz said:
One of the main properties and philosophy of
Java is "Write once, run everywhere". So the question is: why not
use the Sun JVM, the IBM JVM or some other standards-compliant virtual
machine, rather than continue to use the MS-JVM trash and try to
reinvent the wheel?

Since there are many Windows and Macintosh users still stuck with Java 1.1
environments, the "Write once, run everywhere" philosophy would suggest
doing as the original poster is doing, restricting his code to Java 1.1. It
does not sound like he is trying to use any Microsoft-specific features,
something which was risky 6 years ago and seemingly irrational now.
 
