Optimizing Java method

B

Benjamin White

The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is at
http://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?

/**
* Converts String to packed decimal number. Decimal points, commas and
* spaces are ignored. Sign character is processed. Avoid multiple signs.
* Characters other than digits are invalid and will cause DataException.
* Comma, blank, period, dollar sign and plus are ignored. Scaling and
* exponents are not valid.
*
* @param str
* String of number to convert
* @return byte array of packed decimal, 16 long
*/
public static byte[] stringToPack(String str) throws DataException {
int i; // string index
int j; // byte array index
boolean nibble_ordinal = false;
char ch1;
byte nibble;
int strlen = str.length();
byte[] pknum = new byte[16];
for (i = 0; i < 16; i++) { // initialize to zero
pknum = 0;
}
i = strlen - 1;
j = 15; /* byte index */
pknum[j] = 12; // start with positive sign
while (i > -1) {
ch1 = str.charAt(i);
switch (ch1) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
nibble = (byte) Character.getNumericValue(ch1);
if (nibble_ordinal) {
pknum[j] = (byte) (pknum[j] | nibble);
nibble_ordinal ^= true;
} else {
pknum[j] = (byte) (pknum[j] | nibble << 4);
nibble_ordinal ^= true;
--j;
}
--i; // get next char
break;
case ',':
case ' ':
case '.':
case '$':
case '+':
--i; // get next char
break;
case '-':
pknum[15] = (byte) (pknum[15] & 0xf0);
pknum[15] = (byte) (pknum[15] | 0x0d);
--i; // get next char
break;
default:
throw new DataException("Invalid decimal digit: " + ch1);
}
}
return (pknum);
}
 
S

Stefan Ram

Benjamin White said:
Is there a faster way to code this?

A look-up table »conv«, used - simplified - like

pknum[ j++ ]= conv[ str.charAt( i++ )]

could be worth a try.
 
E

Eric Sosman

Benjamin said:
The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is at
http://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?

/**
* Converts String to packed decimal number. Decimal points, commas and
* spaces are ignored. Sign character is processed. Avoid multiple signs.
* Characters other than digits are invalid and will cause DataException.
* Comma, blank, period, dollar sign and plus are ignored. Scaling and
* exponents are not valid.
*
* @param str
* String of number to convert
* @return byte array of packed decimal, 16 long
*/
public static byte[] stringToPack(String str) throws DataException {
int i; // string index
int j; // byte array index
boolean nibble_ordinal = false;
char ch1;
byte nibble;
int strlen = str.length();
byte[] pknum = new byte[16];
for (i = 0; i < 16; i++) { // initialize to zero
pknum = 0;
}


This loop is unnecessary, since the newly-created array
is already zero-filled.
i = strlen - 1;
j = 15; /* byte index */
pknum[j] = 12; // start with positive sign
while (i > -1) {
ch1 = str.charAt(i);
switch (ch1) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':

The majority of the input characters probably wind up in
this branch. It might -- *might* -- be quicker to use an
explicit comparison '0' <= ch1 && ch1 <= '9' than to pull
out the switch machinery; you could resort to a switch if
you found a non-digit character.
nibble = (byte) Character.getNumericValue(ch1);

Since you don't accept (and hence can ignore) locale-dependent
"other digit" characters, (byte)(ch1 - '0') gives the same answer
and might be faster.
if (nibble_ordinal) {
pknum[j] = (byte) (pknum[j] | nibble);
nibble_ordinal ^= true;
} else {
pknum[j] = (byte) (pknum[j] | nibble << 4);
nibble_ordinal ^= true;
--j;
}

This logic could be streamlined a bit. Instead of testing a
flag and branching, consider maintaining a shift count that
alternates between 0 and 4 (shift = 4 - shift is one way). The
only conditional that remains is deciding when to decrement j,
which you'd do after shift returns to zero.
--i; // get next char
break;
case ',':
case ' ':
case '.':
case '$':
case '+':
--i; // get next char
break;
case '-':
pknum[15] = (byte) (pknum[15] & 0xf0);
pknum[15] = (byte) (pknum[15] | 0x0d);

This could also be streamlined, but probably doesn't have
much effect on the performance since it's executed only once
per number.
--i; // get next char
break;
default:
throw new DataException("Invalid decimal digit: " + ch1);

Tastes vary, but I think it would be more helpful to display
the entire objectionable String. When you're trying to track
down the source of an invalid input, knowing that the invalid
digit was "x" tells you a little, but knowing that the wider
context was "8.5x11" tells you more.
}
}
return (pknum);
}

One thing I don't like is the way the maintenance of the
index 'i' is scattered all over the code: It's just a loop
counter, yet it's modified at three different sites inside
the loop. This introduces comprehensional complexity ("How
sure am I that 'i' is decremented exactly once on each path
through the loop?"), and may also make it harder for the JIT
compiler to recognize that it's just an index and optimize
accordingly.

Putting it all together, I'd suggest something along the
lines of (just typed in, not tested):

public static byte[] stringToPack(String str)
throws DataException
{
byte[] pknum = new byte[16];
byte[15] = 0x0C;
int j = 15;
int shift = 4;
for (int i = str.length; --i >= 0; ) {
char ch = str.charAt(i);
if ('0' <= ch && ch <= '9') {
// optional:
if (j < 0)
throw ... // something nicer than ArrayIndexOOBE
pknum[j] |= (byte)((ch - '0') << shift);
shift = 4 - shift;
if (shift == 0)
--j;
}
else {
switch (ch) {
case ',':
case ... // the remaining punctuation
break;
case '-':
pknum[15] |= 0x01;
break;
default:
throw ...;
}
}
return pknum;
}

The explicit digit-ness test may or may not save time;
if it's important you should code it both ways and measure.

Validation: There's not much in the way of validation
in your original code, nor in my rewrite. I can't say how
careful you should be because I don't know the context:
how trustworthy are the inputs? But it seems to me you
might consider allowing only one + or one -, and insisting
that it be either at the end or at the beginning rather
than embedded: "123+-0.4" probably shouldn't be valid. If
you want to make such tests, I'd suggest checking the first
and last characters before starting the loop and adjusting
the loop's start and end accordingly; then any embedded
sign characters will show up as invalid. Be careful when
checking those boundary characters, though: if the input
string is "" there won't be any (you might or might not
want to consider that an error in its own right).
 
I

Ingo R. Homann

Hi,

some suggestions (note that they may depend upon your hardware, os and
jvm)...

Benjamin said:
public static byte[] stringToPack(String str) throws DataException {
int i; // string index
int j; // byte array index
boolean nibble_ordinal = false;
char ch1;
byte nibble;
int strlen = str.length();
byte[] pknum = new byte[16];
for (i = 0; i < 16; i++) { // initialize to zero
pknum = 0;
}


This schould not be necessary and can be done within the loop below!
i = strlen - 1;
j = 15; /* byte index */
pknum[j] = 12; // start with positive sign
while (i > -1) {
ch1 = str.charAt(i);

It might (!) be faster to first get the complete char[] from the String
and then to iterate over it.

Of course it would of course be great (if possible) if the
method-signature would accept a char[] instead of a String.

Note that a loop with an increasing array-index is *much* faster in most
cases (due to processor-caches) - if your logic allows that!
switch (ch1) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':

Here, an if might be faster (but might be slower as well - measure it):

if(c1>=0 && ch1<=9) {
...
} else if (...

An if-cascade, of course, is much faster if you know the statistical
probability of the cases. If e.g. you know, that 99% of all characters
are an '-', then put the corresponding if-statement at the beginning!
nibble = (byte) Character.getNumericValue(ch1);
if (nibble_ordinal) {
pknum[j] = (byte) (pknum[j] | nibble);
nibble_ordinal ^= true;
} else {
pknum[j] = (byte) (pknum[j] | nibble << 4);
nibble_ordinal ^= true;
--j;
}
--i; // get next char
break;
case ',':
case ' ':
case '.':
case '$':
case '+':
--i; // get next char
break;
case '-':
pknum[15] = (byte) (pknum[15] & 0xf0);
pknum[15] = (byte) (pknum[15] | 0x0d);

This is slow: Why not write the result in a temporary variable first:

byte tmp = (byte) (pknum[15] & 0xf0);
pknum[15] = (byte) (tmp | 0x0d);

....or even do it in one single line:

pknum[15] = (byte) ((pknum[15] & 0xf0)| 0x0d);
--i; // get next char
break;
default:
throw new DataException("Invalid decimal digit: " + ch1);
}
}
return (pknum);
}

And, of course, if you have to convert millions of Strings but there are
only a hand full of *different* Strings (perhaps 1000 or 100.000), then
it is often a good idea to initialize a (Hash)Map with the Strings as
key and the result as value and only lookup the results.

Hth,
Ingo
 
T

Tom Hawtin

Benjamin said:
The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is at
http://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?

Are you really sure that the performance of this method is significant?
public static byte[] stringToPack(String str) throws DataException {
int i; // string index
int j; // byte array index
boolean nibble_ordinal = false;
char ch1;
byte nibble;
int strlen = str.length();

It's generally better from a code readability point of view to declare
variables with as small a scope as possible. strlen you could remove
altogether.
byte[] pknum = new byte[16];
for (i = 0; i < 16; i++) { // initialize to zero
pknum = 0;
}


No need to zero newly allocated arrays.
i = strlen - 1;
j = 15; /* byte index */
pknum[j] = 12; // start with positive sign
while (i > -1) {
ch1 = str.charAt(i);
switch (ch1) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':

I would replace this with "if ('0' <= c && c <= '9') {", which should be
significantly more efficient (relatively).
nibble = (byte) Character.getNumericValue(ch1);

As you are only dealing with ASCII characters, no need for all weird
Unicode code. So just use "int nibble = c-'0';"
if (nibble_ordinal) {
pknum[j] = (byte) (pknum[j] | nibble);
nibble_ordinal ^= true;
} else {
pknum[j] = (byte) (pknum[j] | nibble << 4);
nibble_ordinal ^= true;
--j;
}

There's a lot of duplicate code there. Try:

if (!nibbleOrdinal) {
--j;
}
pkNum[j] = (byte) (pkNum[j] | nibble);
nibbleOrdinal = !nibbleOrdinal;
--i; // get next char

You do this for every (success) case. So take it out of the switch.

Tom Hawtin
 
P

Patricia Shanahan

Benjamin said:
The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is at
http://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?
....

Probably. Do you have a test harness? A program that runs the method for
a typical set of inputs and reports an overall score indicating how well
it did?

Do you care about a particular compiler and JVM?

Patricia
 
B

bugbear

Benjamin said:
The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is at
http://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?

What kind of data do you feed it
(statistically...)

long numbers?
Short numbers
how common are the "ignored" characters
positive/negative distribution?

BugBear
 
L

Lothar Kimmeringer

Benjamin said:
The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is at
http://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?

To sum up things:

private static WeakHashMap cache = new WeakHashMap();

public static byte[] stringToPack(String str) throws DataException {
WeakReference wr = (WeakReference) cache.get(str);
if (wr != null && wr.get() != null){
return (byte[]) wr.get().clone();
}
byte[] res = new byte[16];
int shift = 4;
int currIndex = res.length; // decreased in loop
for (int i = str.length() - 1; i >= 0; i--){
char elem = str.charAt(i);
if (elem >= '0' && elem <= '9'){
if (shift != 0){
currIndex--;
}
res[currIndex] |= (byte) ((elem - '0') << shift);
shift ^= 4;
}
else{
switch(elem){
case ',':
case ' ':
case '.':
case '$':
case '+':{
break;
}
case '0':{
res[res.length - 1] &= 0xf0;
res[res.length - 1] |= 0x0d;
break;
}
default:{
throw new DataException("Invalid decimal digit: " + ch1 + " in string '" + str + "'");
}
}
}
}
cache.put(str, new WeakReference(res));
return res;
}

Not tested, just typed.


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
E

Eric Sosman

Lothar Kimmeringer wrote On 07/11/07 12:18,:
Benjamin White wrote:

The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is at
http://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?


To sum up things:

private static WeakHashMap cache = new WeakHashMap();

public static byte[] stringToPack(String str) throws DataException {
WeakReference wr = (WeakReference) cache.get(str);
if (wr != null && wr.get() != null){
return (byte[]) wr.get().clone();
}

One would need to measure to be sure, but the cache
strikes me as unlikely to help in this case. The lookup
will call str.hashCode(), and unless these String instances
have already been put in other maps it's likely that the
hashCode() method will actually compute the hash rather
than using a value already cached. The hash computation
is probably only a little faster than the conversion it
tries to avoid, so you'd need a *really* high hit rate
to make it worth while.

... but, as is usual with performance questions,
the gold standard is measurement.
byte[] res = new byte[16];

Insert `byte[15] = 0x0C;' here.
int shift = 4;
int currIndex = res.length; // decreased in loop
for (int i = str.length() - 1; i >= 0; i--){
char elem = str.charAt(i);
if (elem >= '0' && elem <= '9'){
if (shift != 0){
currIndex--;
}

Another micro-optimization, avoiding the branch:

currIndex -= shift >> 2;
res[currIndex] |= (byte) ((elem - '0') << shift);
shift ^= 4;
}
else{
switch(elem){
case ',':
case ' ':
case '.':
case '$':
case '+':{
break;
}
case '0':{

ITYM '-'.
res[res.length - 1] &= 0xf0;
res[res.length - 1] |= 0x0d;

Why use two operations to set one bit?
 
L

Lothar Kimmeringer

Eric said:
One would need to measure to be sure, but the cache
strikes me as unlikely to help in this case.

It was a suggestion during in this thread, that caching
might help if there are many conversions of the same
numbers over and over again.
The lookup
will call str.hashCode(), and unless these String instances
have already been put in other maps it's likely that the
hashCode() method will actually compute the hash rather
than using a value already cached.

AFAIR the hashcode is computed during String-instantiation.
... but, as is usual with performance questions,
the gold standard is measurement.
Right

Another micro-optimization, avoiding the branch:

currIndex -= shift >> 2;

I prefer readability over micro-optimization. So if it's
fast enough, I stay with the branch.
case '0':{

ITYM '-'.
correct.
res[res.length - 1] &= 0xf0;
res[res.length - 1] |= 0x0d;

Why use two operations to set one bit?

Because not only two bits are set, also two bits are resetted.


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
E

Eric Sosman

Lothar Kimmeringer wrote On 07/11/07 14:22,:
Eric Sosman wrote:




It was a suggestion during in this thread, that caching
might help if there are many conversions of the same
numbers over and over again.




AFAIR the hashcode is computed during String-instantiation.

That may depend on the Java version -- ISTR that
the hash code for String was not always as it is now.
In 1.6, though, it's instantiated as zero and then
calculated if the hashCode() method finds that it's
still zero.
... but, as is usual with performance questions,
the gold standard is measurement.

Right


Another micro-optimization, avoiding the branch:

currIndex -= shift >> 2;


I prefer readability over micro-optimization. So if it's
fast enough, I stay with the branch.

case '0':{

ITYM '-'.

correct.

res[res.length - 1] &= 0xf0;
res[res.length - 1] |= 0x0d;

Why use two operations to set one bit?


Because not only two bits are set, also two bits are resetted.

The low-order nybble goes from C to D: 1100 to 1101.
"No bits were destroyed in the production of this value."
 
P

Piotr Kobzda

Benjamin said:
byte[] pknum = new byte[16];

Consider passing to the method already created byte[] array instead of
creating it per each invocation.

See also your JVM options, sometimes there are some, which allows to
optimize your code execution (e.g. -server ...).


piotr
 
P

Piotr Kobzda

Lothar said:
private static WeakHashMap cache = new WeakHashMap();
cache.put(str, new WeakReference(res));

Usefulness of a cache built that way depends mostly on how the GC is
implemented. In general, it requires holding externally a strong
references to both, keys, and values. Otherwise, that cache may
reference a garbage only...

Better than that would be using a SoftReferences-based cache.
Unfortunately, there is no a SoftCache officially available in standard
Java distribution (however, the one undocumented named as such is
already there). Hopefully, we can build it ourself, or reuse already
written one, for example:

http://google-guice.googlecode.com/svn/trunk/src/com/google/inject/internal/ReferenceCache.java

A soft-cache useful here would be:

new ReferenceCache(SOFT, SOFT) { ...


Of course, as already said by Eric, before using a cache, you have to
proof yourself (by measurement), that it gives you any observable benefit.


piotr
 
A

andrewmcdonagh

The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is athttp://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?

/**
* Converts String to packed decimal number. Decimal points, commas and
* spaces are ignored. Sign character is processed. Avoid multiple signs.
* Characters other than digits are invalid and will cause DataException.
* Comma, blank, period, dollar sign and plus are ignored. Scaling and
* exponents are not valid.
*
* @param str
* String of number to convert
* @return byte array of packed decimal, 16 long
*/
public static byte[] stringToPack(String str) throws DataException {
int i; // string index
int j; // byte array index
boolean nibble_ordinal = false;
char ch1;
byte nibble;
int strlen = str.length();
byte[] pknum = new byte[16];
for (i = 0; i < 16; i++) { // initialize to zero
pknum = 0;
}
i = strlen - 1;
j = 15; /* byte index */
pknum[j] = 12; // start with positive sign
while (i > -1) {
ch1 = str.charAt(i);
switch (ch1) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
nibble = (byte) Character.getNumericValue(ch1);
if (nibble_ordinal) {
pknum[j] = (byte) (pknum[j] | nibble);
nibble_ordinal ^= true;
} else {
pknum[j] = (byte) (pknum[j] | nibble << 4);
nibble_ordinal ^= true;
--j;
}
--i; // get next char
break;
case ',':
case ' ':
case '.':
case '$':
case '+':
--i; // get next char
break;
case '-':
pknum[15] = (byte) (pknum[15] & 0xf0);
pknum[15] = (byte) (pknum[15] | 0x0d);
--i; // get next char
break;
default:
throw new DataException("Invalid decimal digit: " + ch1);
}
}
return (pknum);
}



Have you actually measured how fast this method runs?

I ask, because I just created a (crude) test program to call it and
its fast - I had to measure in nanoseconds to actually get any
timings.

Sure we can make it faster - but do we need to?

Before optimising, you really need to profile using a good profiler -
not a developers guess. Its often the case that we optimise the wrong
parts of the system and the real bottle neck is left untreated.

Anyway...here's the timing I get on a P4 3Ghz 512 ram Windows PC.

Input: 12345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 12,
Time to convert: 154164ns
Input: -12345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 13,
Time to convert: 18028ns
Input: 9,876,543.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, -121, 101, 67, 12,
Time to convert: 22296ns
Input: 1 2345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 12,
Time to convert: 22818ns
Input: $12345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 12,
Time to convert: 17084ns
Input: -$1 2345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 13,
Time to convert: 15455ns

Here's the class I created which has a copy & paste of your method 'as
is'.


package com.amd.sandbox;

public class StringConverter {
/**
* Converts String to packed decimal number. Decimal points, commas
and spaces
* are ignored. Sign character is processed. Avoid multiple signs.
Characters
* other than digits are invalid and will cause DataException.
Comma, blank,
* period, dollar sign and plus are ignored. Scaling and exponents
are not
* valid.
*
* @param str
* String of number to convert
* @return byte array of packed decimal, 16 long
*/
public static byte[] stringToPack(String str) throws Exception {
int i; // string index
int j; // byte array index
boolean nibble_ordinal = false;
char ch1;
byte nibble;
int strlen = str.length();
byte[] pknum = new byte[16];
for (i = 0; i < 16; i++) { // initialize to zero
pknum = 0;
}
i = strlen - 1;
j = 15; /* byte index */
pknum[j] = 12; // start with positive sign
while (i > -1) {
ch1 = str.charAt(i);
switch (ch1) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
nibble = (byte) Character.getNumericValue(ch1);
if (nibble_ordinal) {
pknum[j] = (byte) (pknum[j] | nibble);
nibble_ordinal ^= true;
} else {
pknum[j] = (byte) (pknum[j] | nibble << 4);
nibble_ordinal ^= true;
--j;
}
--i; // get next char
break;
case ',':
case ' ':
case '.':
case '$':
case '+':
--i; // get next char
break;
case '-':
pknum[15] = (byte) (pknum[15] & 0xf0);
pknum[15] = (byte) (pknum[15] | 0x0d);
--i; // get next char
break;
default:
throw new Exception("Invalid decimal digit: " + ch1);
}
}
return (pknum);
}

public static void main(String[] args) throws Exception {
testWith("12345.0");
testWith("-12345.0");
testWith("9,876,543.0");
testWith("1 2345.0");
testWith("$12345.0");
testWith("-$1 2345.0");
}

private static void testWith(String inputString) throws Exception {
System.out.println("Input: " + inputString);

long startTime = System.nanoTime();
byte[] answer = stringToPack(inputString);
long endTime = System.nanoTime();

for (int i = 0; i < answer.length; i++) {
System.out.print(answer + ", ");
}
System.out.println("\nTime to convert: " + (endTime - startTime)
+"ns");
}
}
 
B

Benjamin White

andrewmcdonagh said:
The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is athttp://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?

/**
* Converts String to packed decimal number. Decimal points, commas and
* spaces are ignored. Sign character is processed. Avoid multiple signs.
* Characters other than digits are invalid and will cause DataException.
* Comma, blank, period, dollar sign and plus are ignored. Scaling and
* exponents are not valid.
*
* @param str
* String of number to convert
* @return byte array of packed decimal, 16 long
*/
public static byte[] stringToPack(String str) throws DataException {
int i; // string index
int j; // byte array index
boolean nibble_ordinal = false;
char ch1;
byte nibble;
int strlen = str.length();
byte[] pknum = new byte[16];
for (i = 0; i < 16; i++) { // initialize to zero
pknum = 0;
}
i = strlen - 1;
j = 15; /* byte index */
pknum[j] = 12; // start with positive sign
while (i > -1) {
ch1 = str.charAt(i);
switch (ch1) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
nibble = (byte) Character.getNumericValue(ch1);
if (nibble_ordinal) {
pknum[j] = (byte) (pknum[j] | nibble);
nibble_ordinal ^= true;
} else {
pknum[j] = (byte) (pknum[j] | nibble << 4);
nibble_ordinal ^= true;
--j;
}
--i; // get next char
break;
case ',':
case ' ':
case '.':
case '$':
case '+':
--i; // get next char
break;
case '-':
pknum[15] = (byte) (pknum[15] & 0xf0);
pknum[15] = (byte) (pknum[15] | 0x0d);
--i; // get next char
break;
default:
throw new DataException("Invalid decimal digit: " + ch1);
}
}
return (pknum);
}



Have you actually measured how fast this method runs?

I ask, because I just created a (crude) test program to call it and
its fast - I had to measure in nanoseconds to actually get any
timings.

Sure we can make it faster - but do we need to?

Before optimising, you really need to profile using a good profiler -
not a developers guess. Its often the case that we optimise the wrong
parts of the system and the real bottle neck is left untreated.

Anyway...here's the timing I get on a P4 3Ghz 512 ram Windows PC.

Input: 12345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 12,
Time to convert: 154164ns
Input: -12345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 13,
Time to convert: 18028ns
Input: 9,876,543.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, -121, 101, 67, 12,
Time to convert: 22296ns
Input: 1 2345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 12,
Time to convert: 22818ns
Input: $12345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 12,
Time to convert: 17084ns
Input: -$1 2345.0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 35, 69, 13,
Time to convert: 15455ns

Here's the class I created which has a copy & paste of your method 'as
is'.


package com.amd.sandbox;

public class StringConverter {
/**
* Converts String to packed decimal number. Decimal points, commas
and spaces
* are ignored. Sign character is processed. Avoid multiple signs.
Characters
* other than digits are invalid and will cause DataException.
Comma, blank,
* period, dollar sign and plus are ignored. Scaling and exponents
are not
* valid.
*
* @param str
* String of number to convert
* @return byte array of packed decimal, 16 long
*/
public static byte[] stringToPack(String str) throws Exception {
int i; // string index
int j; // byte array index
boolean nibble_ordinal = false;
char ch1;
byte nibble;
int strlen = str.length();
byte[] pknum = new byte[16];
for (i = 0; i < 16; i++) { // initialize to zero
pknum = 0;
}
i = strlen - 1;
j = 15; /* byte index */
pknum[j] = 12; // start with positive sign
while (i > -1) {
ch1 = str.charAt(i);
switch (ch1) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
nibble = (byte) Character.getNumericValue(ch1);
if (nibble_ordinal) {
pknum[j] = (byte) (pknum[j] | nibble);
nibble_ordinal ^= true;
} else {
pknum[j] = (byte) (pknum[j] | nibble << 4);
nibble_ordinal ^= true;
--j;
}
--i; // get next char
break;
case ',':
case ' ':
case '.':
case '$':
case '+':
--i; // get next char
break;
case '-':
pknum[15] = (byte) (pknum[15] & 0xf0);
pknum[15] = (byte) (pknum[15] | 0x0d);
--i; // get next char
break;
default:
throw new Exception("Invalid decimal digit: " + ch1);
}
}
return (pknum);
}

public static void main(String[] args) throws Exception {
testWith("12345.0");
testWith("-12345.0");
testWith("9,876,543.0");
testWith("1 2345.0");
testWith("$12345.0");
testWith("-$1 2345.0");
}

private static void testWith(String inputString) throws Exception {
System.out.println("Input: " + inputString);

long startTime = System.nanoTime();
byte[] answer = stringToPack(inputString);
long endTime = System.nanoTime();

for (int i = 0; i < answer.length; i++) {
System.out.print(answer + ", ");
}
System.out.println("\nTime to convert: " + (endTime - startTime)
+"ns");
}
}

Thanks for testing. Someone was complaining that the routine was slow.
It may be called 10 to 100 million times for a large batch of data.
So every little bit of improvement will help.

Thanks everyone for all the good suggestions. I learned alot. I will
try many of the suggestions and test. I may have to call the program
multiple times in a loop to see the differences. If anyone is
interested I will post the results.
 
B

Benjamin White

Ingo said:
Hi,

Benjamin said:
Thanks for testing. Someone was complaining that the routine was slow.
It may be called 10 to 100 million times for a large batch of data.
So every little bit of improvement will help.

Thanks everyone for all the good suggestions. I learned alot. I will
try many of the suggestions and test. I may have to call the program
multiple times in a loop to see the differences. If anyone is
interested I will post the results.

[X] Interested! (Because measuring is most important on optimization.)

Ciao,
Ingo
Yes, I am measuring to test each of these good optimization suggestions,
but I don't want to make a change for a specific JVM option or hardware
 
B

Benjamin White

Everyone, thanks for all the good suggestions. I am making changes and
testing. Some have made good improvement. Calling the Java method
100,000 times, the duration dropped from 94 to 31 milliseconds.

The ZDecimal package is open source. If someone would like to review,
criticize and help optimize, I would be happy to give your name credit
in the project. http://benjaminjwhite.name/decimal
 
R

Roedy Green

The routine below is supposed to convert a String containing decimal
digits to a IBM mainframe Z/series packed decimal number. It seems to
consume a lot of CPU time. Background about this is at
http://benjaminjwhite.name/zdecimal/doc/index.html Is there a faster
way to code this?

Some time ago we did a contest to write a signum method and learned
quite a bit about the various JVMs and compilers. Perhaps we could
similarly create a benchmark timing framework to see how you could
come up with the best code for this.

See http://mindprod.com/jgloss/benchmark.html
 
B

Benjamin White

Roedy said:
Some time ago we did a contest to write a signum method and learned
quite a bit about the various JVMs and compilers. Perhaps we could
similarly create a benchmark timing framework to see how you could
come up with the best code for this.

See http://mindprod.com/jgloss/benchmark.html
Thanks for the great link.

My first goal is to just learn to write fast code in theory, a good
algorithm. The package may run on many different JVMs and hardware. I
would want to code for one type of environment and make to others suffer.

You should contact IBM or someone that has a mainframe, IBM is
promoting Java on their Z/Series, the have a special hardware/microcode
to process Java. It would be good to see what your benchmarks do on the
ZAAP processors.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,053
Latest member
billing-software

Latest Threads

Top