Signs of stupid Java code

T

Thomas G. Marshall

Darryl L. Pierce said:
True, though I feel confident, again, in saying that if it's
necessary to specifically cast a null object then that would indicate
to me that a no-arg version of the method is required.

But you'd be wrong. There are many times where null is considered a
perfectly valid parameter to a subsystem whose entry point is a method
requiring an object. There is no reason to supply a non-overloaded method
name specifically for null parameters. In fact, I'd argue strongly that to
do so would be evident of bad design.
 
D

Darryl L. Pierce

Jesper said:
Too inefficient, I would use:

if (address != null) {
int index = address.indexOf('@');
if (index > 0 && address.indexOf('.', index + 1) >= 0) {
...
}
}

I guess there are different levels of programming stupidity (no offense
;-).

Have you checked that your implemention is more efficient? Because, quite
frankly, your code is no more efficient than mine and it also introduces a
variable that's not used outside of a single check and has no gain in
functionality or efficiency. So, where exactly do you see your code being
more efficient?
 
D

Darryl L. Pierce

Hurm... Duplicate code. Even worse, taking a performance hit for
calculating the same value. Why are you calling indexOf("@") more than
once?

To not waste cycles creating a variable that may not be used, and to have
the check fail quickly if there's no '@' in the address being checked. If
there's no "@", the expression ends there.
I wouldn't hold this up as a "good" example.

Perhaps not, but have you gone over the whole range of possible values being
checked before coming to that conclusion? And, have you compared the
efficiency to the assumed "create a variable and check that to avoid an
extra function call" performance boost?
 
D

Darryl L. Pierce

Ben_ said:
What about using javax.mail.internet.InternetAddress instead ?

The code I mentioned is from a MIDP application, where InternetAddress is
not available.
 
D

Darryl L. Pierce

Thomas said:
But you'd be wrong.

Really? Based on what requirement? (note: we're talking hypothetical
situations and probabilities, not specific situations)
There are many times where null is considered a
perfectly valid parameter to a subsystem whose entry point is a method
requiring an object. There is no reason to supply a non-overloaded method
name specifically for null parameters. In fact, I'd argue strongly that
to do so would be evident of bad design.

Based on what requirement, again? Can you quote specific situations where
casting a null is better than having an overloaded method which doesn't
have the null-casted object? Give details, please.
 
T

Timo Kinnunen

Darryl L. Pierce said:
Have you checked that your implemention is more efficient?
Because, quite frankly, your code is no more efficient than mine
and it also introduces a variable that's not used outside of a
single check and has no gain in functionality or efficiency. So,
where exactly do you see your code being more efficient?

In general, the compiler doesn't know which library version is going
to be used to run the code, so it cannot prove that the two
address.indexOf("@") calls can be combined. The same goes for the
..length() call and the first .indexOf() call. So, on a non Hotspot VM
his code is more efficient, if method call cost is dominating.

On input "timo.kinnunen@localhost", his code scans the "localhost"
part, while your code scans ".kinnunen@localhost". If the amount of
chars to be scanned is dominating, his code is more efficient.
 
D

Darryl L. Pierce

Timo said:
In general, the compiler doesn't know which library version is going
to be used to run the code, so it cannot prove that the two
address.indexOf("@") calls can be combined. The same goes for the
.length() call and the first .indexOf() call. So, on a non Hotspot VM
his code is more efficient, if method call cost is dominating.

"...on a non Hotspot VM..." Have you compared both in a real-world scenario?
On input "timo.kinnunen@localhost", his code scans the "localhost"
part, while your code scans ".kinnunen@localhost". If the amount of
chars to be scanned is dominating, his code is more efficient.

His code also creates a local variable with the necessary alteration tot he
local stack and necessary clean-up work. My code does not. My code instead
has a single additional function call that, on an optimizing VM, is far
quicker.
 
J

Jesper Nordenberg

Four optimizations:

1. Removed the "address.length() > 0" expression.
2. Used char instead of String when searching for characters.
3. Search for the '.' only on characters after the '@'. You search the
whole address.
4. Store the result of 'address.indexOf('@')' in a temporary variable.
His code also creates a local variable with the necessary alteration tot he
local stack and necessary clean-up work. My code does not. My code instead
has a single additional function call that, on an optimizing VM, is far
quicker.

Maybe you should check your facts before posting. Just to make sure I
benchmarked both algorithms on 1000 random addresses. Here are my
results:

j2sdk1.4.1_05\bin\javaw.exe:
My algorithm: 2414 ms
Your algorithm: 3806 ms

j2sdk1.4.1_05\bin\javaw.exe -server:
My algorithm: 1232 ms
Your algorithm: 3706 ms

So, it seems in a good JVM, like Hotspot server, your code is much
easier to optimize than mine (not) ;-) Feel free to perform your own
benchmarks, but I doubt you'll get different results.

/Jesper Nordenberg
 
T

Timo Kinnunen

Darryl L. Pierce said:
"...on a non Hotspot VM..." Have you compared both in a real-world
scenario?

No, I haven't. I added that restriction there because the
optimization in question *is* available to the VM and is the kind of
optimization a Hotspot VM might make.
His code also creates a local variable with the necessary
alteration tot he local stack and necessary clean-up work. My code
does not. My code instead has a single additional function call
that, on an optimizing VM, is far quicker.

My gut feeling tells me an extra local variable is less overhead than
a method call, but unfornately I don't know the class file format or
the VM execution model well enough to speculate further.
 
D

Darryl L. Pierce

Jesper said:
Four optimizations:

1. Removed the "address.length() > 0" expression.

That's an acceptable optimization, since the next condition will fail just
as easily.
2. Used char instead of String when searching for characters.

The code I posted wasn't c-n-p from our source tree but just what I typed
in. Our real code uses char searches.
3. Search for the '.' only on characters after the '@'. You search the
whole address.

Our search looks for the last index of '.' to compare to the index for '@'.
4. Store the result of 'address.indexOf('@')' in a temporary variable.

I avoid creating local variables that aren't used beyond a single line.
Unless significant optimization is achieved, I wouldn't do that for
maintainability.
Maybe you should check your facts before posting.

At this point, it's been speculation.
Just to make sure I
benchmarked both algorithms on 1000 random addresses. Here are my
results:

j2sdk1.4.1_05\bin\javaw.exe:
My algorithm: 2414 ms
Your algorithm: 3806 ms

So one too 2ms each and the other took 3ms? I ran my algorithm versus yours
with 100,000 reps for the address "(e-mail address removed)" and got a wide
range of results. Sometimes yours was faster, sometimes mine was faster.
Both processed the address in less than a millisecond The source code (open
to your correction) is:

---8<[snip]---
public class Test
{
private static final long REPETITIONS = 1000000L;

public static long myWay(String address)
{
long time = System.currentTimeMillis();
boolean pass = false;

for(long reps = 0;reps < REPETITIONS;reps++)
{
if((address != null &&
address.length() > 0 &&
(address.indexOf('@') > 0 &&
address.lastIndexOf('.') > address.indexOf('@'))))
{
pass = true;
}
}

time = System.currentTimeMillis() - time;

return time;
}

public static long hisWay(String address)
{
long time = System.currentTimeMillis();
boolean pass = false;

for(long reps = 0;reps < REPETITIONS;reps++)
{
if(address != null)
{
int index = address.indexOf('@');

if(index > 0 &&
address.indexOf('.',index + 1) > 0)
{
pass = true;
}
}
}

time = System.currentTimeMillis() - time;

return time;
}

public static void main(String[] args)
{
for(int run = 0;run < 2;run++)
{
long mine = myWay("(e-mail address removed)");
long his = hisWay("(e-mail address removed)");

if(run > 0)
{
System.out.println(" My way: " + mine + " [" + (mine / REPETITIONS) + "ms
each]");
System.out.println("His way: " + his + " [" + (his / REPETITIONS) + "ms
each]");
}
}
}
}
---8<[snip]---

Your algorithm and mine came very close to each other, processing 1,000,000
entries in < 150ms, giving each one about 0ms processing time. I don't know
why you had such large processing times unless you're including your random
address generation times in the processing times, which would be unfair to
the algorithm since it includes non-algo time in their results. With my
test, your algorithm is no more effecient than mine.

To quote your post, "[m]aybe you should check your facts before posting."
Care to post your code?
j2sdk1.4.1_05\bin\javaw.exe -server:
My algorithm: 1232 ms
Your algorithm: 3706 ms

So, it seems in a good JVM, like Hotspot server, your code is much
easier to optimize than mine (not) ;-) Feel free to perform your own
benchmarks, but I doubt you'll get different results.

Your doubts would be unfounded, then.
 
T

Timo Kinnunen

Darryl L. Pierce said:
Your algorithm and mine came very close to each other, processing
1,000,000 entries in < 150ms, giving each one about 0ms processing
time. I don't know why you had such large processing times unless
you're including your random address generation times in the
processing times, which would be unfair to the algorithm since it
includes non-algo time in their results. With my test, your
algorithm is no more effecient than mine.

To quote your post, "[m]aybe you should check your facts before
posting." Care to post your code?

I'll post mine:

public class EmailParseComparison implements Runnable {
private int which;
private volatile int count = 0;
private volatile boolean ending = false;
private volatile boolean flag = false;

private static String[] emails = {
// 20 random emails addresses from this newsgroup
};

private EmailParseComparison(int which) {
this.which = which;
}

private void test() {
Thread thread = new Thread(this);
thread.start();

sleep(10);
long startTime = System.currentTimeMillis();
int startCount = count;
sleep(30);
long endTime = System.currentTimeMillis();
int endCount = count;
ending = true;

try { thread.join(); } catch(InterruptedException ignored) {}

System.out.println("In " + (endTime - startTime) +
" ms: " + (endCount - startCount) +
(which == 0 ? " hisWay" : " myWay") +
" parses, and BTW, flag is " + flag);
}

private static void sleep(int seconds) {
try {
Thread.currentThread().sleep(seconds * 1000);
} catch(InterruptedException ignored) {}
}

public final void run() {
if(which == 0) {
runHisWayLoop();
} else {
runMyWayLoop();
}
}

private void runHisWayLoop() {
while(true) {
for(int i = 0, n = emails.length; i < n; i++) {
if(hisWay(emails)) {
flag = !flag;
}
count++;
if(ending) {
return;
}
}
}
}

private void runMyWayLoop() {
while(true) {
for(int i = 0, n = emails.length; i < n; i++) {
if(myWay(emails)) {
flag = !flag;
}
count++;
if(ending) {
return;
}
}
}
}

public static boolean myWay(String address) {
boolean pass = false;

if((address != null &&
address.length() > 0 &&
(address.indexOf('@') > 0 &&
address.lastIndexOf('.') > address.indexOf('@')))) {

pass = true;
}
return pass;
}

public static boolean hisWay(String address) {
boolean pass = false;

if(address != null) {
int index = address.indexOf('@');

if(index > 0 &&
address.indexOf('.',index + 1) > 0) {

pass = true;
}
}
return pass;
}

public static void main(String[] args) {
if(args.length == 0) {
new EmailParseComparison(0).test();
} else {
new EmailParseComparison(1).test();
}
}
}

And the results:
\j2sdk1.4.2\bin\java -version
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)
\j2sdk1.4.2\bin\java EmailParseComparison
In 30003 ms: 129879845 hisWay parses, and BTW, flag is false
\j2sdk1.4.2\bin\java EmailParseComparison h
In 30003 ms: 92840834 myWay parses, and BTW, flag is true
\j2sdk1.4.2\bin\java -server EmailParseComparison
In 30004 ms: 100740326 hisWay parses, and BTW, flag is true
\j2sdk1.4.2\bin\java -server EmailParseComparison j
In 30003 ms: 75937227 myWay parses, and BTW, flag is true

This gives hisWay 39.9% and 32.7% edges, respectively, over myWay.
Interestingly, 18 of the 20 sample email addresses I used favored
myWay in the amount needed to scan.
 
A

Adam Jenkins

Jesper said:
Too inefficient, I would use:

if (address != null) {
int index = address.indexOf('@');
if (index > 0 && address.indexOf('.', index + 1) >= 0) {
...
}
}

I would have used something like

if (address != null &&
address.matches("[^@]+(@\\w+(\\.\w+)*))?")) {
...
}

On many systems, it's valid for an email address to just contain a
username, or to contain a hostname without a dot in it, and the above
regex handles those possibilities (assuming I got it right, I didn't
test it.) Unless the miniscule performance hit of compiling the regex
is really a problem, I think this is the most robust solution, and it's
the most straightforward. Of course this assumes you're using Java 1.4+
and the programmer knows regexes.
 
C

Chris Smith

Darryl said:
His code also creates a local variable with the necessary alteration tot he
local stack and necessary clean-up work. My code does not. My code instead
has a single additional function call that, on an optimizing VM, is far
quicker.

Darryl,

Based on the assumption that the VM implementation is remotely well-
written, this simply won't matter one way or the other. During the
generation of native machine code for a register-based machine, all
temporary results of subexpressions are going to be assigned to
temporary variables anyway, and then register allocation will be done to
eliminate whatever stack variables are most helpful to eliminate. By
the time register allocation is performed, the optimizing code generator
has already forgotten whether that variable was declared as a variable
in the Java source, or was simply the result of a subexpression.

So I wouldn't consider use of local variables more or less important
either way with respect to performance. That frees them up to be used
according to readability, according to the general readability naming
rule: "can I come up with a name that expresses the purpose of this
value better than the expression that produces it?"

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
D

Darryl L. Pierce

Chris said:
Based on the assumption that the VM implementation is remotely well-
written, this simply won't matter one way or the other. During the
generation of native machine code for a register-based machine, all
temporary results of subexpressions are going to be assigned to
temporary variables anyway, and then register allocation will be done to
eliminate whatever stack variables are most helpful to eliminate. By
the time register allocation is performed, the optimizing code generator
has already forgotten whether that variable was declared as a variable
in the Java source, or was simply the result of a subexpression.

So I wouldn't consider use of local variables more or less important
either way with respect to performance. That frees them up to be used
according to readability, according to the general readability naming
rule: "can I come up with a name that expresses the purpose of this
value better than the expression that produces it?"

Quite right. My bad.
 
J

Jesper Nordenberg

Timo Kinnunen said:
private void test() {
Thread thread = new Thread(this);
thread.start();

sleep(10);
long startTime = System.currentTimeMillis();
int startCount = count;
sleep(30);
long endTime = System.currentTimeMillis();
int endCount = count;
ending = true;

try { thread.join(); } catch(InterruptedException ignored) {}

Hmmm, interesting way to perform a benchmark. Should work though.
In 30003 ms: 129879845 hisWay parses, and BTW, flag is false

In 30003 ms: 92840834 myWay parses, and BTW, flag is true

In 30004 ms: 100740326 hisWay parses, and BTW, flag is true

In 30003 ms: 75937227 myWay parses, and BTW, flag is true

This gives hisWay 39.9% and 32.7% edges, respectively, over myWay.
Interestingly, 18 of the 20 sample email addresses I used favored
myWay in the amount needed to scan.

The client JVM results are similar to mine, but it's interesting that
your server JVM results are worse. In my the myWay version is twice as
fast when run in the server JVM.

/Jesper Nordenberg
 
J

Jesper Nordenberg

Darryl L. Pierce said:
The code I posted wasn't c-n-p from our source tree but just what I typed
in. Our real code uses char searches.

Ok, it's still an optimization though.
Our search looks for the last index of '.' to compare to the index for '@'.

Yes, and it the worst case finding the last index of '.' can search
beyond the '@' char.
At this point, it's been speculation.

The statement "My code instead has a single additional function call
that, on an optimizing VM, is far quicker." doesn't sound like
speculation to me.
So one too 2ms each and the other took 3ms?

No, I tested the 1000 email addresses 10000 times.
Care to post your code?

Sure. The generated email addresses might not be a 100% accurate model
of a real data set, but there are some addresses without '@' and some
without '.'. Here's my code:

import java.util.Random;

public class Main {
private static void test1(String[] addresses, boolean[] results) {
for (int i = 0; i < addresses.length; i++) {
String address = addresses;

if (address != null) {
int index = address.indexOf('@');
if (index > 0 && address.indexOf('.', index + 1) >= 0) {
results = true;
}
}
}
}

private static void test2(String[] addresses, boolean[] results) {
for (int i = 0; i < addresses.length; i++) {
String address = addresses;

if (address != null && address.length() > 0 &&
(address.indexOf("@") > 0 &&
address.lastIndexOf(".") > address.indexOf("@")))
{
results = true;
}
}
}

public static final void main(String[] args) throws Exception {
String[] addresses = new String[1000];
boolean[] results = new boolean[addresses.length];
Random r = new Random();

for (int i=0; i<addresses.length; i++) {
addresses = "name" + r.nextInt();

if (r.nextInt(10) >= 3)
addresses += "@";

for (int j=0; j<r.nextInt(4); j++) {
addresses += (j > 0 ? "." : "") + "host" + r.nextInt();
}
}

for (int i=0; i<1000; i++)
test1(addresses, results);

long time = System.currentTimeMillis();

for (int i=0; i<10000; i++)
test1(addresses, results);

System.out.println("Test1: " + (System.currentTimeMillis() - time)
+ " ms");

for (int i=0; i<1000; i++)
test2(addresses, results);

time = System.currentTimeMillis();

for (int i=0; i<10000; i++)
test2(addresses, results);

System.out.println("Test2: " + (System.currentTimeMillis() - time)
+ " ms");
}
}

The results running this on my Athlon 1400 MHz:

j2sdk1.4.2_02\bin\javaw.exe:
Test1: 2254 ms
Test2: 3976 ms

j2sdk1.4.2_02\bin\javaw.exe -server:
Test1: 1382 ms
Test2: 3175 ms

My algorithm seems to be much easier to optimize for the server JVM.

/Jesper Nordenberg
 
D

Darryl L. Pierce

Jesper said:
Ok, it's still an optimization though.
Agreed.


Yes, and it the worst case finding the last index of '.' can search
beyond the '@' char.

Yes, it can. I believe it to be no less efficient than creating a substring
and searching that. By searching for the last index, we don't create an
intermediate string to search, which your algorithm does.
The statement "My code instead has a single additional function call
that, on an optimizing VM, is far quicker." doesn't sound like
speculation to me.

It may not sound like it, but it is only speculation at that point.
No, I tested the 1000 email addresses 10000 times.
<snip code>

Your code is using the indexOf(String) call, not the indexOf(char). Can you
re-run your tests with that change and do they provide different results?
The results running this on my Athlon 1400 MHz:
<snip>

I reversed the methods (tested my algorithm and then yours) and the numbers
changed, moving closer together, around 2000ms each (VMware w/ 256M and
Blackdown JVM 1.4.2)
 
S

skeptic

Joona I Palaste said:
Can you think of examples of Java code that are signs of programmer
stupidity? I'll start:

synchronized (new Object()) {
/* ... */
}

The synchronisation is utterly pointless in this case.

And no a single more good example.. WTF?

Regards
 
C

Chris Smith

Darryl said:
Yes, it can. I believe it to be no less efficient than creating a substring
and searching that. By searching for the last index, we don't create an
intermediate string to search, which your algorithm does.

This is, of course, going to depend on test cases. If you're testing
extremely long strings with the @ close to the end and no dots at all,
then clearly the constant-time substring operation will be overwhelmed
by the linear-time additional search distance. In cases where the dot
lies after the @ (best case for not doing the substring), the version
with substring will be slower by a small constant increase (not even a
constant factor) in execution time.

This also raises questions about importance of efficiency; the advantage
for the substring lies in failure cases, which tend to be comparatively
unimportant with regard to efficiency in many applications. The
advantage for the non-substring code, while being a far smaller
advantage, occurs in the mainline case rather than the failure case.

In other words, there is no "best" answer without more description of
the problem.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
M

Mike Baranczak

Darryl L. Pierce said:
To not waste cycles creating a variable that may not be used, and to have
the check fail quickly if there's no '@' in the address being checked. If
there's no "@", the expression ends there.


Perhaps not, but have you gone over the whole range of possible values being
checked before coming to that conclusion? And, have you compared the
efficiency to the assumed "create a variable and check that to avoid an
extra function call" performance boost?


If my understanding of the JVM is correct, a method call has MUCH more
overhead than creating a local int variable. Also consider how
indexOf(char) is probably implemented: by iterating through each
character in the string until a match is found - this means YET ANOTHER
method call for each iteration.

And wouldn't the compiler have to create the variable anyway? After all,
we need SOME place to store the return value.

-MB
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,773
Messages
2,569,594
Members
45,124
Latest member
JuniorPell
Top