Which scope for variables being used in a loop?

B

bugnthecode

Hi everyone,

I've been working on an application for work and I'm using a library
provided by someone else. I'm now running into a problem where I'm
running out of memory under certain circumstances and have begun
tracing the problem. Anyway, this made me think about the way I've
coded some of my for loops. Take for instance the following snippet:

public void printCustomers(List<Customer> customers) {
String customerName;
String customerPhone;
String customerLocation;
for(Customer customer : customers) {
customerName = customer.getName();
customerPhone = customer.getPhone();
customerLocation = customer.getLocation();
System.out.println(customerName+customerPhone+customerLocation);
}
}

Now my original thinking was that I would declare the strings outside
of the loop so that I'm not re-creating a reference each time through
the loop; I would just keep re-assigning it. Then I started to think
that maybe this wasn't doing as I expected, and the realized that
those 3 strings stay in scope until the end of the method meaning the
locations would be unavailable to the gc until the method was finished
processing. In the above snippet that really doesn't matter, but what
if I had much more code below the loop.

So my question is which way would be better on memory or performance.
Which is better coding style if performance and memory usage are
negligible either way?

Thanks in advance for your help.
Will
 
M

Manish Pandit

Hi everyone,

I've been working on an application for work and I'm using a library
provided by someone else. I'm now running into a problem where I'm
running out of memory under certain circumstances and have begun
tracing the problem. Anyway, this made me think about the way I've
coded some of my for loops. Take for instance the following snippet:

public void printCustomers(List<Customer> customers) {
String customerName;
String customerPhone;
String customerLocation;
for(Customer customer : customers) {
customerName = customer.getName();
customerPhone = customer.getPhone();
customerLocation = customer.getLocation();
System.out.println(customerName+customerPhone+customerLocation);
}

}

Now my original thinking was that I would declare the strings outside
of the loop so that I'm not re-creating a reference each time through
the loop; I would just keep re-assigning it. Then I started to think
that maybe this wasn't doing as I expected, and the realized that
those 3 strings stay in scope until the end of the method meaning the
locations would be unavailable to the gc until the method was finished
processing. In the above snippet that really doesn't matter, but what
if I had much more code below the loop.

So my question is which way would be better on memory or performance.
Which is better coding style if performance and memory usage are
negligible either way?

Thanks in advance for your help.
Will

In this case, the variables are allocated on the stack, as they are
"local". I do not think this could lead to out of memory (unless the
collection is gigantic). Personally I never like the idea of declaring
variables within a loop, and have not seen a lot of instances where it
is done.

-cheers,
Manish
 
K

Karl Uppiano

[snip]
In this case, the variables are allocated on the stack, as they are
"local". I do not think this could lead to out of memory (unless the
collection is gigantic). Personally I never like the idea of declaring
variables within a loop, and have not seen a lot of instances where it
is done.

I think it happens a lot, for example if someone calls another method from
within the loop.
 
Z

zhengxianfu

I'm now running into a problem where I'm running out of memory under certain circumstances and have >begun tracing the problem.

Did you know under what circumstances ,are u sure the List-typed
customer will big enough to run out of your memory ,if so how about
the place which create this customer list.Maybe somewhere else.

By the way ,if the you can add/override toString() method to class
Customer ,that can avoid re-creation and assign value to local string
object.
 
L

Lew

...
There are plenty of instances where variables are declared inside a loop, and
there are good, solid engineering reasons to do so.

Declaring the variable inside a loop, or any other block, limits its scope to
that block. If it is not needed outside the block, then its scope matches its
use.

The variable remains in the JVM until the end of the stack frame, even after
it goes out of scope; if it isn't nulled then it won't be gced until the
method ends. It is still inaccessible to code outside its block.

Limiting variable scope is a good principle of defensive programming. If a
variable doesn't linger after its use, nor is declared until needed, it has
less chance to make mischief. (Joshua Bloch touches on this in /Effective Java/.)

Use of the for ( T thing : things ) idiom is an example of scope limitation.
The variable "thing" is only in scope for the loop.

- Lew
 
C

Chris Uppal

bugnthecode said:
Now my original thinking was that I would declare the strings outside
of the loop so that I'm not re-creating a reference each time through
the loop; I would just keep re-assigning it. Then I started to think
that maybe this wasn't doing as I expected, and the realized that
those 3 strings stay in scope until the end of the method meaning the
locations would be unavailable to the gc until the method was finished
processing. In the above snippet that really doesn't matter, but what
if I had much more code below the loop.

So my question is which way would be better on memory or performance.
Which is better coding style if performance and memory usage are
negligible either way?

I don't think that there's any doubt that the best coding style is to declare
variables in the most restricted scope possible. Not only does it make the
code more readable, but it reduces the chances of errors by mistakenly re-using
a value which had been set in an earlier passage of code where it was used for
something else. A counter-point to that view is that if your methods are so
long that it makes any difference where you declare variables, then your
methods are too long anyway. There's some truth in that, but it is difficult
to write sensible Java code where methods /are/ properly short (say 2 or 3
lines, not counting the brackets).

From an efficiency POV, it makes no difference at all. None. Zero. Java
bytecode has no concept of a local variable (or stack slot, actually) at a
narrower scope than a method. So the bytecode generated for:

public void printCustomers(List<Customer> customers)
{
String customerName;
String customerPhone;
String customerLocation;
for(Customer customer : customers)
{
customerName = customer.getName();
customerPhone = customer.getPhone();
customerLocation = customer.getLocation();
System.out.println(customerName+customerPhone+customerLocation);
}

and:

public void printCustomers2(List<Customer> customers)
{
for(Customer customer : customers)
{
String customerName = customer.getName();
String customerPhone = customer.getPhone();
String customerLocation = customer.getLocation();
System.out.println(customerName+customerPhone+customerLocation);
}

are essentially identical (as you can verify with javap, if you feel so
inclined). I tried that with the javac from jdk 1.6.0, and the only difference
(for some reason) was that it assigned the variables to stack slots in a
different order.

Note that javac and/or the JIT is at liberty to generate code to null-out stack
slots once they are no longer live, but in fact (as far as I can tell) the
current implementations do not do so (there's to be some #ifdef-ed-out code in
the 1.6 JVM source, which /seems/ as if it might be an experimental
implementation of that idea, but...). However, since there's no difference in
the bytecode, the JIT will, or won't, do that no matter which way you phrase
your code.

-- chris
 
P

Patricia Shanahan

Chris said:
I don't think that there's any doubt that the best coding style is to declare
variables in the most restricted scope possible. Not only does it make the
code more readable, but it reduces the chances of errors by mistakenly re-using
a value which had been set in an earlier passage of code where it was used for
something else. A counter-point to that view is that if your methods are so
long that it makes any difference where you declare variables, then your
methods are too long anyway. There's some truth in that, but it is difficult
to write sensible Java code where methods /are/ properly short (say 2 or 3
lines, not counting the brackets).

I find the most restricted scope rule helpful for keeping down method
length.

It tends to lead to largely self-contained blocks, with all working
variables that belong to the block declared in it, and no unnecessary
sharing of variables between blocks. Those are the easiest blocks to
convert to separate methods during a refactoring pass.

Patricia
 
E

Esmond Pitt

Manish said:
In this case, the variables are allocated on the stack, as they are
"local".

In this case the *references* are allocated on the stack. The objects to
which they refer are allocated on the heap as always.
I do not think this could lead to out of memory (unless the
collection is gigantic).

The effect as written is that the *last* 3 objects allocated can't be
GC'd until the scope they are declared in exist. All the objects
previously allocated inside the loop can be GC'd as soon as the next set
of assignments takes place. So unless one or more of the 3 customer
objects is gigantic this is probably not a memory hog.
Personally I never like the idea of declaring
variables within a loop, and have not seen a lot of instances where it
is done.

I'm the opposite: I favour declaring all variables in the innermost
scope possible, and I haven't seen a lot of instances where it *isn't* done.
 
S

Stefan Ram

Esmond Pitt said:
In this case the *references* are allocated on the stack.

References are values. »Reference« is just short for
»reference value« of which the JLS says »(...) reference
values (...) are pointers« (JLS3, 4.3.1). They can not be
allocated, because they are not storage locations.

Variables are storage locations (JLS3, 4.12), so they
might be allocated.

Reference variables are those entities, which then contain
reference values.
 
F

firstsql

I don't think that there's any doubt that the best coding style is to declare variables in the most restricted scope possible. Not only does it make the
code more readable, but it reduces the chances of errors by mistakenly re-using
a value which had been set in an earlier passage of code where it wasusedfor
something else. A counter-point to that view is that if your methods are so
long that it makes any difference where you declare variables, then your
methods are too long anyway. There's some truth in that, but it is difficult
to write sensible Java code where methods /are/ properly short (say 2 or 3
lines, not counting the brackets).

From an efficiency POV, it makes no difference at all. None. Zero. Java
bytecode has no concept of a local variable (or stack slot, actually) at a
narrower scope than a method. So the bytecode generated for:

Actually, there is some difference. Sun's javac will reuse slots
between parallel blocks within a method. So, there can be a coupla
efficiency advantages (albeit minor) to most restricted scope:

1) The stack frame for the method will be smaller, perhaps important
for recursive uses, and

2) Reference slots for earlier blocks may be overwritten, allowing
early garbage collection of the referenced objects.
 
C

Chris Uppal

(e-mail address removed) wrote:

[me:]
From an efficiency POV, it makes no difference at all. None. Zero.
Java
bytecode has no concept of a local variable (or stack slot, actually)
at a
narrower scope than a method. So the bytecode generated for:

Actually, there is some difference. Sun's javac will reuse slots
between parallel blocks within a method. So [...]

That's true. I should have mentioned it myself. Thanks.

-- chris
 
D

dagarwal82

How about declaring "String customerName; String customerPhone;
String customerLocation;" outside the loop with String Buffer. I
mean :-

StringBuffer customerName
StringBuffer customerPhone
StringBuffer customerLocation

now even if you have 1tera billion records in the list , there will be
a single instance of these variables.
 
L

Lew

How about declaring "String customerName; String customerPhone;
String customerLocation;" outside the loop with String Buffer. I
mean :-

StringBuffer customerName
StringBuffer customerPhone
StringBuffer customerLocation

now even if you have 1tera billion records in the list , there will be
a single instance of these variables.

Actually, there won't. Variables don't have instances.

There is an engineering principle related to re-use of instances. Compare

public Foo do( Foo foo )
{
Foo fooToo = foo;
fooToo.setProperty( getAValue() );
return fooToo;
}

to

public Foo do( Foo foo )
{
Foo fooToo = foo.clone();
fooToo.setProperty( getAValue() );
return fooToo;
}

The latter produces two instances of Foo, the former uses only the one. The
number of variables is the same.

There are reasons to declare variables outside a loop, either because you need
a wider scope or syntactic reasons.

parallel loop control variables
for ( int i=0, Iterator iter = coll.iterator(); iter.hasNext(); ++i )
not allowed - declare one before the loop.

- Lew
 
C

Chris Uppal

NT said:
Escape Analysis in JSE6 might rule it out, i.e. allocate them on
stack, for small objects.

Do you know if that feature is actually part of JDK 1.6.0 ? I remember hearing
that it was mooted for inclusion in this cycle, but haven't seen anything to
say whether it actually made it in to the final cut. (It's something I've been
looking forward to playing with)

-- chris
 
C

Chris Uppal

Esmond Pitt wrote:

[me:]

Thanks for the link (not a bad article, if somewhat overstated IMO), but
unfortunately it's dated 2005-09-27, so it can only be talking about what was
then /hoped/ would make 1.6 final. I'm looking for some sort of corroboration
that it /did/ make 1.6 final (or, alternatively, that it didn't).

-- chris
 
R

robert maas, see http://tinyurl.com/uh3t

From: Esmond Pitt said:
The effect as written is that the *last* 3 objects allocated
can't be GC'd until the scope they are declared in exist.

Yes, I agree. In a cases where one of those last 3 objects might be
really big, assuming the programmer can anticipate that as a
likelihood, it would seem prudent to explictly null out the
relevant local pointers just before reaching the bottom of the
inside of the loop. That wouldn't cause any practical difference
for any but the last 3, because each group of 3 pointers (except
the last group) immediately gets overwritten by the next group of 3
when the loop repeats, but it'd protect the last three from being
held too long. I'm assuming what somebody else said, that in fact
the stack frame is created upon entry to the method, to hold the
longest chain of local variables in the whole block, so even though
the variables declared in the inner loop-block go out of scope they
are still unavailable to be garbage collected until the whole
method returns.

And I agree that the variables should be declared within the inner
loop-block, rather than out in the main body of the method, for
several reasons (defense against accidently re-using a value later
in the same method, and allowing local allocations in one loop to
overlay local allocations in another loop. Hence you can't null-out
the three pointers just once after the loop exits, because the
variables are out of scope already, still held by the GC yet
untouchable by the programmer! (Hey, maybe loops should allow a
"finally" clause which performs actions after the very last time
through the loop but before local variables go out of scope?? Nah,
too ugly. How about a "finally" or "volatile" clause on the local
declaration itself, such that the pointer-value of the variable is
forcibly nulled-out at the moment just before the variable goes out
of scope the last time through the loop? Nah, impractical. How
about a "volatile" declaration for the entire *loop* (not the block
inside the loop), such that all stack values allocated by any block
within that loop are nulled out upon exit from the loop? I think
that would be logically consistent, clean, and doable.)
I favour declaring all variables in the innermost scope possible,

I agree completely.

But note such a programming style precludes this loop-search pattern:
for (int ix=0; ix<=max && obj.elementAt(ix)!=target; ix++) {}
if (ix>max) then return(<didn'tfindvalue>);
else return(<foundvalue>);
because the post-loop test can't work because ix is already out of scope.

But I always preferred this style anyway:
for (int ix=0; ; ix++) {
if (ix>max) return(<didn'tfindvalue>);
if (obj.elementAt(ix)==target) return(<foundvalue(ix)>);
}

And who would really prefer this half-of-each pattern?
for (int ix=0; ix<=max; ix++) {
if (obj.elementAt(ix)==target) return(<foundvalue(ix)>);
}
return(<didn'tfindvalue>);
What I don't like about that is that one return is after the loop,
whereby maintainers might not notice the *other* return from inside
the loop.

And I never liked this pattern either:
bool success=false; int savedix;
for (int ix=0; ix<=max; ix++) {
if (obj.elementAt(ix)==target) {
success=true; savedix=ix; break;
}
}
if success then return(<foundvalue(savedix)>)
else return(didn'tfindvalue>);

Although I did find myself forced to invent that last horrible
pattern very long ago in another language that is overly (IMO)
restrictive about where you can do what. No, it wasn't pascal.
 
N

NT

Esmond Pitt wrote:

[me:]


Thanks for the link (not a bad article, if somewhat overstated IMO), but
unfortunately it's dated 2005-09-27, so it can only be talking about what was
then /hoped/ would make 1.6 final. I'm looking for some sort of corroboration
that it /did/ make 1.6 final (or, alternatively, that it didn't).

-- chris

You may want to try these switches with mustang XX:
+PrintEscapeAnalysis, -XX:+DoEscapeAnalysis, -XX:+PrintOptoAssembly.

Nirav Thaker
http://niravthaker.blogspot.com
 
C

Chris Uppal

NT wrote:

[me:]
You may want to try these switches with mustang XX:
+PrintEscapeAnalysis, -XX:+DoEscapeAnalysis, -XX:+PrintOptoAssembly.

Great! Thank you very much.

(BTW, only the -XX:+DoEscapeAnalysis is enabled in production builds -- at
least the server JVM rejects the other options when I try them. Not a big
deal...)


The first benchmark I was hoping it would have some significant impact on is
from John Harrop's comparison of various languages at:
http://www.ffconsultancy.com/free/ray_tracer/languages.html
in which I believe Java suffers because the Java expression of the ray-tracing
algorithms creates huge numbers of transient vector objects. Sadly, it seems
that -XX:+DoEscapeAnalysis and -XX:-DoEscapeAnalysis have no measurable effect
on the execution speed of that code. Oh well...

Now I shall have to try to invent some benchmark where it /does/ make a
difference...

-- chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,776
Messages
2,569,603
Members
45,186
Latest member
vinaykumar_nevatia

Latest Threads

Top