How does Java pass object references?


pek

OK.. Assuming that pass-by-reference means that the following code
would work:

public void swap(Human h) {
    h = new Human();
}

As everybody knows, it doesn't work in Java. I'm guessing it works
in C++. Even if it doesn't (I wouldn't know), I'll be using C++ as an
example of a language that passes by reference for the sake of my
questions. Now, I know Java passes object references by value, but how
exactly does it do that in memory?

Correct me if I'm wrong anywhere. This is what I think. Suppose we
have the following code:

....
Human h; // (1)
h = new Human(); // (2)
change(h);
....
public void change(Human o) { // (3)
    o = new Human(); // (4)
}

In Java:

1. Allocates a small memory block which will be used to point to an
object of type Human

2. Allocates memory for the newly created Human object and changes the
value of the memory block of reference h (which was previously null)
to point to the new Human

3. When called, it allocates memory for the local reference variable o
and copies the pointer of the passed reference, so both the passed
reference and the local one point at the same object

4. Allocates memory for the newly created Human object and changes the
pointer of the local reference o to it, thus, any changes to o don't
affect the passed reference (h).


So, right before the method change() ends, memory would hold two
memory blocks, one for each object (the one from main and the one from
change), and two reference variables, each holding a pointer to one of
the objects.
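The four steps above can be sketched as a small runnable program. This is only an illustration: the `name` field on `Human` and the `ReassignDemo` class are assumptions added so the result can be observed, they are not part of the original code.

```java
class ReassignDemo {
    // Hypothetical Human with a name, assumed only so we can tell objects apart.
    static class Human {
        final String name;
        Human(String name) { this.name = name; }
    }

    // (3) 'o' is a fresh local variable holding a copy of the caller's reference.
    static void change(Human o) {
        // (4) Only the local copy is redirected; the caller's variable is untouched.
        o = new Human("created in change");
    }

    static String run() {
        Human h;                            // (1) a slot that can hold a reference
        h = new Human("created in main");   // (2) object created, h now refers to it
        change(h);
        return h.name;                      // h still refers to the first object
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "created in main"
    }
}
```

After `change()` returns, the second `Human` has nothing referring to it any more and becomes eligible for garbage collection.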

Because I don't know C++, I'll use the code from Java and talk about
memory allocation as I think. So, in C++:

1. Nothing happens, I don't know if this actually works. I don't know
how pointers work in memory.

2. Again, memory is allocated for the newly created object and now h
is a pointer pointing at the memory block.

3. When called, the memory location of the passed pointer will be copied
to o, so both point at the same object

4. ??? What happens here???? h and o are both pointers that point to
the same object. If I change o to point to another object, how does h
know about it? What about the previous object?

So, right before change() ends, memory holds two objects for h and o
(with nothing pointing at the first one any more, so it must be garbage
collected... which won't happen) and... what? Two pointers? Does a
pointer take up memory?

Am I right about the memory allocations in the Java code? What about
pointers in C++? Do they allocate any memory space? If they don't, how
does it store the pointers memory location? What are pointers in terms
of memory?

I hope I made my questions as clear as possible. Unfortunately, I
can't post an image to illustrate my point. I'm trying to create a
slide about Pass-by-value, Pass-by-reference and Pass-reference-by-
value. So I need this information in order to create a good
illustration of the concepts (which unfortunately I didn't find
anywhere on the internet).

I would appreciate any comments, suggestions, websites... Anything
actually. Thank you very much for taking time to read this and for any
help in advance.

Regards,
Panagiotis
 
pek

First of all, Pete, thank you enormously for your long, enlightening
reply. Loved it. And I found it amazing that you actually understood
what I was talking about! But, unluckily for you, I have some follow-up
questions that, if you have time, I would like to hear your answers to.
The one clarification that I think might help is when you write "allocates
a small memory block", what we are generally talking about is either a
local variable that has already been allocated on the stack when the
method was entered, or a member variable of a class that was already
allocated either when the class was loaded (for static members) or when an
instance of the class was created (for instance members).

In other words, variables can be stored in a variety of places and while
in some sense they are allocated individually, it's usually more correct
to think of them as being a specific location in a larger block of memory
that was allocated for a specific purpose (e.g. stack frame, class,
instance of a class).

The reason I think this is a useful clarification is that when you got to
the part about how passing by reference might work, it seems you went off
track at least partly because you didn't understand the nature of the
above. Specifically (looking at the steps you described for that
hypothetical passing by-reference language):

So, what I understood is that memory allocations could be "visualized"
as blocks containing smaller blocks that also contain smaller ones,
etc. So, for instance, the heap is a block that contains instances of
objects, which are also blocks that contain instance members, and
method blocks that contain local variables. At least that is what I
understood. If this is the case then I have a pretty good start for my
presentation slides. ;)

[...]
1. Nothing happens, I don't know if this actually works. I don't know
how pointers work in memory.

Whether a language passes by reference or by value, variables still need
to be allocated somehow. Step 1 is the same here as it would be for
C++ or other languages. If I assume that in your original code example,
"Human h" is a local variable in a method, the storage for that variable
is in a stack frame that is created when execution enters the function.
2. Again, memory is allocated for the newly created object and now h
is a pointer pointing at the memory block.

Yes, this step is also the same.
3. When called, the memory location of the passed pointer will be copied
to o, so both point at the same object

This is where you get derailed, because you've made incorrect assumptions
about C++.

One assumption you've made is that C++ handles object construction the
same way as Java. It doesn't. In particular, C++ doesn't have the same
way of looking at dynamically allocated objects that Java does. A
variable of type Human isn't going to be a reference to an instance of
Human. It's going to be an actual Human instance. If you declare that as
a local variable in a function in C++, then the instance will be allocated
on the stack.

Another assumption you've made is that C++ uses passing by reference by
default. C++ does support passing by reference, but it's not the
default. If you want passing by reference, you would need to declare your
method as such:

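void change(Human &o)
{
    // For an ordinary class this would not compile in C++: "new Human()"
    // yields a pointer (Human *), not a Human. It only shows the "&" syntax.
    o = new Human();
}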

In that case, yes...rather than a copy of the parameter being passed, a
reference to the actual parameter is passed instead. Except that since,
in addition to these other differences, C++ uses the type name to declare
the actual storage for the instance, you'd be saying that you want to pass
a reference to the instance of Human.

A more typical usage in C++ might be something like this:

Type declarations:

class Human
{
    // ...
};

// I'm using a typedef to keep the parameter syntax simpler.
// All this does is create a new type that is defined to be
// a pointer to the class Human.
typedef Human *PHuman;

Caller:

PHuman h = new Human();

change(h);

Callee:

void change(PHuman &o)
{
    o = new Human();
}

In that example, a reference to the storage used by the "h" variable is
passed to the function, and the compiler translates any usage of the
parameter "o" to dereference that reference and access the storage
directly. Thus when the code assigns a new instance of Human to the local
parameter "o", that reference to the new instance is actually being copied
into the original storage used by "h".
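Java has no counterpart to `PHuman &o`, but the extra level of indirection can be imitated by hand. The sketch below assumes a hypothetical `Ref<T>` holder class (not a standard Java type) that plays the role of the storage used by "h"; the `name` field is likewise only there to make the effect visible.

```java
class HolderDemo {
    // Hypothetical Human with a name, assumed only for illustration.
    static class Human {
        final String name;
        Human(String name) { this.name = name; }
    }

    // Hypothetical holder: one field standing in for the caller's variable.
    static class Ref<T> {
        T value;
        Ref(T value) { this.value = value; }
    }

    // The holder reference is passed by value, but writing to its field
    // changes storage that the caller can also see.
    static void change(Ref<Human> o) {
        o.value = new Human("replacement");
    }

    static String run() {
        Ref<Human> h = new Ref<>(new Human("original"));
        change(h);
        return h.value.name;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "replacement"
    }
}
```

The difference from C++ is that here the indirection is written out explicitly through `o.value`, whereas a C++ reference parameter has the compiler do the dereferencing.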

So, if I'm getting this correctly, when C++ compiles the code, it sees
that o points to h which points to the actual instance of the object
in memory. So, every time it does anything to o it follows this route
in order to accomplish this change. So, in the same sense, if change()
would call again another method that also expects a Human class and
passes o, the compiler would have to do one more step to get to the
instance in memory. Am I correct here?

So when allocating local memory for o, it would simply allocate a
pointer pointing at h which is also a pointer pointing at the instance
variable. If this is correct, I'm getting a pretty good picture right
now. :D
A better comparison might be to use C# instead. C# is much more similar
to Java (it in fact borrows quite a lot from Java), but unlike Java it
does support passing by reference. In particular, C# has reference types
the same way that Java does, and so the syntax is a lot more similar.

In particular, in C# the syntax you've shown in your post will do
_exactly_ the same thing in C# as it would in Java. If you want to pass
by reference, you still need to do so explicitly (as in C++). In C#, it
would look something like this though:

void change(ref Human o)
{
    o = new Human();
}

The method would be called like this:

change(ref h);

The "ref" keyword tells the compiler to pass the parameter by reference.
It's required not only in the method declaration itself, but also when you
call (this ensures that callers don't find themselves passing something by
reference without knowing it).

So, I'm assuming that C#, as opposed to Java, doesn't have pass-by-
object-reference.
So, here's the crux of your question I guess. :)

As I mentioned above, the parameter passed by reference isn't a pointer to
the object. It's a pointer to a pointer to the object. That is, it's a
pointer to the variable "h". The pointer is always dereferenced when
used; that is, in the code you write you never have direct access to the
pointer itself...only to what the pointer points to.

So when you write "o = new Human()" when passing by reference, you're not
actually changing the local variable in the method. The compiler is
generating code "behind the scenes" that causes the variable that was used
as the parameter to be changed instead.


No new memory allocations are done, other than the one that created the
new instance (i.e. "new Human()").

This is what I was talking about at the top of this article. Assignments
to local variables, or even to class members, do not allocate memory.
They simply copy values from one place to another. In this case, the
value is a reference (pointer) to an instance. In the change() method,
the assignment to "o" has the effect of copying the new instance reference
into the original variable that was used as the parameter.
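The point that assignment copies a reference rather than allocating anything can be checked in Java with a small sketch. The `created` counter and the `AssignDemo` class are assumptions introduced purely to count constructor calls.

```java
class AssignDemo {
    // Counts how many Human instances have been constructed (illustrative only).
    static int created = 0;

    static class Human {
        Human() { created++; }
    }

    // Returns the number of objects created by one 'new' and two assignments.
    static int countAllocations() {
        created = 0;
        Human a = new Human(); // the only allocation
        Human b = a;           // copies the reference, no new object
        Human c = b;           // same again
        return created;
    }

    // Two variables assigned from one another refer to the very same object.
    static boolean sameObject() {
        Human a = new Human();
        Human b = a;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(countAllocations()); // prints 1
        System.out.println(sameObject());       // prints true
    }
}
```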

It doesn't change "o", not in the sense that "o" represents your local
variable. When you pass by reference, the compiler is hiding from you the
fact that when you write "o", you're actually using "o" as an alias for
the actual parameter.

I think I have just learned a fundamental difference about what a
class variable actually is in Java and, probably, about values and
variables. So, what I understood here is that "o" in C++ is either:
A) If it's passed by value, a variable whose value is allocated in
its respective context in memory (namely, local, class member etc.)
B) If it's passed by reference, a variable that doesn't actually
have a value of its own, but is rather a pointer that points either
to the allocated memory or to another pointer

While in Java, class variables aren't either one of C++'s. They're
rather a combination. A class variable's value is a reference. This
means that when I change the value of a variable, I assign it a new
reference. References take up memory and are basically pointers
pointing to an object instance in memory. So, Java has something like
an additional "layer" where references sit between a variable and its
value. And when passing a variable here and there, you are actually
copying its value, which is a reference. It's something like saying
that all class variables in Java are of type Reference with a value of
an object (super-extraordinary-oversimplified).

Please tell me this is true. I think I'm getting somewhere. :p
If you want to know exactly how C++ works, you're probably better off
posting your question in a C++ newsgroup. That said, C++ isn't really so
different from Java in basic concept. Ignoring the actual implementation,
from a paradigm point of view the main difference is that C++ is always
explicit about its references, whereas Java (and C#) is implicit. C++ can
have a variable that _is_ a class instance, and a variable that points to
a class instance has to be declared explicitly (e.g. "Human *h" would
declare a pointer to an instance of Human). Java cannot have variables
that are class instances; they can only refer to class instance and all
class instances are allocated dynamically (i.e. never on the stack or as a
fully-contained member of some other class, two things that C++ does
support).

I knew that declaring *h makes it a pointer and forgot to mention it. I
thought you wouldn't understand and would stop, but lucky for me, you
explained everything I wanted.
One of the best descriptions I've seen on the topic is Jon Skeet's article
on parameter passing. It's actually written from a C# perspective, but
since Java and C# are so similar, and since C# actually does support
passing by reference, it may be worth looking at for you:
http://www.yoda.arachsys.com/csharp/parameters.html

I started reading the link but it started to confuse me, so I stopped
and posted you these questions instead. I hope you are still with me.
I think you cleared some things better than that site. ;)

In fact, given that Java doesn't support passing by reference, I'm a bit
confused as to why the question wound up here. :) But hopefully the
above has given some explanation that's useful, which I guess you wouldn't
have gotten if you hadn't posted here (or at least somewhere that I read
:) ). So I can't really complain too much about it. :)

Pete

Well, as I tried to explain, I'm trying to write a presentation
about what happens under the hood about pass-by-value, pass-by-
reference and (as I think of a better way to name Java's own) pass-
reference-by-value. I want to create an illustration about memory
allocations in all three situations and what Java does and why.

Once again, I can't thank you enough... Wait.. I think I can. Since
this "research" will also wind up on my blog, how about a reference to
your blog/website? You've helped me enormously. Mine is (and I hope I
don't get filtered for this) http://pekalicious.treazy.com.

Panagiotis
 
Joshua Cranmer

pek said:
Correct me if I'm wrong anywhere. This is what I think. Suppose we
have the following code:

...
Human h; // (1)
h = new Human(); // (2)
change(h);
...
public void change(Human o) { // (3)
o = new Human(); // (4)
}

In Java:

Assuming no optimization, either at runtime or compile-time:
1. Allocates a small memory block which will be used to point to an
object of type Human

Saves a space in the local variable registers of the function for h.
2. Allocates memory for the newly created Human object and changes the
value of the memory block of reference h (which was previously null)
to point to the new Human

Constructs a new `Human' object, preinitializes it, calls the
constructor, and then marks the local variable register as pointing to it.
3. When called, it allocates memory for the local reference variable o
and copies the pointer of the passed reference, thus, both the called
and the local point at the same object

The argument gets its own space in the local variable registers of the
function change. In the process of the call, this register is set to the
same value as h, i.e., the handle to the actual `Human' object.
4. Allocates memory for the newly created Human object and changes the
pointer of the local reference o to it, thus, any changes to o don't
affect the passed reference (h).

Does the same thing as Step 2 to the local variable register in the
function change. Since the register is merely a handle to the object,
the two registers here (`h' and `o') now point to different objects.
So, right before the method change() ends, the memory would allocate:
two memory blocks for each object (the one from main and the one from
change) and two reference variables that have pointers that point to
each object.

The "reference variables" are probably allocated on the stack and not
the heap, so the term `allocation' doesn't apply there, but otherwise
correct.
Because I don't know C++, I'll use the code from Java and talk about
memory allocation as I think. So, in C++:

C++ version (I am assuming you want the incorrect translation that
changes Java's pass-by-value to a pass-by-reference, from your subject;
I'm also assuming stack variables for pass-by-reference as opposed to
heap-allocated variables):
....
Human h;
change(h);
....
void change(Human &o) {
    // Not quite the same: this constructs a temporary Human and then
    // assigns it to o via Human's assignment operator.
    o = Human();
}
1. Nothing happens, I don't know if this actually works. I don't know
how pointers work in memory.

`Human h;' declares a variable h, on the stack, of the class `Human' and
calls the default constructor of h.
2. Again, memory is allocated for the newly created object and now h
is a pointer pointing at the memory block.

[ Folded into #1 under my assumed C++ version. ]
Stack -> no memory allocation, unless Human does so.
3. When called, the memory location of the passed pointer will be copied
to o, so both point at the same object

Technically, what happens is that the argument `o' is made an alias to
object `h'. Practically, the references are translated to pointers at
runtime, so that `o' in some way refers to the address of `h'. ISO C++
gives compilers a lot of latitude in terms of how references are
actually implemented.
4. ??? What happens here???? h and o are both pointers that point to
the same object. If I change o to point to another object, how does h
know about it? What about the previous object?

`h' and `o' are effectively different names for the same object. So `&h
== &o' should be true throughout the entire function `change'.
Effectively, what the translated C++ version is doing is creating a new
temporary `Human' object and assigning it, via the assignment operator,
to the slot to which `h' and `o' both refer.
So, right before change() ends the memory allocates: two objects for h
and o (with no pointers at h, and now it must be garbage
collected....which won't) and....what? Two pointers? Does a pointer
allocate memory?

C++, at least without external libraries, doesn't have garbage
collection. Anyway, two constructions of `Human' occur (since I've
elided pointers from the C++ example, allocation per se doesn't
occur), whereas the two variables both refer to the same
location in memory. Both of the variables are on the stack, however, and
typically don't count as allocation.
Am I right about the memory allocations in the Java code? What about
pointers in C++? Do they allocate any memory space? If they don't, how
does it store the pointers memory location? What are pointers in terms
of memory?

Modulo the difference between stack and heap, you are right.
I hope I made my questions as clear as possible. Unfortunately, I
can't post an image to illustrate my point. I'm trying to create a
slide about Pass-by-value, Pass-by-reference and Pass-reference-by-
value. So I need this information in order to create a good
illustration of the concepts (which unfortunately I didn't find
anywhere on the internet).

Roedy's site is probably the best online explanation of
pass-by-value/pass-by-reference, at least for Java, if not for most
languages in general.
 
Chase Preuninger

everything that extends Object is passed by reference and all
primitive types are passed by value.
 
Arne Vajhøj

Chase said:
everything that extends Object is passed by reference and all
primitive types are passed by value.

No.

Object references are passed by value.

In many cases the difference does not matter,
but it is easy to come up with examples where it does
matter.
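One such example is the classic swap. The sketch below is only an illustration; the `name` field and the `SwapDemo` class are assumptions added to make the outcome observable.

```java
class SwapDemo {
    // Hypothetical Human with a name, assumed only for illustration.
    static class Human {
        final String name;
        Human(String name) { this.name = name; }
    }

    // Swaps the two local copies of the references, nothing else.
    static void swap(Human a, Human b) {
        Human tmp = a;
        a = b;
        b = tmp;
    }

    static String run() {
        Human x = new Human("x");
        Human y = new Human("y");
        swap(x, y);
        // With true pass-by-reference this would give "yx".
        return x.name + y.name;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "xy"
    }
}
```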

And I believe it has been explained in previous
posts, which you should study.

Arne
 
pek

OK. Let me start by stating some assumptions I made:
A) I thought that everything is stored in memory while a program is
executing. I just found out that there are two different "places", the
stack and the heap. So the heap is the actual memory. What is the stack?
B) That C++ has only one way of pass-by-reference. A friend of mine
told me that C++ has two: using * and using &. He said something about
pointers and references. I didn't quite get it.
Saves a space in the local variable registers of the function for h.
When you say registers, are you talking about CPU registers? Or am I
going too far?
Java doesn't have pass by reference, and "pass-by-object-reference" is not a
technical term. If you're going to use approximate or fanciful terms, how
about, "Java has pass by pointer"? At least then you won't risk confusing
people who think you're saying, "pass by reference", which Java doesn't have.
OK, I totally agree. No fanciful terms.
-- pass non-pointer by value
-- pass pointer by value
-- pass non-pointer by reference
-- pass pointer by reference
This opened my eyes.. :p

OK. Other than my first two questions, I believe everything is clear
now. Thanks so much.
everything that extends Object is passed by reference and all
primitive types are passed by value.

Did you even read anything from this post other than the subject?
 
Mark Space

pek said:
OK. Let me start by stating some assumptions I made:
A) I thought that everything is stored in memory while a program is
executing. I just found out that there are two different "places", the
stack and the heap. So the heap is the actual memory. What is the stack?


A stack is exactly the data structure you learned about in your basic
data structures class. CPUs have for a very long time supported this
type of structure directly with hardware.

When you get into compilers and linkers, you'll see how the stack is
used to store return values and arguments for methods (subroutines).
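A rough way to see the stack at work from Java is recursion: every call gets its own frame, and with it its own copies of the parameters and locals. The `StackFrameDemo` class and its method names below are made up for this sketch.

```java
class StackFrameDemo {
    // Each call has its own 'n' and 'local'; deeper calls cannot overwrite them.
    static void countdown(int n, StringBuilder log) {
        if (n == 0) {
            return;
        }
        int local = n * 10;
        countdown(n - 1, log);          // pushes a new frame on top of this one
        log.append(local).append(' ');  // this frame's 'local' is still intact
    }

    static String run() {
        StringBuilder log = new StringBuilder();
        countdown(3, log);
        return log.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "10 20 30"
    }
}
```

The innermost call finishes first, which is why the values come out in last-in, first-out order.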

It's simple, but would take a bit of time to actually type out and
explain. Can you ask a professor for a quick look ahead lecture? Or
maybe borrow a book (library?) on compilers and linkers and read
through the basics?

B)That C++ has only one way of pass-by-reference. A friend of mine
told me that C++ has two: using * and using &. He said something about
pointers and references. I didn't quite get it.


Your friend is basically correct. Technically, pointers are passed by
value. A pointer *is* a reference, however, so you can accomplish the
same thing as pass by reference with a pointer.

And to complicate further, you can pass a pointer to a pointer, or a
reference to a pointer.... (but not in Java... at least not directly).

When you say registers, are you talking about CPU registers? Or am I
going too far?


Yup, literal CPU registers. Or maybe some close equivalent -- modern
designs use cache as a near equivalent of a register.

So, for memory bits we have: stack, heap, text segment (code), register
file, cache, ... how about memory mapped segments ... virtual memory
(backing store), .... io buffers (I mean in hardware, not main memory)
.... IO space ... then there's networked storage.... :). Having fun yet?

OK, I totally agree. No fanciful terms.

This opened my eyes.. :p

And Java only uses the first two.
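Those two cases can be put side by side in a short sketch. The `age` field and the `TwoKindsDemo` class are assumptions made up for the illustration.

```java
class TwoKindsDemo {
    // Hypothetical Human with a mutable field, assumed only for illustration.
    static class Human {
        int age;
    }

    // Pass non-pointer by value: 'n' is a copy, the caller's int is untouched.
    static void bump(int n) {
        n = n + 1;
    }

    // Pass pointer by value: 'h' is a copy of the reference, but it still
    // leads to the caller's object, so changes to the object are visible.
    static void birthday(Human h) {
        h.age = h.age + 1;
    }

    static int[] run() {
        int n = 41;
        bump(n);
        Human h = new Human();
        h.age = 41;
        birthday(h);
        return new int[] { n, h.age };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints "41 42"
    }
}
```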
 