Objects are slow ?

T

Thomas Hawtin

ali said:
i need to build a huge 2 dimention Array of an object

this Object contain : int,
int,
boolean,
boolean ;

and since the array is huge we need the processes on the array be very
fast

so instead of doing it as one 2dimention array of that object

i do 4 2dimention arrays int, int , boolean, boolean
and do the process on them

will that be faster ???

It will almost definitely be slower.

Java doesn't really have two-dimensional arrays. Instead it has arrays
of arrays. So instead of having around n objects, for huge n, you now
have 2n small arrays. Not so good.

If you want to compact you data, use large single dimension arrays. And
profile your application of course.

Tom Hawtin
 
T

Thomas Hawtin

of arrays. So instead of having around n objects, for huge n, you now
have 2n small arrays. Not so good.

Oops. 2D array of objects vs multiple 2D arrays of primitives. Still, if
I was going for compactness, I'd go for multiple 1D arrays (or probably
1 1D array).

Tom Hawtin
 
A

ali

well i always read people writting about objects make software slow

umm currently i ma having this problem

i need to build a huge 2 dimention Array of an object

this Object contain : int,
int,
boolean,
boolean ;

and since the array is huge we need the processes on the array be very
fast

so instead of doing it as one 2dimention array of that object

i do 4 2dimention arrays int, int , boolean, boolean
and do the process on them

will that be faster ???
 
A

AndrewMcDonagh

ali said:
well i always read people writting about objects make software slow

Using objects wont impact the speed - that argument is a red herring for
your problem.
umm currently i ma having this problem

i need to build a huge 2 dimention Array of an object

this Object contain : int,
int,
boolean,
boolean ;

and since the array is huge we need the processes on the array be very
fast

so instead of doing it as one 2dimention array of that object

i do 4 2dimention arrays int, int , boolean, boolean
and do the process on them

will that be faster ???

Q) how long is a piece of string?

A) Measure it and you will find out.



Other than that, do consider that depending upon what you want to do
with the collection once its populated, different types of collections
can give you better performance.
 
P

Patricia Shanahan

AndrewMcDonagh said:
Using objects wont impact the speed - that argument is a red herring for
your problem.

Not necessarily. The overhead for the object, object alignment, and
references will add a few bytes per element. Considering that the
payload is only 10 bytes, that may be a significant cost in cache space
and memory access bandwidth.

Q) how long is a piece of string?

A) Measure it and you will find out.



Other than that, do consider that depending upon what you want to do
with the collection once its populated, different types of collections
can give you better performance.

The most important step at this point is to isolate the structure in its
own class. That way, it will be possible to experiment with alternative
designs, without rewriting the whole program.

Patricia
 
A

AndrewMcDonagh

Patricia said:
Not necessarily. The overhead for the object, object alignment, and
references will add a few bytes per element. Considering that the
payload is only 10 bytes, that may be a significant cost in cache space
and memory access bandwidth.

True, but such micro optimizations are rarely useful. Performance
problems are typically in other areas.

The only true way to find out is to measure.
The most important step at this point is to isolate the structure in its
own class. That way, it will be possible to experiment with alternative
designs, without rewriting the whole program.

Agreed, and once you have a dataholder for the 4 values, it can be used
within different collections to see which is appropriate.
 
P

Patricia Shanahan

AndrewMcDonagh said:
True, but such micro optimizations are rarely useful. Performance
problems are typically in other areas.

It depends. I've done enough work with large arrays, especially in
Fortran, that I've seen situations in which micro-optimization changed a
three day job to a few hours.

If 99.9% of the time in the unoptimized program really is in a few array
tasks, optimizing array access does matter.
The only true way to find out is to measure.

Absolutely, totally agree.
Agreed, and once you have a dataholder for the 4 values, it can be used
within different collections to see which is appropriate.

I would not even externalize the idea of the 4 value data holder.

It may be that there are a significant number of operations that need to
work along a particular dimension of the first int array. In that case,
four primitive arrays may be most efficient.

Patricia
 
B

blmblm

It depends. I've done enough work with large arrays, especially in
Fortran, that I've seen situations in which micro-optimization changed a
three day job to a few hours.

If 99.9% of the time in the unoptimized program really is in a few array
tasks, optimizing array access does matter.

Seconded. It can indeed make a huge difference.

The real issue, as I understand it, is making effective use of cache,
and a program that accesses memory words in contiguous order (rather
than jumping around) is likely to do better there.

The example that's commonly used in other languages (Patricia,
you almost surely know this, but the OP might not) is a nested loop
over a 2D array; typically one way of nesting the loops gives much
much better performance than the other. Seems to work that way
in Java too, based on a quick experiment. (Exactly how much
almost surely depends on the details of the system, but my little
test program showed performance differences of over an order of
magnitude. !)

[ snip ]
I would not even externalize the idea of the 4 value data holder.

It may be that there are a significant number of operations that need to
work along a particular dimension of the first int array. In that case,
four primitive arrays may be most efficient.

Agreed. It might not be as pretty as an array of objects, but your
suggestion to hide the details in a way that makes it easy to change
them may help with that.
 
A

AndrewMcDonagh

Patricia said:
It depends. I've done enough work with large arrays, especially in
Fortran, that I've seen situations in which micro-optimization changed a
three day job to a few hours.

Sure, me too (coming from a background of real-time application and
embedded apps), but I measure the program first, then I perform what
might initially appear to be the micro-optimization.
 
J

Joshua

well i always read people writting about objects make software slow

umm currently i ma having this problem

i need to build a huge 2 dimention Array of an object

this Object contain : int,
int,
boolean,
boolean ;

and since the array is huge we need the processes on the array be very
fast

so instead of doing it as one 2dimention array of that object

i do 4 2dimention arrays int, int , boolean, boolean
and do the process on them

will that be faster ???

As per Thomas's comments, it'll most likely be slower for these reasons:

1. Proximity: the four data types are going to be in the same frame of
reference with the Object, but with four arrays they may be outside of
the processor's cache.

2. Array seek penalty: you're retrieving the Object but once (if you don't
program it in, the JIT will likely do it for you anyways), so you get one
array seek penalty instead of for (actually two instead of eight...).

But the fastest way is going to be to turn it all into one 1D array
because you get 1 seek penalty and 1 reference resolution...
 
D

dsjoblom

ali said:
well i always read people writting about objects make software slow

umm currently i ma having this problem

i need to build a huge 2 dimention Array of an object

this Object contain : int,
int,
boolean,
boolean ;

and since the array is huge we need the processes on the array be very
fast

so instead of doing it as one 2dimention array of that object

i do 4 2dimention arrays int, int , boolean, boolean
and do the process on them

will that be faster ???

Nobody knows what will be faster until someone measures and compares
all options.

A first tip, mentioned by others already is to not use multidimensional
arrays, but instead use singledimensional arrays and do your own index
calculations. Multidimensional arrays in java are practically unusable
for any performance intensive task.

As to whether one vs. many arrays is better, that is truly impossible
to answer without knowing your data. It depends a lot on how you access
the data. If you usually access only one field from many objects at a
time, multiple arrays may be more efficient, but if you usually access
one whole object at a time, a single array is likely to work better.

As a totally different solution, I'll give a slightly eccentric
suggestion. Store everything into a single int[] array, using the
values 0 and 1 to represent boolean false and true, respectively. If
you need to convert the last two ints back to booleans, use something
like boolean b = integer & 1 == 0 ? 0 : 1. If the JIT optimizer has any
brains, this operation will be compiled to something very simple
(probably integer & 1). This adds a 6 byte overhead to each object [*]
, but this solution has optimal memory locality and alignment on most
platforms, if we assume you are going to access a whole object at a
time. But, alas, this is very ugly ;-)

As always, you /must/ benchmark any solutions before you can draw any
definitive conclusions.

Regards,
Daniel Sjöblom

* But don't forget, if you were using object references the overhead
would be even higher per object
 
P

Patricia Shanahan

As a totally different solution, I'll give a slightly eccentric
suggestion. Store everything into a single int[] array, using the
values 0 and 1 to represent boolean false and true, respectively. If
you need to convert the last two ints back to booleans, use something
like boolean b = integer & 1 == 0 ? 0 : 1. If the JIT optimizer has any
brains, this operation will be compiled to something very simple
(probably integer & 1). This adds a 6 byte overhead to each object [*]
, but this solution has optimal memory locality and alignment on most
platforms, if we assume you are going to access a whole object at a
time. But, alas, this is very ugly ;-)

I would be very concerned about the wasted space when doing this for a
large, frequently accessed array.

All data is going to be aligned anyway, in the sense that no field will
cross a word or cache line boundary, the alignment question that
actually affects performance. Modern processors can pull a single byte
from a word very fast, so I would not worry about that.

The real problem with large, frequently accessed arrays is getting the
data to where it needs to be. The objective should be to make the best
possible use of every scrap of memory bandwidth and cache space.

Depending on how the data is accessed, the fastest might be a linear
array for each field, so that each array can be dense.

However, until the program has been written and can be measured, the
design objective should be to avoid precluding any of these solutions.
That means minimizing the amount of code that knows how the data is
arranged in memory.

Patricia
 
E

Eric Sosman

AndrewMcDonagh wrote On 08/03/06 18:01,:
Sure, me too (coming from a background of real-time application and
embedded apps), but I measure the program first, then I perform what
might initially appear to be the micro-optimization.

Let's spend a few moments estimating how "micro" the
optimization is. To summarize the O.P., there's a "huge"
array (apparently two-dimensional) of objects each with a
payload of two ints and two booleans. The O.P. wonders
whether he'd be better off using four parallel arrays of
primitives.

That's not much information to go on, so we'll need to
augment it with some assumptions. I'll assume

- The arrays are [N][N] in size, where N is sufficiently
large for N*N to be called "huge."

- An instance of the O.P.'s object occupies 24 bytes:
ten bytes of payload plus eight bytes of bookkeeping
used by the JVM, rounded up to a multiple of eight.

- An array occupies 16 bytes plus the space for its
payload elements: an array is an object and thus has
the JVM bookkeeping, and an array also probably has
some bookkeeping of its own (e.g., its length is
probably stored somewhere).

- An object reference occupies four bytes.

The latter three assumptions obviously depend on the JVM
and will vary from one to another; if you've got more specific
information about the O.P.'s JVM, by all means use it.

With these assumptions, the SomeClass[N][N] approach uses

- 24*N*N bytes for the SomeClass object instances, plus

- N*(4*N+16) bytes for N one-dimensional arrays containing
N SomeClass references apiece, plus

- 4*N+16 bytes for one one-dimensional array containing N
references to the other arrays.

Grand total for SomeClass[N][N]: 28*N*N + 20*N + 16 bytes.

The alternative of two int[N][N] arrays and two boolean[N][N]
arrays uses

- 2*N*(4*N+16) bytes for two sets of N int[N] arrays, plus

- 2*(4*N+16) bytes for two N-element arrays containing
references to the int[N] arrays, plus

- 2*N*(N+16) bytes for two sets of N boolean[N] arrays, plus

- 2*(4*N+16) bytes for two N-element arrays containing
references to the boolean[N] arrays.

Grand total for parallel arrays: 10*N*N + 80*N + 32 bytes.

Memory saved by using parallel arrays: 18*N*N - 60*N - 16
bytes -- roughly speaking, 18*"huge" bytes. A savings of
18*"huge" bytes isn't a "micro"-optimization.

This isn't to say that the optimization *should* be done,
of course. But the size of the potential savings is large
enough to influence the design from the outset: You might
decide to start with SomeClass[N][N], but if you think you may
want to change to parallel arrays or to a single long[N][N] or
even to a byte[HUGE], you won't let those SomeClass objects
escape "into the wild." You'll hide them behind accessors so
you can change representations easily if it proves necessary.
This is the sort of choice that's easy to make before coding,
much harder to make after the code is deployed and found to
be performing poorly.
 
C

Chris Uppal

If you usually access only one field from many objects at a
time, multiple arrays may be more efficient, but if you usually access
one whole object at a time, a single array is likely to work better.

Not necessarily even then. For instance if you have an array of objects which
you access sequentially, then depending on your luck the resulting access
pattern to real memory may not be very linear -- it depends on where the object
are placed (which in turn depends on when they were created and their history
since). If you stored the same data in N arrays of primitives, then the access
to each array would be linear (assuming a JVM implementation which is not
actually insane ;-). And memory subsystems /like/ linear access...

-- chris
 
D

dsjoblom

As a totally different solution, I'll give a slightly eccentric
suggestion. Store everything into a single int[] array, using the
values 0 and 1 to represent boolean false and true, respectively. If
you need to convert the last two ints back to booleans, use something
like boolean b = integer & 1 == 0 ? 0 : 1.

That should read 'boolean b = integer & 1 != 0', which actually
compiles ;-) I was thinking more about how the bytecode looks when I
wrote the above line.

Daniel
 
D

dsjoblom

Chris said:
Not necessarily even then. For instance if you have an array of objects which
you access sequentially, then depending on your luck the resulting access
pattern to real memory may not be very linear -- it depends on where the object
are placed (which in turn depends on when they were created and their history
since).

Yes. This is why I suggested the offbeat solution of using a single int
array for storing the objects, because java doesn't support value
objects.
If you stored the same data in N arrays of primitives, then the access
to each array would be linear (assuming a JVM implementation which is not
actually insane ;-). And memory subsystems /like/ linear access...

All true :)

Regards,
Daniel Sjöblom
 
P

Patricia Shanahan

Eric Sosman wrote:
....
Memory saved by using parallel arrays: 18*N*N - 60*N - 16
bytes -- roughly speaking, 18*"huge" bytes. A savings of
18*"huge" bytes isn't a "micro"-optimization.

Note, also, where those bytes are. The 8 bytes per object overhead, and
the padding to bring the object size up to a multiple of 8 bytes are
likely to be interleaved with payload data. They will occupy cache
space and use memory bandwidth, much more expensive than the same
overhead in a separate structure.

This isn't to say that the optimization *should* be done,
of course. But the size of the potential savings is large
enough to influence the design from the outset: You might
decide to start with SomeClass[N][N], but if you think you may
want to change to parallel arrays or to a single long[N][N] or
even to a byte[HUGE], you won't let those SomeClass objects
escape "into the wild." You'll hide them behind accessors so
you can change representations easily if it proves necessary.
This is the sort of choice that's easy to make before coding,
much harder to make after the code is deployed and found to
be performing poorly.

Completely agree.

Patricia
 
D

dsjoblom

Patricia said:
As a totally different solution, I'll give a slightly eccentric
suggestion. Store everything into a single int[] array, using the
values 0 and 1 to represent boolean false and true, respectively. If
you need to convert the last two ints back to booleans, use something
like boolean b = integer & 1 == 0 ? 0 : 1. If the JIT optimizer has any
brains, this operation will be compiled to something very simple
(probably integer & 1). This adds a 6 byte overhead to each object [*]
, but this solution has optimal memory locality and alignment on most
platforms, if we assume you are going to access a whole object at a
time. But, alas, this is very ugly ;-)

I would be very concerned about the wasted space when doing this for a
large, frequently accessed array.

All data is going to be aligned anyway, in the sense that no field will
cross a word or cache line boundary, the alignment question that
actually affects performance. Modern processors can pull a single byte
from a word very fast, so I would not worry about that.

True, although I did have a couple of other things in mind:

I was thinking of the fact that if the data is packed as tight as
possible, say into a byte array, then the 32-bit values would be on 2
byte boundaries, which is not optimal. But this consideration is moot
anyway in java, since you can't read integers directly from byte arrays
like you can in other languages.

The other thing is that this way every object is 16 bytes, probably
aligned at 16 byte boundaries (assuming the VM allocates large objects
at page boundaries.) This is optimal for vectorization on many
processors, although this may just be wishful thinking, I don't know
how good typical VMs are at vectorizing array accesses.
The real problem with large, frequently accessed arrays is getting the
data to where it needs to be. The objective should be to make the best
possible use of every scrap of memory bandwidth and cache space.

Depending on how the data is accessed, the fastest might be a linear
array for each field, so that each array can be dense.

However, until the program has been written and can be measured, the
design objective should be to avoid precluding any of these solutions.
That means minimizing the amount of code that knows how the data is
arranged in memory.

Agreed. It would be insane to not wrap this in some implementation
independent interface.

Regards,
Daniel Sjöblom
 
A

AndrewMcDonagh

Patricia said:
Eric Sosman wrote:
...
Memory saved by using parallel arrays: 18*N*N - 60*N - 16
bytes -- roughly speaking, 18*"huge" bytes. A savings of
18*"huge" bytes isn't a "micro"-optimization.

Note, also, where those bytes are. The 8 bytes per object overhead, and
the padding to bring the object size up to a multiple of 8 bytes are
likely to be interleaved with payload data. They will occupy cache
space and use memory bandwidth, much more expensive than the same
overhead in a separate structure.

This isn't to say that the optimization *should* be done,
of course. But the size of the potential savings is large
enough to influence the design from the outset: You might
decide to start with SomeClass[N][N], but if you think you may
want to change to parallel arrays or to a single long[N][N] or
even to a byte[HUGE], you won't let those SomeClass objects
escape "into the wild." You'll hide them behind accessors so
you can change representations easily if it proves necessary.
This is the sort of choice that's easy to make before coding,
much harder to make after the code is deployed and found to
be performing poorly.

Completely agree.

I completely agree...unless the usage of the array is a one time
occurrence and the real bottle neck in the system is a simple loop
somewhere that gets executed continually, which really should be a hash
table lookup.

Measure to find the problem area - dont guess.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,431
Messages
2,571,677
Members
48,796
Latest member
Greg L.

Latest Threads

Top