Bulk Array Element Allocation, is it faster?

Discussion in 'Java' started by Jan Burse, Sep 25, 2011.

  1. Jan Burse

    Jan Burse Guest

    Dear All,

    I just wonder whether modern JITs optimize
    code of the following form:

    Bla[] bla = new Bla[n];
    for (int i=0; i<n; i++) {
        bla[i] = new Bla();
    }

    When I use the above in my code, my application
    spends 8800 ms in the benchmark I have.

    When I then change the code to:

    Bla[] bla = new Bla[n];

    ...

    if (bla[i]==null)
        bla[i] = new Bla();

    ..

    So instead of allocating all elements at once,
    I will allocate them on demand in the subsequent
    flow of my application.

    When I use the above in my code, my application
    now suddenly spends 9600 ms in the benchmark
    I have.

    So I wonder whether the JIT perhaps has
    a way to detect the bulk allocation in the
    first version and transform it into something
    more efficient than my lazy allocation.

    Any clues?

    Bye
    Jan Burse, Sep 25, 2011
    #1

  2. Jan Burse

    Eric Sosman Guest

    On 9/24/2011 9:17 PM, Jan Burse wrote:
    > Dear All,
    >
    > I just wonder whether modern JITs optimize
    > code of the following form:
    >
    > Bla[] bla = new Bla[n];
    > for (int i=0; i<n; i++) {
    >     bla[i] = new Bla();
    > }
    >
    > When I use the above in my code, my application
    > spends 8800 ms in the benchmark I have.
    >
    > When I then change the code to:
    >
    > Bla[] bla = new Bla[n];
    >
    > ...
    >
    > if (bla[i]==null)
    >     bla[i] = new Bla();
    >
    > ..
    >
    > So instead of allocating all elements at once,
    > I will allocate them on demand in the subsequent
    > flow of my application.


    This could be a win if you will actually use only a few of
    the array elements, and if constructing a Bla is expensive (for
    example, if it involves I/O). If Bla's are cheap to make, or if
    you'll wind up using most of them anyhow, there's no obvious
    reason to conditionalize.
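
    To make that concrete, here is a minimal sketch (the class and method
    names are made up, not taken from Jan's code) of the situation where
    lazy creation can win:

    class Bla {
        Bla() {
            // assume this is the expensive part, e.g. it involves I/O
        }
    }

    class BlaTable {
        private final Bla[] slots;

        BlaTable(int n) {
            slots = new Bla[n];          // every slot starts out null
        }

        Bla get(int i) {
            if (slots[i] == null) {      // construct only on first access
                slots[i] = new Bla();
            }
            return slots[i];
        }
    }

    If most slots are never touched, the expensive constructors never run;
    if all of them end up being touched anyway, the null checks are pure
    overhead.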

    > When I use the above in my code, my application
    > now suddenly spends 9600 ms in the benchmark
    > I have.


    Nine percent longer. But I'm highly suspicious of those nice,
    round numbers: It seems an unlikely coincidence that two rather
    different chunks of code should both execute in even tenths of
    seconds. Are you sure you've measured what you want to measure?
    Are you sure that what you want to measure is the important thing?

    > So I wonder whether eventually the JIT has
    > a way to detect the bulk allocate of the first
    > version and transform it into something more
    > efficient than my lazy allocation.


    Based on your (suspect) timings, "just leave it be" would be
    exactly the kind of improvement you ask for. The JIT very likely
    does something like that already :)

    --
    Eric Sosman
    Eric Sosman, Sep 25, 2011
    #2

  3. Jan Burse

    Roedy Green Guest

    On Sun, 25 Sep 2011 03:17:19 +0200, Jan Burse <>
    wrote, quoted or indirectly quoted someone who said :

    >Any clues


    new does two things:
    1. allocates X bytes of space
    2. initialises that block of space to a pattern.

    A clever jit may notice that some objects are simple, and always
    generate the exact same pattern. It can then replace the constructor
    code with a block move.

    A clever jit may notice that some objects are simple, except for a few
    computed bytes. It might be able to compose a fast constructor that
    does a block move then call the methods to compute the few computed
    bytes.

    Moving 0 to a group of bytes is even faster than a block move, since
    the source is a register or the instruction itself.

    So another idea is the jit could reorder the fields so that the ones
    needing the same initialisation byte pattern live side by side, so you
    can do it with a REP constant move.

    The flat footed way to do a constructor is to zap the entire block to
    0, then go field by field initialised to non-null not-0 and plop that
    value in on top.

    Because constructors are called so often, even a very minor
    optimisation would pay off.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    It should not be considered an error when the user starts something
    already started or stops something already stopped. This applies
    to browsers, services, editors... It is inexcusable to
    punish the user by requiring some elaborate sequence to atone,
    e.g. open the task editor, find and kill some processes.
    Roedy Green, Sep 25, 2011
    #3
  4. Jan Burse

    Lew Guest

    Roedy Green wrote:
    > The flat footed way to do a constructor is to zap the entire block to
    > 0, then go field by field initialised to non-null not-0 and plop that
    > value in on top.


    This "flat footed [sic] way" is what Java constructors do, and must.

    Member fields are set to their default values prior to any constructor assignment, even to the same value as the default.

    public class Foo
    {
        private Bar bar = null;
    }

    Foo#bar will be cleared to 'null' upon first construction, then initialized to 'null' by the explicit initialization. Both steps happen. Both steps are required to happen, and they're required to happen separately.
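
    The two phases can even be observed, if one really wants to (a minimal
    sketch; calling an overridable method from a superclass constructor is
    poor style and is used here only to peek at the field before its
    initializer has run):

    class Base {
        Base() {
            peek();                      // runs before Derived's field initializers
        }
        void peek() {}
    }

    class Derived extends Base {
        private String s = "init";       // explicit initializer, runs after super()

        @Override
        void peek() {
            System.out.println(s);       // prints "null": field still holds its default
        }

        public static void main(String[] args) {
            Derived d = new Derived();
            System.out.println(d.s);     // prints "init": explicit initializer ran later
        }
    }

    During the Base constructor the field still holds its default value;
    only afterwards does the explicit initializer run.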

    > Because constructors are called so often, even a very minor
    > optimisation would pay off.


    Perhaps, depending on what you mean by "pay off". Some have argued that the optimization achieved by omitting explicit initialization to default values is not worth the supposed reduction in clarity.

    Since we all know that member fields are constructed with their default values, I don't think it's any less clear to omit their explicit (and redundant) initialization to default values.

    I guess it depends on whether that's your opinion, and whether omitting the redundant initialization is an optimization that would pay off.

    --
    Lew
    Lew, Sep 25, 2011
    #4
  5. Jan Burse

    Roedy Green Guest

    On Sat, 24 Sep 2011 23:22:45 -0700 (PDT), Lew <>
    wrote, quoted or indirectly quoted someone who said :

    >Foo#bar will be cleared to 'null' upon first construction, then initialized to 'null' by the explicit initialization.
    > Both steps happen. Both steps are required to happen, and they're required to happen separately.


    I doubt that. The JLS never says exactly how something works, just
    what the net effect must be. It is part of what makes it so difficult
    to follow. A JLS for programmers rather than implementors might
    describe a simple reference implementation, and say, "As a programmer
    you can presume it works like this, but actual implementations will be
    more sophisticated and may not give exactly the same behaviour as this
    simple explanation. For those fine points, see the implementors JLS."

    So for example it could describe how a simple GC works.

    In most cases the second init to null could be dropped by a clever
    optimiser because it would have the same net effect.

    Of course I defer to you and Patricia on divining the meaning of the
    JLS.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    It should not be considered an error when the user starts something
    already started or stops something already stopped. This applies
    to browsers, services, editors... It is inexcusable to
    punish the user by requiring some elaborate sequence to atone,
    e.g. open the task editor, find and kill some processes.
    Roedy Green, Sep 25, 2011
    #5
  6. Jan Burse

    Jan Burse Guest

    Eric Sosman schrieb:
    > This could be a win if you will actually use only a few of
    > the array elements, and if constructing a Bla is expensive (for
    > example, if it involves I/O). If Bla's are cheap to make, or if
    > you'll wind up using most of them anyhow, there's no obvious
    > reason to conditionalize.


    The Bla <init> does nothing. And yes, it could
    be that a Bla is not needed; that is why I
    invented the lazy scheme. But I see the slowdown
    especially when all the Blas are needed.

    > Nine percent longer. But I'm highly suspicious of those nice,
    > round numbers: It seems an unlikely coincidence that two rather
    > different chunks of code should both execute in even tenths of
    > seconds.


    Repeated measurements give different figures,
    which led me to estimate a measurement accuracy
    of around 100 ms. So instead of telling you that
    the application needs 9538 ms one time and
    9622 ms another time etc., I gave only the
    times rounded to hundreds of milliseconds.
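
    For what it's worth, the kind of measurement I mean looks roughly like
    the following (a simplified sketch, not my actual benchmark, and the
    JIT may of course optimize parts of it away):

    public class AllocTiming {
        static class Bla {}

        public static void main(String[] args) {
            for (int run = 0; run < 5; run++) {
                long start = System.nanoTime();
                for (int k = 0; k < 1000000; k++) {
                    Bla[] bla = new Bla[4];
                    for (int i = 0; i < bla.length; i++) {
                        bla[i] = new Bla();
                    }
                    if (bla[3] == null) {          // keep the allocations observable
                        throw new AssertionError();
                    }
                }
                System.out.println("run " + run + ": "
                    + (System.nanoTime() - start) / 1000000 + " ms");
            }
        }
    }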

    Bye
    Jan Burse, Sep 25, 2011
    #6
  7. Jan Burse

    Jan Burse Guest

    Roedy Green schrieb:
    > new does two things:
    > 1. allocates X bytes of space
    > 2. initialises that block of space to a pattern.


    So a dumb translation of the loop would be:

    1. for each i between 0 and n-1
    1.1 allocate X bytes of space
    1.2 call the initializer

    Then I was more thinking that the loop does the
    following, i.e. is cleverly compiled to:

    1. allocate n*X bytes of space
    2. for each i between 0 and n-1
    2.1 call the initializer for location X*i

    At least I would hand-assemble it like this,
    saving n-1 calls to the heap allocator.

    But maybe the positive effect is not based on such
    a clever allocation, but only on what Patricia said.

    I was already googling a little bit about the
    compilation techniques used by JITs today, but
    did not yet hit any evidence for clever loop
    allocation. It is so obvious that I feel it really
    must be done. But maybe it is not done because
    of multi-threading or GC, who knows...

    Bye
    Jan Burse, Sep 25, 2011
    #7
  8. On 09/25/2011 03:41 AM, Patricia Shanahan wrote:
    > On 9/24/2011 6:17 PM, Jan Burse wrote:
    >> Dear All,
    >>
    >> I just wonder whether modern JITs optimize
    >> code of the following form:
    >>
    >> Bla[] bla = new Bla[n];
    >> for (int i=0; i<n; i++) {
    >>     bla[i] = new Bla();
    >> }
    >>
    >> When I use the above in my code, my application
    >> spends 8800 ms in the benchmark I have.
    >>
    >> When I then change the code to:
    >>
    >> Bla[] bla = new Bla[n];
    >>
    >> ...
    >>
    >> if (bla[i]==null)
    >>     bla[i] = new Bla();
    >>
    >> ..
    >>
    >> So instead of allocating all elements at once,
    >> I will allocate them on demand in the subsequent
    >> flow of my application.
    >>
    >> When I use the above in my code, my application
    >> now suddenly spends 9600 ms in the benchmark
    >> I have.
    >>
    >> So I wonder whether eventually the JIT has
    >> a way to detect the bulk allocate of the first
    >> version and transform it into something more
    >> efficient than my lazy allocation.
    >>
    >> Any clues?

    >
    > You also need to consider the general optimization of processors in
    > favor of doing efficiently those things they have done in the recent past.
    >
    > When you do the allocation all at once, the code and data for "new
    > Bla()" is in cache on the second and subsequent calls. There may be
    > cache conflicts between "new Bla()" and the actual work, leading to
    > many more cache misses when you interleave them.
    >
    > Doing the initialization on demand may be adding an unpredictable
    > conditional branch to the subsequent flow.


    This and the fact that lazy initialization has concurrency issues when
    used in a multi threading context (which e.g. final members do not have)
    has made me use this approach less frequently. Also, for short lived
    objects in total it might be much more efficient to just allocate the
    object even if it is not used because it won't survive new generation
    anyway. I think the lazy init idiom only really pays off if
    construction is expensive (e.g. because it involves IO or time consuming
    calculations). In all other cases it's better to just unconditionally
    create and let GC work. And because of improvements in JVM technology
    the balance has moved a lot away from the lazy side because allocation
    and deallocation overhead became smaller than in earlier Java versions.
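
    To make the concurrency point concrete, here is a minimal sketch (the
    class names are made up) of why the unsynchronized lazy check is risky
    while an eagerly assigned final field is not:

    class Bla {}

    class LazyHolder {
        private Bla bla;                     // filled in on demand, not final

        Bla get() {
            if (bla == null) {               // two threads may both see null here...
                bla = new Bla();             // ...and both allocate; without
            }                                // synchronization or volatile the object
            return bla;                      // may even be published before it is
        }                                    // fully initialized
    }

    class EagerHolder {
        private final Bla bla = new Bla();   // safely published via the final field

        Bla get() {
            return bla;
        }
    }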

    Kind regards

    robert
    Robert Klemme, Sep 25, 2011
    #8
  9. Jan Burse

    Jan Burse Guest

    Robert Klemme schrieb:
    > On 09/25/2011 03:41 AM, Patricia Shanahan wrote:
    >> On 9/24/2011 6:17 PM, Jan Burse wrote:
    >>> Dear All,
    >>>
    >>> I just wonder whether modern JITs optimize
    >>> code of the following form:
    >>>
    >>> Bla[] bla = new Bla[n];
    >>> for (int i=0; i<n; i++) {
    >>>     bla[i] = new Bla();
    >>> }
    >>>
    >>> When I use the above in my code, my application
    >>> spends 8800 ms in the benchmark I have.
    >>>
    >>> When I then change the code to:
    >>>
    >>> Bla[] bla = new Bla[n];
    >>>
    >>> ...
    >>>
    >>> if (bla[i]==null)
    >>>     bla[i] = new Bla();
    >>>
    >>> ..
    >>>
    >>> So instead of allocating all elements at once,
    >>> I will allocate them on demand in the subsequent
    >>> flow of my application.
    >>>
    >>> When I use the above in my code, my application
    >>> now suddenly spends 9600 ms in the benchmark
    >>> I have.
    >>>
    >>> So I wonder whether eventually the JIT has
    >>> a way to detect the bulk allocate of the first
    >>> version and transform it into something more
    >>> efficient than my lazy allocation.
    >>>
    >>> Any clues?

    >>
    >> You also need to consider the general optimization of processors in
    >> favor of doing efficiently those things they have done in the recent
    >> past.
    >>
    >> When you do the allocation all at once, the code and data for "new
    >> Bla()" is in cache on the second and subsequent calls. There may be
    >> cache conflicts between "new Bla()" and the actual work, leading to
    >> many more cache misses when you interleave them.
    >>
    >> Doing the initialization on demand may be adding an unpredictable
    >> conditional branch to the subsequent flow.

    >
    > This and the fact that lazy initialization has concurrency issues when
    > used in a multi threading context (which e.g. final members do not have)
    > has made me use this approach less frequently. Also, for short lived
    > objects in total it might be much more efficient to just allocate the
    > object even if it is not used because it won't survive new generation
    > anyway. I think the lazy init idiom only really pays off if construction
    > is expensive (e.g. because it involves IO or time consuming
    > calculations). In all other cases it's better to just unconditionally
    > create and let GC work. And because of improvements in JVM technology
    > the balance has moved a lot away from the lazy side because allocation
    > and deallocation overhead became smaller than in earlier Java versions.
    >
    > Kind regards
    >
    > robert


    Yes, it really seems so. It looks like the
    "infinite heap" idea works (a notion borrowed
    from a talk by Cliff Click I found on the net;
    it suggests that we should just allocate and
    not care much).

    I made some additional benchmarks; in the present
    case the lazy variant does not do much good in terms
    of saving allocations. I got the following figures
    for the application at hand:

    Bulk: 84'393'262 allocates
    Lazy: 81'662'843 allocates

    So the application does not have many exit points
    in the flow following the array creation, except
    for the last exit point, when all objects were
    needed anyway.

    But still I have some doubts! The arrays I allocate
    are only 4 elements long or so. Why is there still
    such a big difference between allocating only 4
    elements in bulk and allocating them one after the
    other?

    Overhead from the lazy detection itself? I doubt
    that too, since the array is accessed anyway, and
    the lazy check is only a small null pointer check.

    I am still puzzled.

    Bye
    Jan Burse, Sep 25, 2011
    #9
  10. Jan Burse

    Eric Sosman Guest

    On 9/25/2011 4:47 AM, Jan Burse wrote:
    > Roedy Green schrieb:
    >> new does two things:
    >> 1. allocates X bytes of space
    >> 2. initialises that block of space to a pattern.

    >
    > So a dumb translation of the loop would be:
    >
    > 1. for each i between 0 and n-1
    > 1.1 allocates X bytes of space
    > 1.2 call the initializer
    >
    > Then I was more thinking that the loop does the
    > following, i.e. is cleverly compiled to:
    >
    > 1. allocates n*X bytes of space
    > 2. for each i between 0 and n-1
    > 2.1 call the initializer for location X*i
    >
    > At least I would hand assembler it like this.
    > Saving n-1 calls to the heap allocator.


    Since the heap allocator probably consists of one compare-and-
    branch ("Is there enough room?") and one addition ("Advance the
    nextFreeAddress pointer"), the savings are unlikely to be large.
    This isn't malloc(), you know.
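
    In Java-flavoured pseudocode the fast path amounts to something like
    this (illustrative only; a real VM does this in generated machine code,
    and the names here are invented):

    class Tlab {
        long top;   // next free address in the thread-local allocation buffer
        long end;   // end of the buffer

        long allocate(long size) {
            if (top + size <= end) {         // "Is there enough room?"
                long addr = top;
                top += size;                 // advance the nextFreeAddress pointer
                return addr;
            }
            return allocateSlowPath(size);   // refill the buffer or trigger GC
        }

        long allocateSlowPath(long size) {
            throw new UnsupportedOperationException("not modelled in this sketch");
        }
    }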

    > I was already googling a little bit about the
    > compilation techniques found for JIT today, but
    > did not yet hit any evidence for clever loop
    > allocation, but it is so obvious I really it
    > must be done. But maybe it is not done because
    > of multi-threading or GC, who knows...


    Note that the semantics of the bulk and lazy allocations
    in your original example are quite different. An "optimization"
    that changes the meaning of the code is only a benefit if the
    code was wrong to begin with. ;-)

    --
    Eric Sosman
    Eric Sosman, Sep 25, 2011
    #10
  11. On 09/25/2011 01:38 PM, Jan Burse wrote:
    > Robert Klemme schrieb:
    >> On 09/25/2011 03:41 AM, Patricia Shanahan wrote:
    >>> On 9/24/2011 6:17 PM, Jan Burse wrote:
    >>>> Dear All,
    >>>>
    >>>> I just wonder whether modern JITs optimize
    >>>> code of the following form:
    >>>>
    >>>> Bla[] bla = new Bla[n];
    >>>> for (int i=0; i<n; i++) {
    >>>>     bla[i] = new Bla();
    >>>> }
    >>>>
    >>>> When I use the above in my code, my application
    >>>> spends 8800 ms in the benchmark I have.
    >>>>
    >>>> When I then change the code to:
    >>>>
    >>>> Bla[] bla = new Bla[n];
    >>>>
    >>>> ...
    >>>>
    >>>> if (bla[i]==null)
    >>>>     bla[i] = new Bla();
    >>>>
    >>>> ..
    >>>>
    >>>> So instead of allocating all elements at once,
    >>>> I will allocate them on demand in the subsequent
    >>>> flow of my application.
    >>>>
    >>>> When I use the above in my code, my application
    >>>> now suddenly spends 9600 ms in the benchmark
    >>>> I have.
    >>>>
    >>>> So I wonder whether eventually the JIT has
    >>>> a way to detect the bulk allocate of the first
    >>>> version and transform it into something more
    >>>> efficient than my lazy allocation.
    >>>>
    >>>> Any clues?
    >>>
    >>> You also need to consider the general optimization of processors in
    >>> favor of doing efficiently those things they have done in the recent
    >>> past.
    >>>
    >>> When you do the allocation all at once, the code and data for "new
    >>> Bla()" is in cache on the second and subsequent calls. There may be
    >>> cache conflicts between "new Bla()" and the actual work, leading to
    >>> many more cache misses when you interleave them.
    >>>
    >>> Doing the initialization on demand may be adding an unpredictable
    >>> conditional branch to the subsequent flow.

    >>
    >> This and the fact that lazy initialization has concurrency issues when
    >> used in a multi threading context (which e.g. final members do not have)
    >> has made me use this approach less frequently. Also, for short lived
    >> objects in total it might be much more efficient to just allocate the
    >> object even if it is not used because it won't survive new generation
    >> anyway. I think the lazy init idiom only really pays off if construction
    >> is expensive (e.g. because it involves IO or time consuming
    >> calculations). In all other cases it's better to just unconditionally
    >> create and let GC work. And because of improvements in JVM technology
    >> the balance has moved a lot away from the lazy side because allocation
    >> and deallocation overhead became smaller than in earlier Java versions.
    >>
    >> Kind regards
    >>
    >> robert

    >
    > Yes, really seems so. Looks that the infinite heap idea
    > works (notion borrowed from a talk by Cliff Click found
    > on the net, should indicate that we just do this, allocate
    > and don't care much).
    >
    > I made some additional benchmarks, in the present case the lazy
    > does not so much good in that it saves allocates. I got the
    > following figures for the application at hand:
    >
    > Bulk: 84'393'262 allocates
    > Lazy: 81'662'843 allocates
    >
    > So the application does not have many exit points in the flow
    > following the array creation, except for the last exit point
    > when anyway all objects were needed.
    >
    > But still I have some doubts! The array I allocate are only
    > 4 elements long or so? Why is there still such a big
    > difference between allocating in bulk only 4 elements and
    > doing them one after the other?
    >
    > Overhead by the lazy detection itself? I doubt this also
    > since the array is anyway accessed, and the lazy check
    > is a small null pointer check.


    Yes, but the cost is not in the check but in the branching on processor
    level (see what Patricia wrote).

    Kind regards

    robert
    Robert Klemme, Sep 25, 2011
    #11
  12. Jan Burse

    Jan Burse Guest

    Robert Klemme schrieb:
    > Yes, but the cost is not in the check but in the branching on processor
    > level (see what Patricia wrote).


    That depends on the processor and on the branch.
    If new is just heap -= size, as some papers suggest,
    then it might not be important.

    But if new is much more, then sure, the branch
    interrupts the normal code flow so much that
    instruction pipelining gets out of sync. And
    the speed gain from instruction overlapping
    is lost.

    But my hypothesis is more that something happens
    algorithmically on a higher level rather than
    on the lower hardware level.

    So I also found something about "lock
    coarsening" (*): if new needs some lock,
    this lock could be acquired before the
    initialization loop and released after the
    initialization loop. So that:

    Bla[] bla = new Bla[n];
    for (int i=0; i<n; i++) {
        bla[i] = new Bla();
    }

    Would only need 1 locking instruction pair,
    whereas:

    Bla[] bla = new Bla[n];

    ...

    if (bla[i]==null)
        bla[i] = new Bla();

    ..

    Would need n locking instruction pairs. But I
    would still need some confirmation that JITs
    are able to do such an optimization on a higher
    level in the present case.

    I am also not sure whether the bounded for loop
    counts as looping control flow, in which case
    "lock coarsening" would not be applicable. But
    I guess it can be deemed non-looping, so that
    the technique would be applicable.
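
    In source-level terms the coarsening I have in mind would look roughly
    like the following (purely illustrative; whether the JIT really treats
    allocation this way is exactly what I would like to have confirmed):

    class CoarseningSketch {
        static class Bla {}
        static final Object heapLock = new Object();   // hypothetical allocation lock

        static Bla[] fine(int n) {                     // one lock/unlock pair per element
            Bla[] bla = new Bla[n];
            for (int i = 0; i < n; i++) {
                synchronized (heapLock) {
                    bla[i] = new Bla();
                }
            }
            return bla;
        }

        static Bla[] coarse(int n) {                   // one lock/unlock pair for the loop
            Bla[] bla = new Bla[n];
            synchronized (heapLock) {
                for (int i = 0; i < n; i++) {
                    bla[i] = new Bla();
                }
            }
            return bla;
        }
    }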

    Currently I am planning to change the loop
    to something like this:

    Bla[] bla = new Bla[n];
    for (int i=0; i<n; i++) {
        bla[i] = createBla();
    }

    So the JIT has less information about what
    the loop is doing, and I can do some new
    measurements. To be fair I would also use
    createBla() in the lazy scenario. Let's see
    what happens.

    Best Regards

    (*)
    http://java.sun.com/performance/reference/whitepapers/6_performance.html#2.1.2
    Jan Burse, Sep 25, 2011
    #12
  13. Jan Burse

    Jan Burse Guest

    Jan Burse schrieb:
    > Currently I am planning to change the loop
    > to something like that:
    >
    > Bla[] bla = new Bla[n];
    > for (int i=0; i<n; i++) {
    >     bla[i] = createBla();
    > }
    >
    > So that the JIT has less information on what
    > the loop is about, and to do some new
    > measurements. To be fair I would also use
    > createBla() in the lazy scenario. Lets see
    > what happens.


    Maybe there is a bug somewhere else in the
    application that leads to the performance loss.
    There are also points in the application where
    some of the bla elements are forcefully set
    to null so as to release Bla objects:

    ...
    bla[j] = null;
    ...

    I am not sure whether these releases work
    the same in the bulk and the lazy version of
    the code. Need to check first.

    Measurements have already shown that the
    release gives some gain; the measurements were
    16000 ms vs. 9600 ms. The release condition
    is more complicated than the lazy new
    condition, but its overhead is compensated by
    the overall gain.

    Lets see.

    Best Regards
    Jan Burse, Sep 25, 2011
    #13
  14. Jan Burse

    Eric Sosman Guest

    On 9/25/2011 10:04 AM, Jan Burse wrote:
    > Robert Klemme schrieb:
    >> Yes, but the cost is not in the check but in the branching on processor
    >> level (see what Patricia wrote).

    >
    > Depends on the processor and on the branch. If
    > new is just heap -= size, what some papers suggest,
    > then it might not be important.
    >
    > But if new is much more, then sure the branch
    > interrupts the normal code flow so much that
    > instruction piplining gets out of sync. And
    > the speed gain by instruction overlapping
    > is lost.
    >
    > But my hypothesis is more that something
    > algorithmically on a higher level happens than
    > something on the lower hardware level.
    >
    > So I also found something about "Lock
    > coarsening"(*), so if the new needs some lock
    > this lock could be aquired before the initialization
    > loop and released after the initialization
    > loop. [...]


    I'm speculating almost as wildly as you are, but I strongly
    doubt that a lock is acquired. Object creation happens so often
    that I'm sure the JVM implementors will use something like compare-
    and-swap on any platform that provides it (which I think means "all"
    nowadays).

    Even the check for "Should I wait for the garbage collector to
    finish?" uses no lock, on at least some platforms. Their JVM's
    dedicate an entire memory page to serve as a flag, whose state is
    not recorded in its content but in its MMU protection bits. To see
    if it's safe to allocate, an allocating thread just stores something
    in the special page; if the store works allocation is safe. If GC
    has raised its STOP sign, the page is write-protected and the store
    generates a hardware trap -- very high overhead, to be sure, but
    extremely low (not even a conditional branch!) in the common case.

    > Would need n locking instruction pairs. But I
    > would still need some confirmation that JITs
    > are able to do such an optimization on a higher
    > level in the present case.


    Again, I point out that the bulk and lazy variants do not do
    the same thing. Consider, for example

    class Bla {
        private static int master_seqno = 0;
        public final int seqno = ++master_seqno;
    }

    Observe that the value of bla[42].seqno differs between the two
    variants; it would therefore be an error to "optimize" either by
    transforming it into the other.
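
    A small sketch (with an arbitrary fill order standing in for the lazy
    case) makes the difference visible:

    class SeqnoDemo {
        static class Bla {
            private static int master_seqno = 0;
            public final int seqno = ++master_seqno;
        }

        public static void main(String[] args) {
            Bla[] bulk = new Bla[3];
            for (int i = 0; i < bulk.length; i++) {
                bulk[i] = new Bla();                  // bulk[0..2].seqno == 1, 2, 3
            }

            Bla[] lazy = new Bla[3];
            lazy[2] = new Bla();                      // accessed first: seqno == 4
            lazy[0] = new Bla();                      // seqno == 5
            lazy[1] = new Bla();                      // seqno == 6

            System.out.println(bulk[2].seqno + " vs " + lazy[2].seqno);  // "3 vs 4"
        }
    }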

    --
    Eric Sosman
    Eric Sosman, Sep 25, 2011
    #14
  15. Jan Burse

    Jan Burse Guest

    Eric Sosman schrieb:
    > Again, I point out that the bulk and lazy variants do not do
    > the same thing. Consider, for example
    >
    > class Bla {
    >     private static int master_seqno = 0;
    >     public final int seqno = ++master_seqno;
    > }
    >
    > Observe that the value of bla[42].seqno differs between the two
    > variants; it would therefore be an error to "optimize" either by
    > transforming it into the other.


    Not really. You would only see a difference in the
    order in which each Bla() gets a number: in the bulk
    variant the Bla() instances are allocated from 0 .. n-1,
    but the lazy variant might allocate them in an arbitrary
    order, which is the case in my application.

    So if your application does not depend on the order
    in which the seqnos are created, there is no functional
    problem. The main question I have here is not about the
    relative correctness of bulk versus lazy, but why bulk
    is counterintuitively much faster.

    Since the bulk is much faster I am assuming some
    optimization is happening in this particular loop:

    Bla[] bla = new Bla[n];
    for (int i=0; i<n; i++) {
        bla[i] = new Bla();
    }

    Actually this loop is something that shocked me
    when I started working with Java many years ago.
    What, there is no way to make a "new" of an array
    together with all its elements?

    This is not seen if one works with int[] or
    double[] arrays. But as soon as you work with some
    other kind of array, either a higher-dimensional
    array of int[] or double[] or an array of some
    class X, you are responsible for populating the
    elements.
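
    The usual workaround is a small helper that fills a freshly created
    array from a factory, something like this sketch (the Factory interface
    and the Arrays2 class are made up, nothing standard):

    // Hypothetical helper: populate a freshly created array from a factory,
    // since Java's "new Bla[n]" only fills the slots with null.
    interface Factory<T> {
        T create();
    }

    class Arrays2 {
        static <T> T[] populate(T[] arr, Factory<? extends T> f) {
            for (int i = 0; i < arr.length; i++) {
                arr[i] = f.create();
            }
            return arr;
        }
    }

    // Usage:
    // Bla[] bla = Arrays2.populate(new Bla[n], new Factory<Bla>() {
    //     public Bla create() { return new Bla(); }
    // });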

    The advantage is that you can create diagonal
    matrices, sparse arrays, polymorphic data in
    arrays, etc.. But I have the feeling that a loop
    as above is happening in many many applications
    exactly because initializing the elements of an
    array is left to the application programmer.

    So I am really focusing on this array init, and
    asking what it does under the hood; the lazy
    variant is only a reference point for the performance.
    I am not interested in a general theory of going
    from bulk to lazy and back and forth. Forget
    about the lazy and explain the bulk!

    Bye
    Jan Burse, Sep 25, 2011
    #15
  16. Jan Burse <> wrote:
    > Main question I have here is not about relative
    > correctness of bulk versus lazy. But why bulk is
    > counterintuitively much faster?
    >
    > Since the bulk is much faster I am assuming ...


    I understand perfectly well that you *were* resorting
    to those assumptions when you first noticed the effect,
    but after Patricia's and others' answers I wouldn't
    expect those assumptions to still be necessary to
    explain your observations.
    Andreas Leitgeb, Sep 25, 2011
    #16
  17. Jan Burse

    Jan Burse Guest

    Eric Sosman schrieb:
    > Note that the semantics of the bulk and lazy allocations
    > in your original example are quite different. An "optimization"
    > that changes the meaning of the code is only a benefit if the
    > code was wrong to begin with. ;-)


    What I do in the code has nothing to do with my question. My
    question circles only around the following code fragment:

    Bla[] bla = new Bla[n];
    for (int i=0; i<n; i++) {
        bla[i] = new Bla();
    }

    And what the JIT does. And not what I am doing in a comparison
    with lazy. That is why I started my post with:

    "I just wonder wether modern JIT do optimize
    Code of the following form:"

    Please focus on that.

    Bye
    Jan Burse, Sep 25, 2011
    #17
  18. Jan Burse

    Jan Burse Guest

    Jan Burse schrieb:
    > "I just wonder wether modern JIT do optimize
    > Code of the following form:"
    >
    > Please focus on that.


    The focus is also on the JIT and not on the JVM.
    So more on compilation techniques than on execution
    techniques (where to draw the line?). The answer
    could bluntly be that the JIT does nothing, and
    then it must be something in the JVM.

    Andreas Leitgeb:
    > I understand perfectly well, that you *were* resorting
    > to those assumptions when you first noticed the effect,
    > but after Patricia's and others' answers I wouldn't
    > expect those assumptions to be still necessary to
    > explain your observations.


    But if the JIT did something, then it would not be
    necessary for the JVM to do something, and it would
    be a technique that also works on old CPUs without
    pipelining, super caches and whatnot. I did not
    yet have time to test the phenomenon on an old CPU.

    Probably when I switch to old CPUs and also
    old JDKs I will also end up with old JITs.
    Anyway, I would be interested in JIT insights
    old and new.

    Bye
    Jan Burse, Sep 25, 2011
    #18
  19. Jan Burse

    Lew Guest

    Jan Burse wrote:
    > Actually this loop is something that shocked me
    > when I started working with Java many years ago.
    > What there is no way to make a "new" of an array
    > with all its elements?
    >
    > This is not seen if one works with int[] or
    > double[] arrays. But as soon as you work with some
    > other array, either a higher dimensional of int[]
    > or double[] or an array of some class X. You are
    > responsible for populating the elements.


    This is equally true of primitive arrays. Why do you claim otherwise?

    The JLS, §15.10.1, states, "... if a single DimExpr appears, a single-dimensional array is created of the specified length, and each component of the array is initialized to its default value (§4.12.5)." This is true whether the array is of a primitive or reference type.

    In either case, you are responsible for populating any non-default values.
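
    A trivial illustration of those default values (a sketch, with Bla
    standing in for any reference type):

    class DefaultsDemo {
        static class Bla {}

        public static void main(String[] args) {
            int[] ints = new int[3];        // components default to 0
            Bla[] blas = new Bla[3];        // components default to null
            System.out.println(ints[0]);    // prints 0
            System.out.println(blas[0]);    // prints null
        }
    }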

    --
    Lew
    Lew, Sep 25, 2011
    #19
  20. Jan Burse

    Lew Guest

    Roedy Green wrote:
    > Lew wrote, quoted or indirectly quoted someone who said :
    >> Foo#bar will be cleared to 'null' upon first construction, then initialized to 'null' by the explicit initialization.
    >> Both steps happen. Both steps are required to happen, and they're required to happen separately.

    >
    > I doubt that. The JLS never says exactly how something works, just
    > what the net effect must be. It is part of what makes it so difficult


    Let me settle your doubt, then, because the JLS does say exactly that.

    JLS §12.5:
    http://java.sun.com/docs/books/jls/third_edition/html/execution.html#12.5
    "Whenever a new class instance is created, memory space is allocated for itwith room for all the instance variables declared in the class type and all the instance variables declared in each superclass of the class type, including all the instance variables that may be hidden (§8.3). If there is not sufficient space available to allocate memory for the object, then creation of the class instance completes abruptly with an OutOfMemoryError. Otherwise, all the instance variables in the new object, including those declared in superclasses, are initialized to their default values (§4.12.5).
    "Just before a reference to the newly created object is returned as the result, the indicated constructor is processed to initialize the new object using the following procedure:
    "...
    "4. Execute the instance initializers and instance variable initializers for this class, assigning the values of instance variable initializers to thecorresponding instance variables, in the left-to-right order in which theyappear textually in the source code for the class. If execution of any of these initializers results in an exception, then no further initializers are processed and this procedure completes abruptly with that same exception.Otherwise, continue with step 5. (In some early implementations, the compiler incorrectly omitted the code to initialize a field if the field initializer expression was a constant expression whose value was equal to the default initialization value for its type.)"


    > to follow. A JLS for programmers rather than implementors might
    > describe a simple reference implementation, and say, "As a programmer
    > you can presume it works like this, but actual implementations will be
    > more sophisticated and may not give exactly the same behaviour as this
    > simple explanation. For those fine points, see the implementors JLS."


    The JLS is extremely explicit about the order and result of steps to load, allocate and initialize classes and instances. In this particular, non-hypothetical case, it requires members to get their default values, then separately to process explicit initializers, even (and especially) if the initialization is to the same default value.


    > So for example it could describe how a simple GC works.
    >
    > In most cases the second init to null could be dropped by a clever
    > optimiser because it would have the same net effect.


    Except that that is, as you see from the cited passage, explicitly forbidden.

    > Of course I defer to you and Patricia on divining the meaning of the
    > JLS.


    You don't need to. Just read the cited passage for yourself.

    --
    Lew
    Lew, Sep 25, 2011
    #20
