Managed-Code Bloat

  • Thread starter Lawrence D'Oliveiro

BGB

That is not the case. I have actually patched the source code to
SpiderMonkey myself, I have literally sat next to the people who work on
the engine, SpiderMonkey is garbage-collected. Mark-and-trace, although
the plan is to move to generational GC. I'm not so sure about V8, but
the page I linked to explicitly mentions generational garbage
collection, so I'm sure it's in the same boat.

If you don't believe that, what would it take to get you to believe the
truth? A signed note from Brendan Eich himself?

yep... and my own language (partly derived from JavaScript) also uses
GC, but it is based on conservative mark/sweep (similar to the Boehm GC).

sadly, the problem with traditional generational GC strategies is that
they would depend on having a precise GC, which has the major drawback
of being notably painful to work with (apart from having to
"pin"/"defile" pretty much any object which may be potentially
referenced by "unsafe" C code).

the tradeoff though is that precise generational GCs can get much
better performance than conservative mark/sweep.


however, my GC is used by nearly all of the C code as well, and with
care, most GC stalls can be largely avoided (I am using it successfully
with a 3D engine, doing an FPS style game).

part of the trick though is that I am mostly treating the GC as if it
were a manual memory manager, as in, freeing stuff when it is known to
be no longer needed (and the script VM also has a few tricks to reduce
garbage production as well...).


or such...
 

Silvio

Maybe that’s the point: such skills are less common among corporate types.

Not per se. I have programmed in C since 1985 and took up C++ when the
cfront thingy became available on our (university) PDP-11 and Sun UNIX
machines a year later. I dare say that I am quite capable of writing
high-quality code in C/C++.

BUT, when writing large multi-threaded systems I experienced that the
effort needed to keep everything working correctly typically grew at a
more-than-linear rate in relation to code base size.

Using C++ I created a ref<T> templated type that exposed T*-like
semantics but implemented reference counting to simplify memory
management of objects that were shared in such a way that it was
difficult to determine who should control their lifespan.
Although that worked fine, it was very easy to get bitten by reference
cycles, so it only solved part of the problem. The extra memory
allocation/deallocation needed per object and the negative impact it had
on code inlining did not help performance either.

Being pragmatic I decided to switch to Java for specific projects even
though I had rejected it earlier as being a "fake" language with
inferior performance characteristics.
Soon I discovered that the portion of effort needed for correct explicit
memory management (and tracing bugs caused by the lack thereof) in C/C++
was so large that development times dropped drastically by using Java.
Very soon I was convinced that only a very small part of the systems I
wrote actually justified opting for C/C++.

A few years ago I switched to Scala. Although it runs on the same JVM
that Java uses, a similar return on investment was my reward. The reasons
for this are very different though.

And yet most mass-market apps avoid them.

That was true a couple of years ago, when hand-held devices were only
just capable of doing fancy stuff. Native code was required to squeeze
out that last x% of performance (which works for such mostly small
applications).
The early Android phones suffered from underpowered hardware and
delivered poor performance. That problem has since been solved, as I
have experienced with my last two Android phones over almost two years.

Silvio
 

BGB

JavaScript objects are basically hashmaps. The delete statement is the
JS equivalent of map.remove.

yep, and also IMO the Java idea that GC == "inability to free crap" is
stupid...

delete can basically also allow a VM to free stuff early, and thus
potentially improve overall performance.
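
in Java terms, the closest equivalent is dropping the last reference
early so the collector *can* reclaim the object sooner; a minimal
sketch (the class and method names here are made up for illustration):

    import java.util.HashMap;
    import java.util.Map;

    public class EarlyRelease {
        public static void main(String[] args) {
            Map<String, byte[]> cache = new HashMap<String, byte[]>();
            cache.put("frame", new byte[64 * 1024 * 1024]); // large temporary buffer

            process(cache.get("frame"));

            // nearest analogue to the JS "delete" statement: remove the entry
            // so the buffer becomes unreachable now, instead of staying live
            // for as long as the map (and this method) does.
            cache.remove("frame");

            doLotsOfOtherWork(); // the GC may reclaim the 64 MB during this phase
        }

        private static void process(byte[] data) { /* ... */ }
        private static void doLotsOfOtherWork() { /* ... */ }
    }

note that nothing here forces an immediate collection; it only makes an
earlier one possible.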


also possible, though, are some garbage-reduction tricks:
ref-counting (as it can detect dead objects earlier);
(heap-based) value types (because their lifespan behavior is trivial to
determine, and so one can allocate/free them aggressively);
...

ref-counting has the drawback though that it is very difficult to write
"general purpose" code and not screw up the ref-counts somewhere (which
can easily blow up the program), which is why I generally don't use it.

value types are simpler to work with, but (like ref-counts) involve lots
of operations which may add overhead, are not as general-purpose (since
they reflect particular usage semantics), and involve in some cases
"policy" decisions (basically, who "owns" a value-object, since
internally they are passed as a reference, with operations which copy
and free them as necessary to cause their behaviors).

a smarter VM (or a JIT) could probably use the stack-frame to store
value-types.
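
as a rough sketch of the idea in Java terms (Vec2 is a made-up
value-like class; whether the allocations below are actually elided
depends on the JIT's escape analysis and is not guaranteed):

    public class ValueTypeSketch {
        static final class Vec2 {
            final double x, y;
            Vec2(double x, double y) { this.x = x; this.y = y; }
            Vec2 add(Vec2 o) { return new Vec2(x + o.x, y + o.y); }
        }

        public static void main(String[] args) {
            Vec2 sum = new Vec2(0, 0);
            for (int i = 0; i < 1000000; i++) {
                // the temporary passed to add() never escapes this loop body;
                // a JIT that proves this may keep its fields in registers or
                // on the stack rather than allocating it on the heap.
                sum = sum.add(new Vec2(i, -i));
            }
            System.out.println(sum.x + ", " + sum.y);
        }
    }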


or such...
 

Arved Sandstrom

So you concede my point.

I concede your observation about C++ and "corporate" programmers,
because it's very likely true. Corporate programmers tend not to use C
or C++. Read Silvio's post and you may arrive at a glimmer of
understanding as to why. You may also have noticed along the way that
C++ CORBA never took off - it is of relevance to this discussion.

I don't concede your point, because I believe your point is that the
people who have become corporate programmers are the mediocre types who
would never be capable of C or C++. That's patently untrue, and it's
stupid and naive to suggest it.

I have published code in a whole bunch of different languages, just in this
noisegroup alone. And there’s more in my GitHub area, as well as a patch or
two floating around elsewhere. I will happily listen to criticism from you
... the day that you can do the same.

You misunderstood. I'm making the point that you, just like any other
programmer, can only be hotshit at a handful of languages at any given
point in time. I guarantee, for example, that if you had been a
wonderful C++ programmer up until, say, 1995, spent the past 16 years
writing Python and all that good stuff, and now all of a sudden in 2011
were hit with a large app that had to be written in C++, you'd be
flailing for a while, dude. No ifs and buts, and don't insult our common
sense.

AHS
 

Arved Sandstrom

And yet managed code has failed to take off in the mass market. Why is that?

Dude, what do you consider to be managed code? It's not just .NET and
Java. Fact is, any language system that takes care of some details for
you that you need to deal with yourself in C or assembly has aspects of
management. This is a continuous spectrum, not an all or nothing.

As Spolsky puts it, if your language lets you concatenate strings and
not worry about how it happens, you've got managed code.

And yet it is the "managed" apps that tend to be the memory hogs.

Really? So C and C++ programs have never been accused of having memory
management issues. Interesting.

There is actually some truth to your observation however. Again, not for
the reasons you think. Since when you said "managed" you really meant
Java and .NET, I'll confine my remarks to those also. Anyway, Java and
C#, among others, can be abused by incompetent or unschooled or ignorant
programmers just like any other language can. If a programmer is shabby
at programming, and more specifically, shabby at OOP, they'll write crap
in C++ *and* Java *and* C#. With a C++ app it'll probably crash within
minutes and very possibly never ship. With Java and C# a mediocre coder
is much more likely to be able to release his poor code - humans being
humans, such a coder will blame the language for bloat and slowness and
errors.

If you run across a Java app that's a memory hog, why do you think it's
the fault of the language? It never occurred to you that it might be the
programmers? It's not like more than 25% (being charitable) of all
programmers working today should even be let near a keyboard, after all.

AHS
 

Lawrence D'Oliveiro

delete can basically also allow a VM to free stuff early, and thus
potentially improve overall performance.

Isn’t that conceding the point that automatic garbage collection saps
performance?
 

BGB

Isn’t that conceding the point that automatic garbage collection saps
performance?

I wasn't claiming it doesn't...


the merit of GC is that it can be easier to use, as it can serve as a
"safety net" for all of those objects which fail to get freed correctly
(or, rigged with some additional machinery, serve as a leak-detector and
provide partial diagnosis...).

the downside though is, of course, that performance can be lost, and if
the GC has to do its thing (GC cycles), this is not free either.

it depends though, as heavy use of RAII/Smart-Pointers and Pass-by-Copy,
which is "common" in a lot of C++ code, can actually manage to be slower
(a lot of C++ devs though use this in an attempt to reduce leaks without
going through the more costly process of determining exactly when and
where to free things as part of their code design, setting up "who owns
what" policies, and so on...).


personally, I am just thinking here that GC + the ability to free things
(basically, when one can determine for themselves when it is no longer
needed) allows combining the good points (combining the relative ease of
GC with a little more of the performance of manual MM).

I think Java just sort of left out delete due more to ideological
reasons though, when instead they could have treated it like a hint (if
the compiler or VM has good reason to doubt that the delete is valid, it
can make it a no-op and/or raise an exception if used incorrectly).

say, program crashes with an exception
"java.lang.AccessFollowingDeleteError" or similar...


so, it is a tradeoff...
 

BGB

Dude, what do you consider to be managed code? It's not just .NET and
Java. Fact is, any language system that takes care of some details for
you that you need to deal with yourself in C or assembly has aspects of
management. This is a continuous spectrum, not an all or nothing.

As Spolsky puts it, if your language lets you concatenate strings and
not worry about how it happens, you've got managed code.


Really? So C and C++ programs have never been accused of having memory
management issues. Interesting.

There is actually some truth to your observation however. Again, not for
the reasons you think. Since when you said "managed" you really meant
Java and .NET, I'll confine my remarks to those also. Anyway, Java and
C#, among others, can be abused by incompetent or unschooled or ignorant
programmers just like any other language can. If a programmer is shabby
at programming, and more specifically, shabby at OOP, they'll write crap
in C++ *and* Java *and* C#. With a C++ app it'll probably crash within
minutes and very possibly never ship. With Java and C# a mediocre coder
is much more likely to be able to release his poor code - humans being
humans, such a coder will blame the language for bloat and slowness and
errors.

If you run across a Java app that's a memory hog, why do you think it's
the fault of the language? It never occurred to you that it might be the
programmers? It's not like more than 25% (being charitable) of all
programmers working today should even be let near a keyboard, after all.

well, I think it actually partly goes both ways.

while many programmers do suck... both Java and C# implement things in
many cases, in ways which are fairly costly...


for example, in C, a string is just a glob of 8-bit characters in
memory, and so doesn't really take much more memory than the space to
store these characters.

in Java, a "String" is a class instance containing an array of 16-bit
characters...

just at the outset, this is a good deal more expensive (I calculated for
my own technology, reaching an approx 7x difference for the memory cost
of storing the string "Hello"). granted, the JVM may have a lower base
overhead, and it will drop and approach 2x as the string gets longer (a
lot of the overhead was mostly related to the cost of the object
instance and array headers).

but, even 2x is still a significant space overhead... (due to UTF-16 vs
UTF-8...).
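
a quick, runnable way to see the character-payload part of that
difference (this ignores object and array headers, so the real
in-memory gap on a JVM is larger):

    import java.nio.charset.StandardCharsets;

    public class StringPayloadSize {
        public static void main(String[] args) {
            String s = "Hello";
            int utf8Bytes  = s.getBytes(StandardCharsets.UTF_8).length; // 5 for ASCII text
            int utf16Bytes = s.length() * 2;  // 2 bytes per UTF-16 code unit
            System.out.println("UTF-8 payload:  " + utf8Bytes + " bytes");
            System.out.println("UTF-16 payload: " + utf16Bytes + " bytes");
        }
    }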


also, there are many places internally where "new" will be used in
copious amounts due to the basic design of the VM architecture, ...

a lot of this is still likely not exactly free either, and a lot of this
may add up...


or such...
 

Arved Sandstrom

well, I think it actually partly goes both ways.

while many programmers do suck... both Java and C# implement things in
many cases, in ways which are fairly costly...


for example, in C, a string is just a glob of 8-bit characters in
memory, and so doesn't really take much more memory than the space to
store these characters.

in Java, a "String" is a class instance containing an array of 16-bit
characters...

just at the outset, this is a good deal more expensive (I calculated for
my own technology, reaching an approx 7x difference for the memory cost
of storing the string "Hello"). granted, the JVM may have a lower base
overhead, and it will drop and approach 2x as the string gets longer (a
lot of the overhead was mostly related to the cost of the object
instance and array headers).

but, even 2x is still a significant space overhead... (due to UTF-16 vs
UTF-8...).

also, there are many places internally where "new" will be used in
copious amounts due to the basic design of the VM architecture, ...

a lot of this is still likely not exactly free either, and a lot of this
may add up...

or such...

No argument from me, but in over a decade of working with Java I've yet
to see a "memory hog" bloated application that couldn't have been
improved to make it acceptable. Which means it could have been written
that way to start with.

Good design helps a great deal - minimize coupling and you minimize the
number of references that are held, keeping other objects around. Cut
down on object lifetimes by creating them when definitely needed, and
make sure they are cut loose as soon as possible after their usefulness
is done. Re-use immutable value objects when possible (flyweight), or
singletons - try to recycle. Use the right data structures. Use pools.
Understand GC and the Reference API.
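
For instance, here is a minimal sketch of the flyweight/recycling idea
(CurrencyCode is purely illustrative; a production pool would also want
thread safety and, via the Reference API, a way to evict unused entries):

    import java.util.HashMap;
    import java.util.Map;

    public final class CurrencyCode {
        // Not thread-safe, and never evicts - a real pool might use a
        // concurrent map and SoftReference values so unused entries can be
        // reclaimed under memory pressure.
        private static final Map<String, CurrencyCode> POOL =
                new HashMap<String, CurrencyCode>();

        private final String code;

        private CurrencyCode(String code) { this.code = code; }

        // Canonicalizing factory: repeated requests for "EUR" share a single
        // immutable instance instead of allocating a new object every time
        // (the same idea as Integer.valueOf for small ints).
        public static CurrencyCode of(String code) {
            CurrencyCode c = POOL.get(code);
            if (c == null) {
                c = new CurrencyCode(code);
                POOL.put(code, c);
            }
            return c;
        }

        @Override
        public String toString() { return code; }

        public static void main(String[] args) {
            System.out.println(CurrencyCode.of("EUR") == CurrencyCode.of("EUR")); // true
        }
    }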

Etc etc etc. Jack Shirazi's "Java Performance Tuning" book came out in
2000, and it ought to have been a must-read for every Java programmer
from the get-go. I wonder what percentage of Java programmers ever read
it. A lot of it still holds true; there's plenty of other, updated
material to cover newer Java.

AHS
 

Michal Kleczek

BGB said:
also possible, though, are some garbage-reduction tricks:
ref-counting (as it can detect dead objects earlier);
(heap-based) value types (because their lifespan behavior is trivial to
determine, and so one can allocate/free them aggressively);
...

ref-counting has the drawback though that it is very difficult to write
"general purpose" code and not screw up the ref-counts somewhere (which
can easily blow up the program), which is why I generally don't use it.

There has been some work done to implement a ref-counting GC in the (Sun)
JVM. See:
http://www.cs.technion.ac.il/~erez/Papers/refcount.ps
The results were promising, but it has not been incorporated into the
production JVM.
 

BGB

There has been some work done to implement a ref-counting GC in the (Sun)
JVM. See:
http://www.cs.technion.ac.il/~erez/Papers/refcount.ps
The results were promising, but it has not been incorporated into the
production JVM.

yeah...

this is the case with several other major features as well.


one of my own prior VMs used ref-counting, but this, combined with other
factors, made the VM in question very painful to work on (or interface
with).

my current VMs are much easier to work with, at a cost of being somewhat
less efficient.


another partial point of controversy in my current architecture is that
the core type system is based mostly on strings and "strcmp()". it was
another tradeoff: strings were generally the least-effort option, and
eventually largely won out in their battle against tagged references
(which were technically "better", but also generally more of a pain to
work with, vs using raw pointers and "magic" types).


or such...
 

Gene Wirchenko

[snip]
for example, in C, a string is just a glob of 8-bit characters in
memory, and so doesn't really take much more memory than the space to
store these characters.

It can be, but it need not be. Some systems have a different
CHAR_BIT value. Some systems have each character in a larger data
chunk.

[snip]

Sincerely,

Gene Wirchenko
 

Michael Wojcik

Alessio said:
[*] I mean languages in which new projects are actively being written;
maintenance of old COBOL and FORTRAN code does not count.

New projects are actively being written in COBOL and Fortran. (I don't
know about FORTRAN; the name of that language went mixed-case in 1990.)

But, of course, on Usenet no one is expected to do any research or
know anything about the industry outside of their own fiefdom.
 

Michael Wojcik

Martin said:
[*] I mean languages in which new projects are actively being written;
maintenance of old COBOL and FORTRAN code does not count.
Garbage collection and stack management are largely irrelevant for COBOL
- all data space is declared statically and the ways in which PERFORM can
be used more or less guarantees that its use will not involve the stack.

That hasn't been true, in general, since COBOL-85. While there
certainly are still old COBOL applications with fixed-size memory
requirements, a great many written over the past quarter-century make
use of arbitrary subroutine call (ie, out-of-line perform and call)
patterns, and a smaller number use reentrancy and/or threading.

And there have been garbage-collected COBOLs at least since the first
OO COBOLs appeared in the 1990s.

There are also managed-code COBOLs. Fujitsu has .NET COBOL; so do we,
and we also have JVM COBOL.

I've never had much to do with Fortran, but from what I've seen of it
much the same would be true: no dynamically declared off-stack data and
the stack only used for subroutine calls, parameter passing and a
subroutine's local variables.

Fortran 90 added dynamic memory allocation to the standard. Prior to
that I believe some implementations offered it as an extension.

I don't know if anyone has a garbage-collected Fortran implementation.
 

Michael Wojcik

Lawrence said:


Maybe it’s because the former do reference-counting (freeing up most things
the moment they become unreachable) while the latter don’t.

That's because reference counting has undesirable performance
characteristics (cache misses, pipeline stalls) due to reference
updates, and generational collectors often outperform reference counters.

There are algorithms which appear to have superior performance to
either pure generational collectors or naive hybrids (ie, a reference
counter with an occasional sweeping collector to catch dead
circular-referenced objects), such as ulterior reference counting.
(The basic idea is to pick an algorithm based on the expected
demographics of different groups of objects.) But they haven't caught
on widely; it's likely that the advantage isn't compelling for typical
applications.

In any case, I'm highly dubious of perceptions of languages as
"bloated" or otherwise. That's not likely to be a useful evaluation.
 

Michael Wojcik

Lawrence said:
Isn’t that conceding the point that automatic garbage collection saps
performance?

Research shows it does not, as a general rule.

See for example Blackburn & McKinley's paper on ulterior reference
counting. The generational GC in that study outperforms the
ref-counting GC in total test execution time. The incentive to
hybridize is reducing the GC pause time.

(And incidentally, reference-counting garbage collection is still
automatic garbage collection. And the "automatic" is redundant, too.)
 

Martin Gregorie

Martin said:
[*] I mean languages in which new projects are actively being written;
maintenance of old COBOL and FORTRAN code does not count.
Garbage collection and stack management are largely irrelevant for
COBOL - all data space is declared statically and the ways in which
PERFORM can be used more or less guarantees that its use will not
involve the stack.

That hasn't been true, in general, since COBOL-85. While there certainly
are still old COBOL applications with fixed-size memory requirements, a
great many written over the past quarter-century make use of arbitrary
subroutine call (ie, out-of-line perform and call) patterns, and a
smaller number use reentrancy and/or threading.

OK, I'd agree about dynamically loaded subroutines, though those scarcely
need a GC since the allocation/deallocation points are well known, but
are you saying that the content of WORKING-STORAGE can be dynamic now?
Last time I looked (that would have been COBOL-85 probably) you could
specify a variable OCCURS clause in a table declaration, but memory was
always allocated for the maximum table size, in the implementations I
knew detail for, anyway.

And there have been garbage-collected COBOLs at least since the first OO
COBOLs appeared in the 1990s.

OK - but I've never used or seen those flavours.

There are also managed-code COBOLs. Fujitsu has .NET COBOL; so do we,
and we also have JVM COBOL.

I did use multi-threading in COBOL, but we managed all that in non-POSIX
C threads (on an NCR UNIX box with Micro Focus COBOL for the business
logic). I don't know when the code was initially written, though, so a
lot of that could have been a hang-over due to legacy code (the system
was Shared Financial's ON/X, a UNIX port of their ON/2 ATM management
system, which was originally written in Stratus COBOL).

Fortran 90 added dynamic memory allocation to the standard. Prior to
that I believe some implementations offered it as an extension.

I've had a very brief exposure to Fortran 77 - little more than pumping
XFOIL through GNU Fortran so I could run it - and that's about it. I was
very pleased to see that the nasty IF (A-B) 40,50,60 conditionals had at
last grown up into proper IF..ELSE conditionals.
 

BGB

[snip]
for example, in C, a string is just a glob of 8-bit characters in
memory, and so doesn't really take much more memory than the space to
store these characters.

It can be, but it need not be. Some systems have a different
CHAR_BIT value. Some systems have each character in a larger data
chunk.

yes, but the issue can be restated as "on pretty much any HW either
programmers or users are likely to deal with".

in this case, it is very unlikely that one needs to worry about HW with
non-8-bit characters.

much like, the vast majority of normal computers also run x86, and on
x86, bytes are 8-bit (likewise goes for PPC, ARM, ...). everything else?
mostly irrelevant.

relatively under-used features, such as wide-character strings ("wchar_t
*str=L"...";" and likewise), are also being disregarded.


this means, in a general case:
C will need 8 bits per character, with little overhead apart from
in-memory storage;
Java will need 16 bits per character.


generally, Java stores Strings as a class instance, where the class
holds an array. so, one also has to add in the overhead of storing an
instance and an array (for example, an 'Object', and the respective
memory headers for an instance and an array).

combining all of these, a Java implementation will generally have a
somewhat higher overhead in the cost of storing a string, as per the
number of in-memory bytes.
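
as a back-of-the-envelope illustration, a small sketch that redoes that
estimate with assumed header sizes (the constants are guesses for a
32-bit / compressed-pointer layout and will vary between VMs and
versions; only the rough ratio matters):

    public class StringOverheadEstimate {
        static long align8(long n) { return (n + 7) & ~7L; }

        public static void main(String[] args) {
            String s = "Hello";

            long cBytes = s.length() + 1;                     // chars + NUL terminator

            // assumed: 8-byte object header, 4-byte array length, 4-byte
            // reference and int fields, 8-byte alignment.
            long charArray = align8(8 + 4 + 2L * s.length()); // header + length + UTF-16 data
            long stringObj = align8(8 + 4 + 4);               // header + array ref + cached hash
            long javaBytes = charArray + stringObj;

            System.out.println("C estimate:    " + cBytes + " bytes");
            System.out.println("Java estimate: " + javaBytes + " bytes");
            System.out.println("ratio:         ~" + (javaBytes / (double) cBytes) + "x");
        }
    }

with these (assumed) numbers the "Hello" case comes out in the 6-7x
range, and the ratio falls toward roughly 2x as the character data
starts to dominate the fixed headers.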


not that this may be a killer in itself, but it may add up...
 
