Container library (continued)

J

jacob navia

Any container will add *some* complexity and overhead to the data it stores.

A container object in C will be never as fast as managing each individual
datum by hand, i.e. giving to each datum an address and managing each
datum individually, as it is done in assembly language.

A container allows for scalability precisely by simplifying data management.
So, we pay an overhead.

How much of an overhead?

Let's take lcc-win's strings. They encapsulate a String object, reduced
to the bare essentials:

typedef struct _StringA {
size_t count; // Elements
char *content;
size_t capacity; // Allocated space
} String;

In a 64 bit system, an 8 character string will take:

24+8 --> 32 bytes, i.e. an overhead of 300%.

In a 32 bit system we would have

12+8 --> 20 bytes, an overhead of 150%.

Let's compare this to Java's string class. There, a character string of 8
characters will take 64 bytes, and there is nothing that the programmer
can do about it.

In C++ the size of a string of 89 characters appears to be 40 bytes. I
used the following program:

D:\temp>type str.cpp
#include <string>
#include <iostream>

using namespace std;

int main(void)
{
string m("12345678");
cout << sizeof(m) << endl;
}

This outputs

40


In a C container library, it is possible to design a "Small string type"
(maybe in Java too, I do not know). That type can be restricted to
strings shorter than 65535 characters (probably 99.999999% of the
strings used in a program) If we do that, we can reduce the overhead from
24 to 16 bytes. The alignments requirements in 64 bits still hunt us.

If we get rid of the pointer however, and store the characters in a
structure with a variable "tail" as introduced by C99, we can curtail
the alignment requirements and we have an overhead of just 16 bytes:
two unsigned shorts that specify the length and the capacity, followed
by the actual data.

In C (99) we would have
typedef struct _smallString {
unsigned short count; // Elements
unsigned short capacity; // Allocated space
char contents[];
} smString;

In this case the overhead is 4 bytes, i.e. only 50%.

By making apparent the workings of the container, C has the advantage
of making the programmer aware of what he/she is paying for each
container. The big problem in Java and other Java-like languages
(like C#) is that programmers are not used (and even not supposed to)
design their own data types but to reuse some package that will do
the job without caring about possible overhead costs.

For a container library in C, fighting overhead and giving versions of
smaller data types that have less overhead will be an important point.

In a typical Java heap we have tens/hundreds of thousands, even millions
of live collections. [1]

Java heaps have grown from 500 MB to 2-3GB now, without supporting
more features or users. It is increasingly common to require 1GB of
memory just to support a few hundred users, saving 500K session state
PER USER, or requiring 2MB for a text index per simple document, or
creating 100K temporary objects per web hit.

The consequences are clear: scalability disappears, at several
thousand users, the Java solution will require more than the 16GB
installed RAM, and will start swapping, killing performance. Power
usage goes up, more machines need to be bought to support the
bloat. With more machines, more communications and more overhead,
etc.

It is common to propose here that C can't be used for normal
workstation applications. I am convinced that this is wrong. C
can be used for web servers and web applications, and with a reasonable
container library it could have a three "killer arguments" for its use:

Scalability, low overhead, performance.

jacob

(yes, I am biased)

[1]
http://domino.research.ibm.com/comm...ILE/oopsla08 memory-efficient java slides.pdf
 
J

jacob navia

jacob navia a écrit :
In C++ the size of a string of 89 characters appears to be 40 bytes.

Sorry that is a typo. Of course I am speaking about a character string
of 8 (eight) characters.

My finger hit the 9 besides the 8 key and I did not see it.

sorry
 
S

spinoza1111

Any container will add *some* complexity and overhead to the data it stores.

A container object in C will be never as fast as managing each individual
datum by hand, i.e. giving to each datum an address and managing each
datum individually, as it is done in assembly language.

A container allows for scalability precisely by simplifying data management.
So, we pay an overhead.

How much of an overhead?

Let's take lcc-win's strings. They encapsulate a String object, reduced
to the bare essentials:

typedef struct _StringA {
         size_t count; // Elements
         char *content;
         size_t capacity; // Allocated space

} String;

In a 64 bit system, an 8 character string will take:

24+8 --> 32 bytes, i.e. an overhead of 300%.

In a 32 bit system we would have

12+8 --> 20 bytes, an overhead of 150%.

Let's compare this to Java's string class. There, a character string of 8
characters will take 64 bytes, and there is nothing that the programmer
can do about it.

In C++ the size of a string of 89 characters appears to be 40 bytes. I
used the following program:

D:\temp>type str.cpp
#include <string>
#include <iostream>

using namespace std;

int main(void)
{
         string m("12345678");
         cout << sizeof(m) << endl;

}

This outputs

40

In a C container library, it is possible to design a "Small string type"
(maybe in Java too, I do not know). That type can be restricted to
strings shorter than 65535 characters (probably 99.999999% of  the
strings used in a program) If we do that, we can reduce the overhead from
24 to 16 bytes. The alignments requirements in 64 bits still hunt us.

If we get rid of the pointer however, and store the characters in a
structure with a variable "tail" as introduced by C99, we can curtail
the alignment requirements and we have an overhead of just 16 bytes:
two unsigned shorts that specify the length and the capacity, followed
by the actual data.

In C (99) we would have
typedef struct _smallString {
     unsigned short count; // Elements
     unsigned short capacity; // Allocated space
     char contents[];

} smString;

In this case the overhead is 4 bytes, i.e. only 50%.

By making apparent the workings of the container, C has the advantage
of making the programmer aware of what he/she is paying for each
container. The big problem in Java and other Java-like languages
(like C#) is that programmers are not used (and even not supposed to)
design their own data types but to reuse some package that will do
the job without caring about possible overhead costs.

For a container library in C, fighting overhead and giving versions of
smaller data types that have less overhead will be an important point.

In a typical Java heap we have tens/hundreds of thousands, even millions
of live collections. [1]

Java heaps have grown from 500 MB to 2-3GB now, without supporting
more features or users. It is increasingly common to require 1GB of
memory just to support a few hundred users, saving 500K session state
PER USER, or requiring 2MB for a text index per simple document, or
creating 100K temporary objects per web hit.

The consequences are clear: scalability disappears, at several
thousand users, the Java solution will require more than the 16GB
installed RAM, and will start swapping, killing performance. Power
usage goes up, more machines need to be bought to support the
bloat. With more machines, more communications and more overhead,
etc.

It is common to propose here that C can't be used for normal
workstation applications. I am convinced that this is wrong. C
can be used for web servers and web applications, and with a reasonable
container library it could have a three "killer arguments" for its use:

Scalability, low overhead, performance.

jacob

(yes, I am biased)

[1]http://domino.research.ibm.com/comm/research_people.nsf/pages/sevitsk....

Excellent post, Jacob, but I do have these comments.

Reuse is always optional in full programming language like Java and C
Sharp, and one can always reinvent the wheel.

I don't buy the argument that using C in and of itself gives the
programmer any special insight, because it's a variant of the old
argument that using assembler language was more manly. Both are
variants of the ethnocentric belief that one's own language is purer
and gooder. This barbaric belief is shared by many here, but I don't
think you subscribe to the sort of ethnocentrism that causes a German
man to attack Theodore Adorno on a tram in Frankfurt for speaking
"pretentious" High German.

Indeed, my sense here, based on the behavior of many posters (not you)
is that use of C creates low level psychological problems which if
untreated become asocial behavior (such as maliciously lying about
people).

Perhaps (and I'm going out on a limb) the daily use of a deontological
language, in which one barks commands, kills the human spirit. Perhaps
when I learn F Sharp I will be a better person. I want to make
assertions, I don't want to tell people or machines what to do. I want
to program like a poet writes poetry.

Sorry, I'm away with the fairies. Seriously: every time in the past I
have had to program in C I wind up redoing very basic facilities such
as strings. Why not just pay a performance penalty? Why does software
have to be so doggone fast, especially when its speed is being used by
cyberthugs to cheat other people (as in the case of investment
software that evades oversight by regulators by being hyperfast, and
American health insurance software that denies claims at warp speed)?

The greatest software pioneers, such as Grace Murray Hopper, were
willing to sacrifice speed so that they wouldn't have to stand in
front of plugboards for 14 hours at a stretch. John von Neumann was
horrified by their "misuse" of the computer to save themselves time,
for as an Hungarian aristocrat he believed that servants should mind
their place.

The macho approach to software is in part this constant effort (in
which I've participated) to go back to the immediate past, where the
past software is mythologized as more efficient.

But let me try something, right now. I'm gonna write a recursive
factorial with long long arithmetic in C (Microsoft C/C++
unfortunately) and C Sharp with the equivalent long arithmetic. Both
versions shall in a matched fashion calculate the factorial both
iteratively and recursively and do so one million times on my HP Mini
(Intel Atom) laptop.

Here is the C code.

#include <stdio.h>
#include <time.h>

long long factorial(long long N)
{
long long nFactorialRecursive;
long long nFactorialIterative;
long long Nwork;
if (N <= 2) return N;
for ( nFactorialIterative = 1, Nwork = N;
Nwork > 1;
Nwork-- )
nFactorialIterative *= Nwork;
nFactorialRecursive = N * factorial(N-1);
if (nFactorialRecursive != nFactorialIterative)
printf("The iterative factorial of %I64d is %I64d but its
recursive factorial is %I64d\n",
N,
nFactorialIterative,
nFactorialRecursive);
return nFactorialRecursive;
}

int main(void)
{
long long N;
long long Nfactorial;
double dif;
long long i;
long long K;
time_t start;
time_t end;
N = 19;
K = 1000000;
time (&start);
for (i = 0; i < K; i++)
Nfactorial = factorial(N);
time (&end);
dif = difftime (end,start);
printf("The factorial of %I64d is %I64d: %.2f seconds to calculate
%I64d times\n",
N, Nfactorial, dif, K);
return 0;
}


Here is the C Sharp code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace N_factorial
{
class Program
{
static void Main(string[] args)
{
long N;
long Nfactorial = 0;
TimeSpan dif;
long i;
long K;
DateTime start;
DateTime end;
N = 19;
K = 1000000;
start = DateTime.Now;
for (i = 0; i < K; i++)
Nfactorial = factorial(N);
end = DateTime.Now;
dif = end - start;
Console.WriteLine
("The factorial of " +
N.ToString() + " is " +
Nfactorial.ToString() + ": " +
dif.ToString() + " " +
"seconds to calculate " +
K.ToString() + " times");
return;
}

static long factorial(long N)
{
long nFactorialRecursive;
long nFactorialIterative;
long Nwork;
if (N <= 2) return N;
for ( nFactorialIterative = 1, Nwork = N;
Nwork > 1;
Nwork-- )
nFactorialIterative *= Nwork;
nFactorialRecursive = N * factorial(N-1);
if (nFactorialRecursive != nFactorialIterative)
Console.WriteLine
("The iterative factorial of " +
N.ToString() + " " +
"is " +
nFactorialIterative.ToString() + " " +
"but its recursive factorial is " +
nFactorialRecursive.ToString());
return nFactorialRecursive;
}
}
}

Guess what. The C code takes 8 seconds. The C Sharp code takes 8.75
seconds. In varying the number whose factorial is calculated you
discover that the .75 seconds is a ONE-TIME cost in setting up the C
Sharp environment, not "interpretive overhead", since the C Sharp code
is threaded, and doesn't have an interpretive overhead.

I claim this is a small, small price to pay for a solution to C's
"forensic" problem, by which I mean the necessity, often ignored, of
investigating EACH AND EVERY LINE of a C code for portability and
modification issues when porting or modifying it. Hopefully, the
"forensic" time has a zero result, which is why it's ignored in
corporations: no dollar value can be assigned to labour that finds
that everything is OK.

So quite apart from your great compiler. Quite apart from the fact
that you are one of the few grownups here. I think you may have a
certain Chauvin-isme with respect to C, a certain nostalgie de la boue.
 
T

tea cup

jacob navia said:
The consequences are clear: scalability disappears, at several thousand
users, the Java solution will require more than the 16GB installed RAM,
and will start swapping, killing performance. Power usage goes up, more
machines need to be bought to support the bloat. With more machines,
more communications and more overhead, etc.

It is common to propose here that C can't be used for normal workstation
applications. I am convinced that this is wrong. C can be used for web
servers and web applications, and with a reasonable container library it
could have a three "killer arguments" for its use:

Scalability, low overhead, performance.

This is bullshit! Truly you know nothing and are a Naiva.

You don't seem to realize that *any* container object must have *some*
overhead! In Java (as in C++ or C#) the objects in the standard library
will be *heavily* optimized. The overhead will be *tiny* if you are
creating say a balanced tree with a million items and there will be a
similar overhead in C. If you're just creating a dozen items, who cares
what the overhead is?

Why choose 8 character strings as your example? Of course that is too
small for asymptotically good algorithms to be a win - you are cheating
like a politiican. Ask instead what is the overhead for strings with 1000
or 1000000 characters!

I think you should stick to hawking your compiler - more accurately the
compiler someone else created which you took and called your own, then
introduced a whole lot of bugs into. That compiler that you keep spamming
this newsgroup about. The compiler that you call a C compiler even though
it doesn't conform to any standard.

Naiva, too stupid to understand the benefits of Java and how to get them
by writing Java in the right way.
 
B

bartc

tea cup said:
jacob navia said:
Why choose 8 character strings as your example? Of course that is too
small for asymptotically good algorithms to be a win - you are cheating
like a politiican. Ask instead what is the overhead for strings with 1000
or 1000000 characters!

Probably 8 characters is a more typical size for a string than 1000 or
1000000 characters, if you are going to be storing large numbers of them.
 
I

Ian Collins

jacob said:
Any container will add *some* complexity and overhead to the data it
stores.

A container object in C will be never as fast as managing each individual
datum by hand, i.e. giving to each datum an address and managing each
datum individually, as it is done in assembly language.

A container allows for scalability precisely by simplifying data
management.
So, we pay an overhead.

How much of an overhead?

Let's take lcc-win's strings. They encapsulate a String object, reduced
to the bare essentials:

typedef struct _StringA {
size_t count; // Elements
char *content;
size_t capacity; // Allocated space
} String;

In a 64 bit system, an 8 character string will take:

24+8 --> 32 bytes, i.e. an overhead of 300%.

In a 32 bit system we would have

12+8 --> 20 bytes, an overhead of 150%.

Let's compare this to Java's string class. There, a character string of 8
characters will take 64 bytes, and there is nothing that the programmer
can do about it.

In C++ the size of a string of 89 characters appears to be 40 bytes. I
used the following program:

D:\temp>type str.cpp
#include <string>
#include <iostream>

using namespace std;

int main(void)
{
string m("12345678");
cout << sizeof(m) << endl;
}

This outputs

40

This is one library's implementation. g++ would give you 4 in 32 bit
mode. The compiler/library combination I normally use outputs 12. A
typical C++ standard string is very similar to your representation.
In a C container library, it is possible to design a "Small string type"
(maybe in Java too, I do not know). That type can be restricted to
strings shorter than 65535 characters (probably 99.999999% of the
strings used in a program) If we do that, we can reduce the overhead from
24 to 16 bytes. The alignments requirements in 64 bits still hunt us.

For very short strings, the data can overload the data pointer.
Java heaps have grown from 500 MB to 2-3GB now, without supporting
more features or users. It is increasingly common to require 1GB of
memory just to support a few hundred users, saving 500K session state
PER USER, or requiring 2MB for a text index per simple document, or
creating 100K temporary objects per web hit.

I'm not sure what you mean by a "Java heaps", but yes, server side Java
applications can require a lot of system memory. It's a good thing RAM
is almost inexpensive enough to give away in cornflakes packets these days!
The consequences are clear: scalability disappears, at several
thousand users, the Java solution will require more than the 16GB
installed RAM, and will start swapping, killing performance. Power
usage goes up, more machines need to be bought to support the
bloat. With more machines, more communications and more overhead,
etc.

Your string implementation would acquire more baggage to scale well in a
multi-threaded environment, which is included in the design of Java
strings. It's easier to to observe and manage the baggage in C and C++
(easiest in C++ with the help of templates), but it will still have to
be there if required.
 
J

jacob navia

Ian Collins a écrit :
>
> This is one library's implementation. g++ would give you 4 in 32 bit
> mode.

Wait. You mean I can store an 8 character string in 4 bytes?
Excuse me, I know C++ is great but it can't do that sorry...

> The compiler/library combination I normally use outputs 12. A
> typical C++ standard string is very similar to your representation.
>

Can you tell me what compiler? In 64 bit mode?
>
> For very short strings, the data can overload the data pointer.
>

Yes, see the implementation I propose a few lines after that sentence in my
message.

>
> I'm not sure what you mean by a "Java heaps", but yes, server side Java
> applications can require a lot of system memory. It's a good thing RAM
> is almost inexpensive enough to give away in cornflakes packets these days!
>

This is precisely the problem. You are a very common Java programmer. And
my dear friend, that is plainly NOT TRUE.

The biggest chinese PCs this days can hold at most 12 GB of RAM.

Any mother board/server with more than that will cost you a LOT of money since
you will need a high end mother board. This means that unless you spend much
more $$$ in hardsware you have around 12GB memory limit just because the slots
in your motherboard aren't extensible at will.

OF COURSE you can spend $$$ in another server, and put more disk/administration
time into YET ANOTHER SERVER, but that is surely not free.

But this kind of mentality (let's forget about memory costs) is what is
bringing the whole Java ecosystem down. Just go to the referenced slides and
read them and you will see that memory footprint is NOT AT ALL THAT CHEAP,
specially when you want to scale your application server to be able to serve
3000 users and not just 300.

That was the whole point of my article. Your answer confirms me that you (and
many other Java programmers) do not understand the problem at all.

>
> Your string implementation would acquire more baggage to scale well in a
> multi-threaded environment, which is included in the design of Java
> strings.

lcc-win proposes a multi threaded GC. This exists for C and doesn't add
much baggage.
 
J

jacob navia

bartc a écrit :
Probably 8 characters is a more typical size for a string than 1000 or
1000000 characters, if you are going to be storing large numbers of them.

Do not worry about that idiot. Each time I post something it will posts some
insults.

Just forget him.

:)
 
J

jacob navia

spinoza1111 a écrit :
>
>
> Indeed, my sense here, based on the behavior of many posters (not you)
> is that use of C creates low level psychological problems which if
> untreated become asocial behavior (such as maliciously lying about
> people).
>

Look spinoza111: You have absolutely no data to confirm your sayings.

C programming leads to asocial behavior?

And why not painting?

Or playing poker?

Or whatever?
>
> But let me try something, right now. I'm gonna write a recursive
> factorial with long long arithmetic in C (Microsoft C/C++
> unfortunately) and C Sharp with the equivalent long arithmetic. Both
> versions shall in a matched fashion calculate the factorial both
> iteratively and recursively and do so one million times on my HP Mini
> (Intel Atom) laptop.
>
> Here is the C code.
>
> #include <stdio.h>
> #include <time.h>
>
> long long factorial(long long N)
> {
> long long nFactorialRecursive;
> long long nFactorialIterative;
> long long Nwork;
> if (N <= 2) return N;
> for ( nFactorialIterative = 1, Nwork = N;
> Nwork > 1;
> Nwork-- )
> nFactorialIterative *= Nwork;
> nFactorialRecursive = N * factorial(N-1);
> if (nFactorialRecursive != nFactorialIterative)
> printf("The iterative factorial of %I64d is %I64d but its
> recursive factorial is %I64d\n",
> N,
> nFactorialIterative,
> nFactorialRecursive);
> return nFactorialRecursive;
> }
>
> int main(void)
> {
> long long N;
> long long Nfactorial;
> double dif;
> long long i;
> long long K;
> time_t start;
> time_t end;
> N = 19;
> K = 1000000;
> time (&start);
> for (i = 0; i < K; i++)
> Nfactorial = factorial(N);
> time (&end);
> dif = difftime (end,start);
> printf("The factorial of %I64d is %I64d: %.2f seconds to calculate
> %I64d times\n",
> N, Nfactorial, dif, K);
> return 0;
> }
>
>
> Here is the C Sharp code.
>
> using System;
> using System.Collections.Generic;
> using System.Linq;
> using System.Text;
>
> namespace N_factorial
> {
> class Program
> {
> static void Main(string[] args)
> {
> long N;
> long Nfactorial = 0;
> TimeSpan dif;
> long i;
> long K;
> DateTime start;
> DateTime end;
> N = 19;
> K = 1000000;
> start = DateTime.Now;
> for (i = 0; i < K; i++)
> Nfactorial = factorial(N);
> end = DateTime.Now;
> dif = end - start;
> Console.WriteLine
> ("The factorial of " +
> N.ToString() + " is " +
> Nfactorial.ToString() + ": " +
> dif.ToString() + " " +
> "seconds to calculate " +
> K.ToString() + " times");
> return;
> }
>
> static long factorial(long N)
> {
> long nFactorialRecursive;
> long nFactorialIterative;
> long Nwork;
> if (N <= 2) return N;
> for ( nFactorialIterative = 1, Nwork = N;
> Nwork > 1;
> Nwork-- )
> nFactorialIterative *= Nwork;
> nFactorialRecursive = N * factorial(N-1);
> if (nFactorialRecursive != nFactorialIterative)
> Console.WriteLine
> ("The iterative factorial of " +
> N.ToString() + " " +
> "is " +
> nFactorialIterative.ToString() + " " +
> "but its recursive factorial is " +
> nFactorialRecursive.ToString());
> return nFactorialRecursive;
> }
> }
> }
>
> Guess what. The C code takes 8 seconds. The C Sharp code takes 8.75
> seconds. In varying the number whose factorial is calculated you
> discover that the .75 seconds is a ONE-TIME cost in setting up the C
> Sharp environment, not "interpretive overhead", since the C Sharp code
> is threaded, and doesn't have an interpretive overhead.
>

You are measuring just function calls. Not container overhead.

Anyway, the C solution uses 800K, the C# solution 3652K.

I am speaking about container overhead and code bloat. You haven't
answere anything about what I said. Your program is trivial
and will be probably optimized by the C sharp compiler...

If you use an optimizing compiler like microsoft compiler C
takes approx 50% of the time of C#...
 
K

Keith Thompson

jacob navia said:
In C++ the size of a string of 89 characters appears to be 40 bytes. I
used the following program:

(As you acknowledged in a followup, "89" was a typo for "8").
D:\temp>type str.cpp
#include <string>
#include <iostream>

using namespace std;

int main(void)
{
string m("12345678");
cout << sizeof(m) << endl;
}

This outputs

40
[...]

Applying sizeof to a std::string object doesn't appear to be a
reliable way to determine how much space is allocated for it.
On my system (Ubuntu, g++; I'm not sure where the C++ library
implementation is from), the following program:

#include <string>
#include <iostream>

using namespace std;

int main(void)
{
string m8("12345678");
string m256("12345678901234567890123456789012345678901234567890"
"12345678901234567890123456789012345678901234567890"
"12345678901234567890123456789012345678901234567890"
"12345678901234567890123456789012345678901234567890"
"12345678901234567890123456789012345678901234567890"
"123456");
cout << "sizeof m8 = " << sizeof m8 << endl;
cout << "sizeof m256 = " << sizeof m256 << endl;
}

produces this output:

sizeof m8 = 4
sizeof m256 = 4

Obviously the actual data is stored elsewhere. I suspect the same is
true for whatever implementation you're using; 40 bytes is enough to
store your 8-character string, but I doubt that's what it's actually
being used for. In C, sizeof yields only a compile-time value except
for VLAs. C++ doesn't have VLAs, and I think its sizeof operator
behaves similarly, so I don't think sizeof *can* tell you anything
about the total allocated size of a string object.

I suggest trying my program on your implementation.

(Yes, C++ is generally off-topic, but we're discussing how a C
container-like thing compares to similar constructs in other
languages.)

[jacob: I haven't kept close track, but I don't think you've replied
to anything I've posted recently. Are you ignoring my posts?
Of course you have every right to do so, but I'd like to know.]

Merry Christmas!
 
I

Ian Collins

jacob said:
Ian Collins a écrit :

Wait. You mean I can store an 8 character string in 4 bytes?
Excuse me, I know C++ is great but it can't do that sorry...

sizeof calculates the size of the struct/class (in C and C++), not the
size of the data a pointer member points to! Your string class would
output 12 or 24 in your sample code.
Can you tell me what compiler? In 64 bit mode?

Sun CC, using stlport. It outputs 24 in 64 bit mode.
Yes, see the implementation I propose a few lines after that sentence in my
message.

I was proposing doing this with the string struct, not in a different
one. This would add the overhead of a flag.
This is precisely the problem. You are a very common Java programmer. And
my dear friend, that is plainly NOT TRUE.

I have never written a non-trivial Java application, so no, I'm not a
"common Java programmer". I prefer C, C++ and PHP for my web
applications. Anyone who has done battle with the likes of Tomcat
probably has less hair left then me...
The biggest chinese PCs this days can hold at most 12 GB of RAM.

That's the limit on most core i7 motherboards.
Any mother board/server with more than that will cost you a LOT of money
since
you will need a high end mother board. This means that unless you spend
much
more $$$ in hardsware you have around 12GB memory limit just because the
slots
in your motherboard aren't extensible at will.

Dual Xeon 5500 series motherboards are certainly affordable and these
can hold a lot more RAM. Even a Mac Pro (which uses one) isn't that
expansive these days.
But this kind of mentality (let's forget about memory costs) is what is
bringing the whole Java ecosystem down. Just go to the referenced slides
and
read them and you will see that memory footprint is NOT AT ALL THAT CHEAP,
specially when you want to scale your application server to be able to
serve
3000 users and not just 300.

You won't hear me argue that one. See this link for what a modern
system can do:

http://www.theregister.co.uk/2009/12/09/sun_sap_tests/
That was the whole point of my article. Your answer confirms me that you
(and
many other Java programmers) do not understand the problem at all.

Oh but I do, which is one reason I'm not a Java programmer! My work is
split between systems like the one in the ling and embedded devices.
lcc-win proposes a multi threaded GC. This exists for C and doesn't add
much baggage.

No, but adding thread safety will add some baggage to a string struct
even if it's not very much.
 
J

jacob navia

Flash Gordon a écrit :
No, you don't have to go that far up market to have more than 12GB of
RAM in a server. I know that because we have a lot more than that in ours.

If you use a cheap motherboard you have 4 slots. 4GB chips are still expensive
but not SO expenseive. Your limit is 16GB then.

Obviously there are much better motherboards. A Dell Poweredge server M710
has 18 slots, you can put 72GB of RAM in it. The server is around
US$ 2000 and the RAM would cost you around US$ 6 000. You would have
costs of US$ 8000.

Now, if for a hundred connections JAVA uses 1GB, you
*could* run up to 7000 connections in JAVA...

If you have your server in C, and you can get 1000 connections in 1GB
you can buy a machine like this one, with 12GB, and be done with US$ 500.

But it is not ONLY the RAM usage that kills JAVA. As you know, RAM is
very slow compared to the processor speed. Huge programs kill the caches
of the processor and slowdown the program even more.

Let's speak with an example: [1]

To transform a year with 4 digits (i.e. YYYY format) into a number, you
call in Java the atoi equivalent DecimalFormat.Parse()

This is a VERY general routine, that

(1) takes each digits and builds a LIST of digits. It builds a list
container object.

(2) Then, it copies the digits into a StringBuffer object.

(3) Then it copies the buffer into a String object.

(4) Then, it parses the string and produces a 64 bit long result.

Cost: 4 calls, 3 objects, 600 instructions (!!!)

THEN,

It "boxes" the value into a boxed long, that needs to be
unboxed to yield (at last) the int that the user is expecting.

We have in total 11 calls and 5 temporary objects created.

Google has proposed a "C" like language last month, and this is
not a coincidence. Java (and its clone C#) is just not scalable!

jacob

[1]
http://domino.research.ibm.com/comm...ges/nickmitchell.pubs.html/$FILE/lcsd2005.pdf
 
F

Flash Gordon

jacob said:
Flash Gordon a écrit :

If you use a cheap motherboard you have 4 slots. 4GB chips are still
expensive
but not SO expenseive. Your limit is 16GB then.

Obviously there are much better motherboards. A Dell Poweredge server M710
has 18 slots, you can put 72GB of RAM in it. The server is around
US$ 2000 and the RAM would cost you around US$ 6 000. You would have
costs of US$ 8000.

There is stuff in between those two extremes.
Now, if for a hundred connections JAVA uses 1GB, you
*could* run up to 7000 connections in JAVA...

<snip>

You snipped without marking the bit where I *agreed* with you that not
caring about RAM usage was a problem.
 
S

spinoza1111

jacob navia said:




This is bullshit! Truly you know nothing and are a Naiva.

You don't seem to realize that *any* container object must have *some*
overhead! In Java (as in C++ or C#) the objects in the standard library
will be *heavily* optimized. The overhead will be *tiny* if you are
creating say a balanced tree with a million items and there will be a
similar overhead in C. If you're just creating a dozen items, who cares
what the overhead is?

Why choose 8 character strings as your example? Of course that is too
small for asymptotically good algorithms to be a win - you are cheating
like a politiican. Ask instead what is the overhead for strings with 1000
or 1000000 characters!

I think you should stick to hawking your compiler - more accurately the
compiler someone else created which you took and called your own, then
introduced a whole lot of bugs into. That compiler that you keep spamming
this newsgroup about. The compiler that you call a C compiler even though
it doesn't conform to any standard.

Naiva, too stupid to understand the benefits of Java and how to get them
by writing Java in the right way.

Don't mock people's names. Furthermore, evaluate his ideas on their
own merits, not based on second-hand information which you repeat
because the "cool" guys say them.
 
S

spinoza1111

spinoza1111 a écrit :
 >
 >
 > Indeed, my sense here, based on the behavior of many posters (not you)
 > is that use of C creates low level psychological problems which if
 > untreated become asocial behavior (such as maliciously lying about
 > people).
 >

Look spinoza111: You have absolutely no data to confirm your sayings.

C programming leads to asocial behavior?

My data is the behavior of many (not all) C experts here
And why not painting?

Or playing poker?

Depends on whether you're a compulsive gambler
Or whatever?

 >
 > But let me try something, right now. I'm gonna write a recursive
 > factorial with long long arithmetic in C (Microsoft C/C++
 > unfortunately) and C Sharp with the equivalent long arithmetic. Both
 > versions shall in a matched fashion calculate the factorial both
 > iteratively and recursively and do so one million times on my HP Mini
 > (Intel Atom) laptop.
 >
 > Here is the C code.
 >
 > #include <stdio.h>
 > #include <time.h>
 >
 > long long factorial(long long N)
 > {
 >     long long nFactorialRecursive;
 >     long long nFactorialIterative;
 >     long long Nwork;
 >     if (N <= 2) return N;
 >     for ( nFactorialIterative = 1, Nwork = N;
 >           Nwork > 1;
 >           Nwork-- )
 >          nFactorialIterative *= Nwork;
 >     nFactorialRecursive = N * factorial(N-1);
 >     if (nFactorialRecursive != nFactorialIterative)
 >        printf("The iterative factorial of %I64d is %I64d but its
 > recursive factorial is %I64d\n",
 >         N,
 >               nFactorialIterative,
 >               nFactorialRecursive);
 >     return nFactorialRecursive;
 > }
 >
 > int main(void)
 > {
 >     long long N;
 >     long long Nfactorial;
 >     double dif;
 >     long long i;
 >     long long K;
 >     time_t start;
 >     time_t end;
 >     N = 19;
 >     K = 1000000;
 >     time (&start);
 >     for (i = 0; i < K; i++)
 >         Nfactorial = factorial(N);
 >     time (&end);
 >     dif = difftime (end,start);
 >     printf("The factorial of %I64d is %I64d: %.2f seconds to calculate
 > %I64d times\n",
 >            N, Nfactorial, dif, K);
 >     return 0;
 > }
 >
 >
 > Here is the C Sharp code.
 >
 > using System;
 > using System.Collections.Generic;
 > using System.Linq;
 > using System.Text;
 >
 > namespace N_factorial
 > {
 >     class Program
 >     {
 >         static void Main(string[] args)
 >         {
 >             long N;
 >       long Nfactorial = 0;
 >             TimeSpan dif;
 >       long i;
 >       long K;
 >             DateTime start;
 >             DateTime end;
 >             N = 19;
 >       K = 1000000;
 >             start = DateTime.Now;
 >             for (i = 0; i < K; i++)
 >           Nfactorial = factorial(N);
 >             end = DateTime.Now;
 >             dif = end - start;
 >       Console.WriteLine
 >                 ("The factorial of " +
 >                  N.ToString() + " is " +
 >                  Nfactorial.ToString() + ": " +
 >                  dif.ToString() + " " +
 >                  "seconds to calculate " +
 >                  K.ToString() + " times");
 >             return;
 >         }
 >
 >         static long factorial(long N)
 >         {
 >       long nFactorialRecursive;
 >       long nFactorialIterative;
 >       long Nwork;
 >       if (N <= 2) return N;
 >       for ( nFactorialIterative = 1, Nwork = N;
 >             Nwork > 1;
 >             Nwork-- )
 >           nFactorialIterative *= Nwork;
 >       nFactorialRecursive = N * factorial(N-1);
 >       if (nFactorialRecursive != nFactorialIterative)
 >           Console.WriteLine
 >                 ("The iterative factorial of " +
 >                  N.ToString() + " " +
 >                  "is " +
 >                  nFactorialIterative.ToString() + " " +
 >                  "but its recursive factorial is " +
 >                  nFactorialRecursive.ToString());
 >       return nFactorialRecursive;
 >         }
 >     }
 > }
 >
 > Guess what. The C code takes 8 seconds. The C Sharp code takes 8.75
 > seconds. In varying the number whose factorial is calculated you
 > discover that the .75 seconds is a ONE-TIME cost in setting up the C
 > Sharp environment, not "interpretive overhead", since the C Sharp code
 > is threaded, and doesn't have an interpretive overhead.
 >

You are measuring just function calls. Not container overhead.

Anyway, the C solution uses 800K, the C# solution 3652K.

Good point. I misunderstood your original article. So you want to save
storage, I see that now.
I am speaking about container overhead and code bloat. You haven't
answere anything about what I said. Your program is trivial
and will be probably optimized by the C sharp compiler...

If you use an optimizing compiler like microsoft compiler C
takes approx 50% of the time of C#...

No, this isn't my experience.

But my question is why worry about speed. Or memory "bloat".

CPUs and memories are getting cheap, and Moore's law applies.

By wanting to stay in C, you take upon yourself the forensic
responsibility that your code will be changed by others to a
pathological state. This risk is lower in C Sharp.

I didn't understand your point: you don't understand mine.
"Optimizing" should not include converting to a legacy language like
C...in my opinion.
 
J

jacob navia

Flash Gordon a écrit :
You snipped without marking the bit where I *agreed* with you that not
caring about RAM usage was a problem.

Yes, precisely because we agreeed on that. There was no point in
discussing that issue further.

And my remarks weren't mean as an attack or whatever!

-:)
 
F

Flash Gordon

jacob said:
Flash Gordon a écrit :

Yes, precisely because we agreeed on that. There was no point in
discussing that issue further.

And my remarks weren't mean as an attack or whatever!

-:)

Fine. That was not clear based on what you left in. No harm no foul, as
they say.
 
J

jacob navia

C++ strings have at least 40 bytes overhead in the implementation I used.
A string of 8 characters would take 48 at least, not counting
malloc overhead.

The implementation of lcc-win takes much less overhead (24).

If it would implement the small string feature it would be only
4 bytes overhead.
 
J

jacob navia

Gareth Owen a écrit :
In C++, however, one can do this:

print_sorted_list (const vector<string>& strvec)
{
vector<string> mycopy = strvec;
sort(mycopy.begin(),mycopy.end());
for(size_t idx=0;idx < strvec.size(); ++idx) cout << mycopy[idx];
}

It's not the *most* efficient, but it works *exactly* as expected...
I'd be interested to see your implementation of the same function.

int main(void)
{
StringCollection *c = newStringCollection(4);
c[0] = "First";
c[1] = "Second";
c[2] = "Third";
c[3] = "Fourth";
StringCollection *sorted = c->lpVtbl->Sort(c);
for (int i=0; i<sorted->count;i++)
printf("%s\n",sorted);
return 0;
}
 
J

jacob navia

Gareth Owen a écrit :
jacob navia said:
int main(void)
{
StringCollection *c = newStringCollection(4);
c[0] = "First";
c[1] = "Second";
c[2] = "Third";
c[3] = "Fourth";
StringCollection *sorted = c->lpVtbl->Sort(c);
for (int i=0; i<sorted->count;i++)
printf("%s\n",sorted);
return 0;
}


That's a program, not a function.
Yes.

Does this keep the original
(i.e. unsorted) string collection around?
Yes.

If not, it doesn't do what
the original function did.


I think I will add a destructive sort later, that will sort
the string collection in place.
If so, does it copy the strings "First", "Second" etc, or just
manipulate the pointers?

No, it copies the strings also. This should be user configurable but
at this time it is not. The problem with NOT copying the
strings is that the freeing of the allocated memory becomes too complex
unless there is a package to "intern" strings into a fixed string pool,
using reference counts.
If not, can it be made to work with dynamically allocated strings, and
if so, which StringCollection is responsible for deallocating those
strings when the collection goes out of scope?

Strings of the collection are owned by the collection,
and will be destroyed by the collection when you call the "Finalize" method.


For all containers, the "Finalize" method deallocates ALL memory associated
with the container (unless you are using the GC).
Can you rephrase this as a function that takes a logically-const
StringCollection*, i.e. that prints the list without mutating original
string collection?

You have just to use "Apply". This function calls a user defined
function that takes one argument: a member of the collection, and
calls it for each element in the collection. Printing the collection
is just "Apply" with a funbction that prints the given element.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top