Static-duration array objects beefing up the executable file size

  • Thread starter Tomás Ó hÉilidhe

Tomás Ó hÉilidhe

I'm currently writing a program that contains a large static-duration
object, about 240 kilobytes in size, that I use for storing
information gathered at runtime.

I've been playing around with GCC and I found something interesting.

If you compile the following:

int arr[999999];

int main(void)
{
    return 0;
}

Then you get a small executable. Similarly you get a small executable
for the following:

int arr[999999] = { 0,0,0,0,0,0,0 };

However, if you set even one of the elements to something other than
zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];

/* ...and then, at the start of main(): */
arr[0] = 1;
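The last suggestion is shorthand: a statement such as arr[0] = 1; cannot appear at file scope, so it has to run inside a function. A minimal sketch of a compilable version, assuming the assignment is done by a small helper (the name init_arr is made up here) that is called first thing in main():

```c
#define ARR_LEN 999999

/* No initializer: every element starts out as zero, so on typical
   Unix toolchains only the array's size has to be recorded in the
   executable, not its contents. */
static int arr[ARR_LEN];

/* Statements are not allowed at file scope, so the one non-zero
   element is set at run time instead of with "= { 1 }". */
static void init_arr(void)
{
    arr[0] = 1;
}
```

After init_arr() has run, arr holds the same values it would have with = { 1 }. The one difference is timing: any code that reads arr before init_arr() is called sees arr[0] == 0.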
 

vippstar

I'm currently writing a program that contains a large static-duration
object, about 240 kilobytes in size,

Your program is allowed to be rejected by all implementations. The
maximum object size an implementation must accept once in a program is
65535 in C99 and 32767 in C90.
that I use for storing information gathered at runtime.

I've been playing around with GCC and I found something interesting.

If you compile the following:

int arr[999999];

int main(void)
{
return 0;

}

Then you get a small executable. Similarly you get a small executable
for the following:

int arr[999999] = { 0,0,0,0,0,0,0 };

However, if you set even one of the elements to something other than
zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];
arr[0] = 1;

What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?
 

Richard

I'm currently writing a program that contains a large static-duration
object, about 240 kilobytes in size,

Your program is allowed to be rejected by all implementations. The
maximum object size an implementation must accept once in a program is
65535 in C99 and 32767 in C90.
that I use for storing information gathered at runtime.

<snip>

What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Because it's about real-world C. And in the real world people use real
compilers. And those real compilers include gcc. And those real
programmers using real compilers in the real world come to real
newsgroups to learn about C. And this newsgroup is called comp.lang.c.

I will leave you to join the dots .....
 

Phil Carmody

Richard said:
I'm currently writing a program that contains a large static-duration
object, about 240 kilobytes in size,

Your program is allowed to be rejected by all implementations. The
maximum object size an implementation must accept once in a program is
65535 in C99 and 32767 in C90.
that I use for storing information gathered at runtime.

<snip>

What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Because it's about real-world C. And in the real world people use real
compilers. And those real compilers include gcc. And those real
programmers using real compilers in the real world come to real
newsgroups to learn about C.

But information specifically about gcc doesn't tell them anything
about C. Real-world programmers interface with windowing systems,
but this is not the place to discuss windowing systems. Real-world
programmers interface with make, but this is not the place to discuss
make. Real-world programmers interface with version control systems,
but this is not the place to discuss version control systems. Etc.
And this newsgroup is called comp.lang.c.

Yes, not comp.lang.c.gcc .
I will leave you to join the dots .....

You appear to be quacking like a troll, so I'll add 2 'i's to that:

You will leave us to join the idiots:

*plonk*

Phil
 

Richard

Phil Carmody said:
Richard said:
I'm currently writing a program that contains a large static-duration
object, about 240 kilobytes in size,

Your program is allowed to be rejected by all implementations. The
maximum object size an implementation must accept once in a program is
65535 in C99 and 32767 in C90.

that I use for storing information gathered at runtime.

<snip>

What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Because it's about real-world C. And in the real world people use real
compilers. And those real compilers include gcc. And those real
programmers using real compilers in the real world come to real
newsgroups to learn about C.

But information specifically about gcc doesn't tell them anything
about C. Real-world programmers interface with windowing systems,
but this is not the place to discuss windowing systems. Real-world ...

Get a life. It was nothing more than a factual anecdote that C
programmers might be interested in. I was.
 

Keith Thompson

Anthony Fremont said:
Tomás Ó hÉilidhe wrote:
However, if you set even one of the elements to something other than
zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];
arr[0] = 1;

But that doesn't initialize the rest of the array elements with 0.

Static objects are implicitly initialized to zero.
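This is a language guarantee rather than a gcc quirk: an object with static storage duration and no explicit initializer is initialized as if to 0 (C99 6.7.8p10), so arithmetic objects get zero, pointers get a null pointer, and the rule applies recursively to array elements and struct members. A small sketch that checks this for a few types; all the names are made up for the example:

```c
#include <stddef.h>

/* Static storage duration, no initializers. */
static int counts[1000];
static double weights[8];
static char *names[4];

struct record {
    int id;
    double score;
    const char *label;
};
static struct record table[16];

/* Returns 1 if every object above still holds its default "zero"
   value, 0 otherwise. */
static int all_default_zero(void)
{
    size_t i;

    for (i = 0; i < sizeof counts / sizeof counts[0]; i++)
        if (counts[i] != 0)
            return 0;
    for (i = 0; i < sizeof weights / sizeof weights[0]; i++)
        if (weights[i] != 0.0)
            return 0;
    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (names[i] != NULL)
            return 0;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].id != 0 || table[i].score != 0.0 ||
            table[i].label != NULL)
            return 0;
    return 1;
}
```

The guarantee is about values; whether a null pointer or 0.0 is stored as all-bits-zero is left to the implementation.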
 

Keith Thompson

Anthony Fremont said:
Keith said:
Anthony Fremont said:
Tomás Ó hÉilidhe wrote:
<snip>
However, if you set even one of the elements to something other than
zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];
arr[0] = 1;

But that doesn't initialize the rest of the array elements with 0.

Static objects are implicitly initialized to zero.

That's true; I was completely overlooking the fact that the OP specified
static. It seems odd, then, that the compiler would generate the huge
initializer.

It doesn't seem odd to me. The object is initialized to a non-zero
value; the obvious way to implement that is to store a copy of the
initial value in the generated code.

The compiler *could* optimize cases where the initial value is mostly
zero; apparently it doesn't. It's not clear that such initializations
are common enough to be worth the effort of detecting them and
treating them specially.
 

Tomás Ó hÉilidhe

The compiler *could* optimize cases where the initial value is mostly
zero; apparently it doesn't.  It's not clear that such initializations
are common enough to be worth the effort of detecting them and
treating them specially.


And if you work for Microsoft you'll be scorned for working on
compiler features when you /should/ be getting the bounce just right
for when the user maximises a window.
 

Nate Eldredge

Tomás Ó hÉilidhe said:
I'm currently writing a program that contains a large static-duration
object, about 240 kilobytes in size, that I use for storing
information gathered at runtime.

I've been playing around with GCC and I found something interesting.

If you compile the following:

int arr[999999];

int main(void)
{
return 0;
}

Then you get a small executable. Similarly you get a small executable
for the following:

int arr[999999] = { 0,0,0,0,0,0,0 };

However, if you set even one of the elements to something other than
zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];
arr[0] = 1;

Right. At least on Unix, the executable format includes (among others)
two different "sections" for data, one called .data and the other called
.bss. The initial contents of .data are loaded from the executable,
while .bss is initialized to zero by the operating system. So the
executable needs to contain the whole contents of .data, but only a
size for .bss.

Naturally the compiler will arrange that any object that's initialized
to all zeros goes in the .bss section, including objects of static
duration which get the default zero initialization. (This works best
when null pointers and floating-point zeros are also all-bits-zero, which is
typically true on Unix systems.) Any object that isn't completely
initialized to zero, like in your example, has to go in .data, which
means the whole thing gets stored in the executable, even if most of
that happens to be zeros.
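Whether a toolchain can put pointer and floating-point objects in .bss depends on how their zero values are represented in memory. A sketch that inspects the bytes of a few zero-valued static objects; it assumes a typical platform (IEEE 754 doubles, null pointers represented as all-bits-zero), which the C standard does not require. The -0.0 case shows that comparing equal to zero is not the same as being all-bits-zero, so an object initialized that way still needs its bytes stored.

```c
#include <stddef.h>

/* Returns 1 if the n bytes starting at p are all zero, 0 otherwise. */
static int is_all_bits_zero(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    size_t i;

    for (i = 0; i < n; i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}

/* No initializers: these hold the values 0, 0.0 and a null pointer. */
static int zero_int;
static double zero_double;
static void *null_ptr;

/* Compares equal to 0.0, but on IEEE 754 systems its sign bit is set. */
static double neg_zero = -0.0;
```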

You can certainly imagine an optimization like the one you mention,
where an object that's mostly zero could go in .bss and have the nonzero
parts filled in automatically as the program starts. This turns out to
be tricky to implement. You might imagine that the initialization code
could be invisibly inserted at the beginning of `main', but this is hard
to do if the object is defined in a different source file from `main'
which is compiled separately and linked later. One option might be for
the compiler to take advantage of the "constructor" features that the
linker usually supplies to support C++, where static-duration objects
may have constructors that need to run before `main'. But there may be
code that runs earlier still that needs the values of initialized
static-duration objects to be correct: the library's startup code, or
the programmer taking advantage of the same "constructor" feature, or
doing some other system-specific magic.

So in short, you might be able to make this optimization work if you
only needed to support ISO C programs. But real compilers generally
prefer to allow programmers to take advantage of non-portable features
of the system, and that makes it much harder to get this optimization
right. So it's normally not attempted. If you want it
done, you need to do it yourself, and that leaves it up to you to make
sure the initialization is done at a time that works for you.
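One way to take on that responsibility, sketched under the assumption of a single-threaded program: keep the array zero-initialized and route every access through a function that fills in the non-zero entries on first use, so it doesn't matter whether the first access comes from main() or from code that runs earlier. The names and values (table, table_get, the three entries) are hypothetical. gcc's __attribute__((constructor)) is another option, but it runs at a fixed point before main() and so raises the ordering questions described above.

```c
#include <stddef.h>

#define TABLE_LEN 60000   /* hypothetical size, for illustration */

/* Zero-initialized, so no contents need to be stored in the
   executable on typical toolchains. Only touched through table_get. */
static int table[TABLE_LEN];
static int table_ready;   /* 0 until the non-zero entries are set */

/* Fills in the few non-zero entries (made-up values). */
static void table_fill(void)
{
    table[0] = 1;
    table[TABLE_LEN / 2] = 42;
    table[TABLE_LEN - 1] = -7;
    table_ready = 1;
}

/* Every read goes through here, so the table is filled in on first
   use. Not thread-safe: assumes a single thread. */
int table_get(size_t i)
{
    if (!table_ready)
        table_fill();
    return table[i];
}
```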
 

CBFalconer

... snip ...
However, if you set even one of the elements to something other
than zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];
arr[0] = 1;

What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Tomas could have found that out by examining the manuals for gcc.
However, that is unimportant. What does seem important is why you
think that informative posts are not suitable for c.l.c.
 

Richard

CBFalconer said:
... snip ...
However, if you set even one of the elements to something other
than zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];
arr[0] = 1;

What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Tomas could have found that out by examining the manuals for gcc.
However, that is unimportant. What does seem important is why you
think that informative posts are not suitable for c.l.c.

Because it would appear that in my absence vippstar has completed his
metamorphosis into a group reg. This involves picking at any and all
posts and generally bullying anyone that does not meet his small minded
view of what this group is for. I suspect he has been saying "indeed" a
lot too, but I can't be arsed to check on Google.

The post by Tomás was C-related and perfectly on topic for C
programmers to be aware of.
 

vippstar

Tomás Ó hÉilidhe wrote:

... snip ...
However, if you set even one of the elements to something other
than zero, then you end up with a big executable:
int arr[999999] = { 1 };
So of course you'd be better off to do:
int arr[999999];
arr[0] = 1;
What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Tomas could have found that out by examining the manuals for gcc.
However, that is unimportant. What does seem important is why you
think that informative posts are not suitable for c.l.c.

These were not my words. I said "informative post about gcc". I did
not say "informative post". An informative post on x86 is informative,
but not topical. An informative post on C is informative & topical. I
only implied that informative posts about gcc are NOT topical.

You seem to have developed the habit of replying before fully reading
the post. If you don't change your ways I'll ignore any similar
messages from you in the future.
 

Richard

Tomás Ó hÉilidhe wrote:

... snip ...
However, if you set even one of the elements to something other
than zero, then you end up with a big executable:
int arr[999999] = { 1 };
So of course you'd be better off to do:
int arr[999999];
arr[0] = 1;
What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Tomas could have found that out by examining the manuals for gcc.
However, that is unimportant. What does seem important is why you
think that informative posts are not suitable for c.l.c.

These were not my words. I said "informative post about gcc". I did
not say "informative post". An informative post on x86 is informative,
but not topical. An informative post on C is informative & topical. I
only implied that informative posts about gcc are NOT topical.

gcc is C related. You do know what gcc is I suppose? And this issue
might WELL relate to how people choose Standard C constructs to build a
program. So, basically, you fail the interview.
You seem to have developed the habit of replying before fully reading
the post. If you don't change your ways I'll ignore any similar
messages from you in the future.

ROTFLM

Take the stick out you pompous little man.
 

Flash Gordon

Nate Eldredge wrote, On 21/12/08 18:41:
Tomás Ó hÉilidhe said:
I'm currently writing a program that contains a large static-duration
object, about 240 kilobytes in size, that I use for storing
information gathered at runtime.

I've been playing around with GCC and I found something interesting.

If you compile the following:

int arr[999999];

int main(void)
{
return 0;
}

Then you get a small executable. Similarly you get a small executable
for the following:

int arr[999999] = { 0,0,0,0,0,0,0 };

However, if you set even one of the elements to something other than
zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];
arr[0] = 1;

Right. At least on Unix, the executable format includes (among others)

There was more than one format used by some Unix systems last I looked...
two different "sections" for data, one called .data and the other called
.bss. The initial contents of .data are loaded from the executable,
while .bss is initialized to zero by the operating system. So the
executable needs to contain the whole contents of .data, but only a
size for .bss.

Certainly true a lot of the time.

You can certainly imagine an optimization like the one you mention,
where an object that's mostly zero could go in .bss and have the nonzero
parts filled in automatically as the program starts. This turns out to
be tricky to implement.

No, it's not hard. I've used implementations that do it.
You might imagine that the initialization code
could be invisibly inserted at the beginning of `main', but this is hard
to do if the object is defined in a different source file from `main'
which is compiled separately and linked later. One option might be for

<snip>

Ah well, if you want to do it the hard way...

The easy way is to have another section which contains an initialiser
list for all initialisers whose bit pattern is not all zeros. The startup
code then just runs through the list doing the initialisation. There is no
reason this could not be done by the program loader for a hosted
implementation, in the same way that some loaders do (did) code patch-ups
to relocate absolute references, which would deal with your other
objections to this solution. The implementation I used which did this was
for an embedded target, and I read the part of the startup code that did
this and it was simple.
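The initialiser-list scheme can be sketched in portable C by writing the list out by hand. In a real implementation the compiler would emit the records and the linker would gather them into their own section; here init_list is an ordinary array, the record format is an assumption (a real one would more likely describe byte runs so any type can be handled), and run_init_list is a made-up stand-in for the startup code that walks the list before main().

```c
#include <stddef.h>

/* Objects that live in zero-filled (.bss-style) storage. */
static int big[100000];
static int small[4];

/* One record per non-zero initialiser: where it goes and its value. */
struct init_entry {
    int *dest;
    int value;
};

/* What "int big[100000] = { 1, [99999] = 3 };" and
   "int small[4] = { 0, 0, 5 };" would turn into. */
static const struct init_entry init_list[] = {
    { &big[0],     1 },
    { &big[99999], 3 },
    { &small[2],   5 },
};

/* Stand-in for the startup code: store each recorded value once. */
static void run_init_list(void)
{
    size_t i;

    for (i = 0; i < sizeof init_list / sizeof init_list[0]; i++)
        *init_list[i].dest = init_list[i].value;
}
```

Only the three small records and the sizes of big and small have to be kept in the executable, rather than every byte of big.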
 

David Thompson

Nate Eldredge wrote, On 21/12/08 18:41:

There was more than one format used by some Unix systems last I looked...


Certainly true a lot of the time.



No, it's not hard. I've used implementations that do it.
The easy way is to have another section which contains an initialiser
list for all initialisers whose bit pattern is not all zeros. The startup
code then just runs through the list doing the initialisation. There is no
reason this could not be done by the program loader for a hosted
implementation, in the same way that some loaders do (did) code patch-ups
to relocate absolute references, which would deal with your other
objections to this solution. The implementation I used which did this was
for an embedded target, and I read the part of the startup code that did
this and it was simple.

Another approach, on one (non-Unix VM) system I've used, is:
- store code page-aligned and use it as swap (always)
- for each page of data, if it contains any nonzero byte, store it
(aligned) and use that as the initial swap, as recorded in a bitmap;
otherwise (all zero) don't store it and use all zeros as the initial 'swap'.
Uninit='bss' isn't a separate section, but merely the last N data
pages, which are all unstored and have bitmap=0. And read-only (static
duration) data is in the code area, not the data area.
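The page-bitmap scheme can be simulated in a few dozen lines. This is only a model under stated assumptions: pages are 16 bytes rather than a real page size, the "executable" is an in-memory struct, and build_image and load_image are hypothetical names for what the linker and the loader (or pager) would do.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 16   /* tiny pages so the example stays small */
#define MAX_PAGES 64

/* What would be written to the executable for the data area. */
struct data_image {
    size_t npages;                              /* pages in the data area */
    size_t nstored;                             /* pages actually stored */
    unsigned char bitmap[(MAX_PAGES + 7) / 8];  /* bit p set: page p stored */
    unsigned char pages[MAX_PAGES][PAGE_SIZE];  /* stored pages, in order */
};

static int page_is_zero(const unsigned char *page)
{
    size_t i;

    for (i = 0; i < PAGE_SIZE; i++)
        if (page[i] != 0)
            return 0;
    return 1;
}

/* "Link time": keep only pages that contain a non-zero byte and set
   their bits in the bitmap. Assumes len is a multiple of PAGE_SIZE
   and at most MAX_PAGES * PAGE_SIZE. */
static void build_image(const unsigned char *data, size_t len,
                        struct data_image *img)
{
    size_t p;

    memset(img, 0, sizeof *img);
    img->npages = len / PAGE_SIZE;
    for (p = 0; p < img->npages; p++) {
        const unsigned char *page = data + p * PAGE_SIZE;

        if (!page_is_zero(page)) {
            img->bitmap[p / 8] |= (unsigned char)(1u << (p % 8));
            memcpy(img->pages[img->nstored++], page, PAGE_SIZE);
        }
    }
}

/* "Load time": stored pages are copied back in order; pages whose bit
   is clear start out as all zeros without being read from the file. */
static void load_image(const struct data_image *img, unsigned char *data)
{
    size_t p, next = 0;

    for (p = 0; p < img->npages; p++) {
        unsigned char *page = data + p * PAGE_SIZE;

        if (img->bitmap[p / 8] & (1u << (p % 8)))
            memcpy(page, img->pages[next++], PAGE_SIZE);
        else
            memset(page, 0, PAGE_SIZE);
    }
}
```

Trailing all-zero pages (the 'bss' part) simply never get their bits set, so they cost nothing in the stored image.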
 
