Static-duration array objects beefing up the executable file size

Tomás Ó hÉilidhe · Dec 21, 2008

I'm currently writing a program that contains a large static-duration
object, about 240 kilobytes in size, that I use for storing
information gathered at runtime.

I've been playing around with GCC and I found something interesting.

If you compile the following:

int arr[999999];

int main(void)
{
    return 0;
}

Then you get a small executable. Similarly you get a small executable
for the following:

int arr[999999] = { 0,0,0,0,0,0,0 };

However, if you set even one of the elements to something other than
zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];
/* ...and then, inside main() or another function: */
arr[0] = 1;
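
A note on the last snippet: an assignment is a statement, so it has to sit inside a function rather than at file scope. A minimal sketch of the idea, assuming a hypothetical helper named `init_arr` that is called once before `arr` is used:

```c
#define ARR_LEN 999999

/* No initializer: static-duration objects start out as all zeros,
   so the toolchain typically records only the size of this array. */
int arr[ARR_LEN];

/* Hypothetical helper (not from the original post): performs at run time
   what the initializer { 1 } would otherwise bake into the executable. */
void init_arr(void)
{
    arr[0] = 1;
}
```

Calling `init_arr()` at the top of `main` would be the simplest arrangement; the remaining elements keep their implicit zero value.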

vippstar · Dec 21, 2008

I'm currently writing a program that contains a large static-duration
object, about 240 kilobytes in size,

Your program is allowed to be rejected by all implementations. The
maximum object size an implementation must accept once in a program is
65535 bytes in C99 and 32767 bytes in C90.
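
To put numbers on that, here is a small sketch that compares an object's size against those minimum guarantees. The macro and function names are made up for illustration; the figures are the ones quoted in this post.

```c
#include <stddef.h>

/* Minimum object sizes (in bytes) that an implementation must be able
   to handle at least once in a program. */
#define C90_MIN_OBJECT_BYTES 32767u
#define C99_MIN_OBJECT_BYTES 65535u

/* Hypothetical helper: returns 1 if an object of `bytes` bytes is larger
   than what the given minimum guarantees, 0 otherwise. */
int exceeds_guarantee(size_t bytes, size_t guaranteed)
{
    return bytes > guaranteed;
}
```

For a 240-kilobyte object, `exceeds_guarantee((size_t)240 * 1024, C99_MIN_OBJECT_BYTES)` yields 1, since 245760 is greater than 65535. That only says the standard does not promise acceptance; a given compiler may well accept far larger objects.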

What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Richard · Dec 21, 2008

vippstar said:
What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Because it's about real world C. And in the real world people use real
compilers. And those real compilers include gcc. And of those real
programmers using real compilers in the real world they come to real
newsgroups to learn about C. And this newsgroup is called comp.lang.c.

I will leave you to join the dots .....

Phil Carmody · Dec 21, 2008

Richard said:
Because it's about real world C. And in the real world people use real
compilers. And those real compilers include gcc. And of those real
programmers using real compilers in the real world they come to real
newsgroups to learn about C.

But information specifically about gcc doesn't tell them anything
about C. Real world programmers interface with windowing systems,
but here's not a place to discuss windowing systems. Real world
programmers interface with make, but here's not a place to discuss
make. Real world programmers interface with version control systems,
but here's not a place to discuss version control systems. Etc.

And this newsgroup is called comp.lang.c.

Yes, not comp.lang.c.gcc .

I will leave you to join the dots .....

You appear to be quacking like a troll, so I'll add 2 'i's to that:

You will leave us to join the idiots:

*plonk*

Phil

Richard · Dec 21, 2008

Phil Carmody said:
But information specifically about gcc doesn't tell them anything
about C. Real world programmers interface with windowing systems,
but here's not a place to discuss windowing systems. Real world

Get a life. It was nothing more than a factual anecdote that C
programmers might be interested in. I was.

Keith Thompson · Dec 21, 2008

Anthony Fremont said:
Tomás Ó hÉilidhe wrote:

However, if you set even one of the elements to something other than
zero, then you end up with a big executable:

int arr[999999] = { 1 };

So of course you'd be better off to do:

int arr[999999];
arr[0] = 1;

But that doesn't initialize the rest of the array elements with 0.

Static objects are implicitly initialized to zero.
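
A small sketch of that rule, with made-up object names: none of the objects below has an initializer, yet each starts out as zero (or a null pointer) because it has static storage duration.

```c
#include <stddef.h>

/* Static storage duration, no explicit initializers. */
static int    counter;
static double ratio;
static char  *name;
static int    table[1000];

/* Hypothetical check: returns 1 if every object above still holds its
   implicit initial value. */
int all_implicitly_zero(void)
{
    size_t i;

    if (counter != 0 || ratio != 0.0 || name != NULL)
        return 0;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i] != 0)
            return 0;
    return 1;
}
```

The same declarations inside a function without `static` would have automatic storage duration and indeterminate initial values, which is the case the objection would apply to.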

Keith Thompson · Dec 21, 2008

Anthony Fremont said:
That's true, I was completely overlooking the fact that the OP specified
static. That seems odd that the compiler would generate the huge
initializer then.

It doesn't seem odd to me. The object is initialized to a non-zero
value; the obvious way to implement that is to generate a copy of the
initial value in the generated code.

The compiler *could* optimize cases where the initial value is mostly
zero; apparently it doesn't. It's not clear that such initializations
are common enough to be worth the effort of detecting them and
treating them specially.

Tomás Ó hÉilidhe · Dec 21, 2008

The compiler *could* optimize cases where the initial value is mostly
zero; apparently it doesn't. It's not clear that such initializations
are common enough to be worth the effort of detecting them and
treating them specially.

And if you work for Microsoft you'll be scorned for working on
compiler features when you /should/ be getting the bounce just right
for when the user maximises a window.

Nate Eldredge · Dec 21, 2008

Right. At least on Unix, the executable format includes (among others)
two different "sections" for data, one called .data and the other called
.bss. The initial contents of .data are loaded from the executable,
while .bss is initialized to zero by the operating system. So the
executable needs to contain the whole contents of .data, but only a
size for .bss.

Naturally the compiler will arrange that any object that's initialized
to all zeros goes in the .bss section, including objects of static
duration which get the default zero initialization. (This works best
when NULL pointers and floating-point zeros are also all zero, which is
typically true on Unix systems.) Any object that isn't completely
initialized to zero, like in your example, has to go in .data, which
means the whole thing gets stored in the executable, even if most of
that happens to be zeros.
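
As a rough illustration of that split, the three definitions below differ only in their initializers. The section named in each comment is what a typical GCC/ELF toolchain does, not something the C standard promises; the object names and the helper are invented for the example.

```c
#define N 100000

int zero_implicit[N];          /* no initializer: typically .bss (size only) */
int zero_explicit[N] = { 0 };  /* all zeros: typically also .bss */
int one_nonzero[N]   = { 1 };  /* one nonzero element: typically .data,
                                  so all N ints are stored in the file */

/* Hypothetical helper: the three arrays behave identically at run time
   apart from element 0 of the last one. */
int first_elements_sum(void)
{
    return zero_implicit[0] + zero_explicit[0] + one_nonzero[0];
}
```

On systems that ship binutils, running `size` on the resulting executable reports the data and bss totals separately, which is one way to check where each array ended up.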

You can certainly imagine an optimization like the one you mention,
where an object that's mostly zero could go in .bss and have the nonzero
parts filled in automatically as the program starts. This turns out to
be tricky to implement. You might imagine that the initialization code
could be invisibly inserted at the beginning of `main', but this is hard
to do if the object is defined in a different source file from `main'
which is compiled separately and linked later. One option might be for
the compiler to take advantage of the "constructor" features that the
linker usually supplies to support C++, where static-duration objects
may have constructors that need to run before `main'. But there may be
code that runs earlier still that needs the values of initialized
static-duration objects to be correct: the library's startup code, or
the programmer taking advantage of the same "constructor" feature, or
doing some other system-specific magic.
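
For the curious, the "constructor" route might look like the sketch below. It relies on `__attribute__((constructor))`, a GCC extension (also accepted by Clang) rather than ISO C, and the names `big_table` and `fill_big_table` are made up. As the paragraph above points out, nothing guarantees that this function runs before other constructors or startup code that might read the array.

```c
#define TABLE_LEN 999999

/* No initializer, so the array can stay in the zero-filled section. */
int big_table[TABLE_LEN];

/* GCC/Clang extension: ask the toolchain to call this before main().
   Hypothetical stand-in for the initializer { 1 }. */
__attribute__((constructor))
static void fill_big_table(void)
{
    big_table[0] = 1;
}
```

By the time `main` starts, `big_table[0]` holds 1 and every other element is still zero, while the executable only has to carry the few instructions of `fill_big_table`.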

So in short, you might be able to make this optimization work if you
only needed to support ISO C programs. But real compilers generally
prefer to allow programmers to take advantage of non-portable features
of the system, and that makes it much harder to get this optimization
right. It is therefore normally not attempted. If you want it
done, you need to do it yourself, and that leaves it up to you to make
sure the initialization is done at a time that works for you.
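
One way to take charge of the timing yourself is to initialize on first use. A minimal sketch, assuming a single-threaded program and a hypothetical accessor `get_arr`:

```c
#define ARR_LEN 999999

static int arr[ARR_LEN];    /* all zeros at startup, no stored image needed */
static int arr_ready;       /* also starts out as 0 */

/* Hypothetical accessor: fills in the nonzero values the first time the
   array is requested, then hands back the same array on every call.
   Not thread-safe as written. */
int *get_arr(void)
{
    if (!arr_ready) {
        arr[0] = 1;
        arr_ready = 1;
    }
    return arr;
}
```

Code that only ever reaches the array through `get_arr()` cannot observe it before the nonzero element is in place, at the cost of one flag test per call.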

CBFalconer · Dec 21, 2008

vippstar said:
What's your query? If there's no query, what made you think
comp.lang.c could make use of an informative post about gcc?

Tomas could have found that out by examining the manuals for gcc.
However, that is unimportant. What does seem important is why you
think that informative posts are not suitable for c.l.c.

Richard · Dec 21, 2008

CBFalconer said:
Tomas could have found that out by examining the manuals for gcc.
However, that is unimportant. What does seem important is why you
think that informative posts are not suitable for c.l.c.

Because it would appear that in my absence vippstar has completed his
metamorphosis into a group reg. This involves picking at any and all
posts and generally bullying anyone that does not meet his small minded
view of what this group is for. I suspect he has been saying "indeed" a
lot too but can't be arsed to Google.

The post by Tomás was C related and perfectly on topic for C
programmers to be aware of.

vippstar · Dec 21, 2008

CBFalconer said:
Tomas could have found that out by examining the manuals for gcc.
However, that is unimportant. What does seem important is why you
think that informative posts are not suitable for c.l.c.

These were not my words. I said "informative post about gcc". I did
not say "informative post". An informative post on x86 is informative,
but not topical. An informative post on C is informative & topical. I
only implied that informative posts about gcc are NOT topical.

You seem to have developed the habit of replying before fully reading
the post. If you don't change your ways I'll ignore any similar
messages from you in the future.

Richard · Dec 21, 2008

vippstar said:
These were not my words. I said "informative post about gcc". I did
not say "informative post". An informative post on x86 is informative,
but not topical. An informative post on C is informative & topical. I
only implied that informative posts about gcc are NOT topical.

gcc is C related. You do know what gcc is I suppose? And this issue
might WELL relate to how people choose Standard C constructs to build a
program. So, basically, you fail the interview.

You seem to have developed the habit of replying before fully reading
the post. If you don't change your ways I'll ignore any similar
messages from you in the future.

ROTFLM

Take the stick out you pompous little man.

Flash Gordon · Dec 22, 2008

Nate Eldredge wrote, On 21/12/08 18:41:

Right. At least on Unix, the executable format includes (among others)

There was more than one format used by some Unix systems last I looked...

two different "sections" for data, one called .data and the other called
.bss. The initial contents of .data are loaded from the executable,
while .bss is initialized to zero by the operating system. So the
executable needs to contain the whole contents of .data, but only a
size for .bss.

Certainly true a lot of the time.

You can certainly imagine an optimization like the one you mention,
where an object that's mostly zero could go in .bss and have the nonzero
parts filled in automatically as the program starts. This turns out to
be tricky to implement.

No, it's not hard. I've used implementations that do it.

You might imagine that the initialization code
could be invisibly inserted at the beginning of `main', but this is hard
to do if the object is defined in a different source file from `main'
which is compiled separately and linked later. One option might be for

<snip>

Ah well, if you want to do it the hard way...

The easy way is you have another section which contains an initialiser
list for all initialisers whose bit pattern is not all zeros. The startup
code then just runs through the list doing the initialisation. There is no
reason this could not be done by the program loader for a hosted
implementation, in the same way that some loaders do (did) code patch-ups
to relocate absolute references, which would deal with your other
objections to this solution. The implementation I used which did this was
for an embedded target, and I read the part of the startup code that did
this and it was simple.
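
The scheme described here can be sketched in portable C. Everything below is illustrative: the record layout, the names, and the idea of calling `apply_init_records` by hand are assumptions, whereas in the implementation described the table lives in its own section and the startup code walks it.

```c
#include <stddef.h>

#define ARR_LEN 999999

int arr[ARR_LEN];   /* lives in zero-filled storage */

/* One record per element whose initial value is not zero. */
struct init_record {
    size_t index;   /* which element to set */
    int    value;   /* its initial value */
};

/* Stands in for the extra section holding the nonzero initialisers. */
static const struct init_record init_records[] = {
    { 0, 1 },
    { 500000, 42 },
};

/* Stands in for the startup code: run through the list once. */
void apply_init_records(void)
{
    size_t i;
    size_t n = sizeof init_records / sizeof init_records[0];

    for (i = 0; i < n; i++)
        arr[init_records[i].index] = init_records[i].value;
}
```

The stored data here is two small records rather than a full image of the array, which is where the saving in file size would come from.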

David Thompson · Jan 5, 2009

Another approach, on one (non-Unix VM) system I've used, is:
- store code page-aligned and use it as swap (always)
- for each page of data, if it contains anything nonzero, store it
  (aligned) and use that as initial swap, as recorded in a bitmap;
  otherwise (all zero) don't store it and use all-zero as initial 'swap'.

Uninitialized data ('bss') isn't a separate section, but merely the last N
data pages, which are all unstored and have bitmap=0. And read-only (static
duration) data is in the code area, not the data area.
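
A rough sketch of that per-page decision, with an assumed page size and made-up names; the real system's page size and bitmap layout are not given in the post.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumed for illustration */

/* Returns 1 if the page contains any nonzero byte, else 0. */
static int page_has_nonzero(const unsigned char *page, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (page[i] != 0)
            return 1;
    return 0;
}

/* Scans a data image page by page. Sets bit p of `bitmap` when page p must
   be stored, and returns how many pages would be stored. The caller supplies
   a bitmap with at least one bit per page, cleared to zero. */
size_t mark_stored_pages(const unsigned char *image, size_t size,
                         unsigned char *bitmap)
{
    size_t stored = 0, page = 0, off;

    for (off = 0; off < size; off += PAGE_SIZE, page++) {
        size_t len = size - off < PAGE_SIZE ? size - off : PAGE_SIZE;

        if (page_has_nonzero(image + off, len)) {
            bitmap[page / 8] |= (unsigned char)(1u << (page % 8));
            stored++;
        }
    }
    return stored;
}
```

Under this scheme, an array with only its first element set would cost one stored page, and a trailing run of all-zero pages plays the role of bss without needing a section of its own.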
