Uninitialised fields in structures

Ulrich Eckhardt

Greetings!

I was recently surprised by the compiler's warning concerning this code:

#include <stddef.h>   /* for size_t */

struct text {
  char* s;
  size_t len;
};
int main() {
  struct text t = {"hello world!"};
}

The compiler actually claimed that t.len was uninitialised. Okay, I don't
explicitly initialise it, but I was under the impression that it should be
initialised to zero then (i.e. all fields after the last one are
initialised with zero). Okay, it's just a warning, so I tended to ignore
it. Now, when I ran the code through Valgrind, it also complained that an
uninitialised value was used, which got me thinking. Lastly, I used gdb to
step through the code and explicitly shredded the value of t.len before
that line and - lo and behold - it was correctly (IMHO) reset to zero!

Now, I'm pretty sure about the rule with the additional fields, but I'm
wondering nonetheless. Can someone confirm or deny whether t.len above is
initialised or not?

Thank you

Uli
 
vippstar

Greetings!

I was recently surprised by the compiler's warning concerning this code:

struct text {
char* s;
size_t len;
};
int main() {
struct text t = {"hello world!"};
}

The compiler actually claimed that t.len was uninitialised. Okay, I don't
explicitly initialise it, but I was under the impression that it should be
initialised to zero then (i.e. all fields after the last one are
initialised with zero).

That is true only for an array.

Okay, it's just a warning, so I tended to ignore
it. Now, when I ran the code through Valgrind, it also complained that an
uninitialised value was used, which got me thinking. Lastly, I used gdb to
step through the code and explicitly shredded the value of t.len before
that line and - lo and behold - it was correctly (IMHO) reset to zero!

Now, I'm pretty sure about the rule with the additional fields, but I'm
wondering nonetheless. Can someone confirm or deny whether t.len above is
initialised or not?

The standard doesn't say anything about initializing the rest members
to 0.

Using gdb to determine whether a program is correct or not is bad
practice, as is running the program and drawing conclusions from its
output. Gdb is not a 'C99 tool'.
 
Spiros Bousbouras

Greetings!

I was recently surprised by the compiler's warning concerning this code:

  struct text {
    char* s;
    size_t len;
  };
  int main() {
    struct text t = {"hello world!"};
  }

The compiler actually claimed that t.len was uninitialised. Okay, I don't
explicitly initialise it, but I was under the impression that it should be
initialised to zero then (i.e. all fields after the last one are
initialised with zero).

Indeed. Paragraph 21 of 6.7.8 of n1256 states:

If there are fewer initializers in a brace-enclosed list than there are
elements or members of an aggregate, or fewer characters in a string
literal used to initialize an array of known size than there are
elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.

and paragraph 10 of the same clause states that
arithmetic types of static storage are initialized to 0.

Okay, it's just a warning, so I tended to ignore
it. Now, when I ran the code through Valgrind, it also complained that an
uninitialised value was used, which got me thinking.

Regarding the compiler, perhaps it just warns you about
the lack of explicit initialization. Regarding valgrind, I
don't know how it deals with such issues.

Now, I'm pretty sure about the rule with the additional fields, but I'm
wondering nonetheless. Can someone confirm or deny whether t.len above is
initialised or not?

According to the standard it should be. Perhaps the person
who wrote your compiler was ignorant of that part of
the standard? ;-)
 
Ben Bacarisse

The standard doesn't say anything about initializing the rest members
to 0.

I think it does. 6.7.8 para. 21:

If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.

Further up (para. 10) explains that objects of static duration are
initialised to arithmetic zero or NULL by default.
 
Ulrich Eckhardt

[Not explicitly initialised fields of a struct that is partially
initialised are implicitly initialised to zero. ]
Now, I'm pretty sure about the rule with the additional fields, but I'm
wondering nonetheless. Can someone confirm or deny whether t.len above is
initialised or not?
The standard doesn't say anything about initializing the rest members
to 0.

Well, that's where a few others and I disagree with you. Also, the compiler
only warns but still performs the initialisation.

Using gdb to determine whether a program is correct or not is bad
practice, as is running the program and drawing conclusions from its
output. Gdb is not a 'C99 tool'.

Well, surely it is not a way to prove that the program is correct C.
However, I have four different opinions on whether the program is buggy:

1. Compiler
The compiler said something was uninitialised, which often causes errant
runtime behaviour; in my case it would have led to calling free() with a
pointer to a string literal. Of course that is not proof, since
uninitialised storage can contain anything, including a very deterministic
value.

2. Running
Running the program didn't cause any runtime errors, which free()ing a
string literal usually does. Same uncertainty as above, though: glibc
otherwise correctly detects this error.

3. Valgrind
Typically valgrind doesn't care too much about C but operates directly on
the generated machine code. The fact that it detects use of uninitialised
memory was what first prompted me to wonder whether there was something
behind the warnings emitted by the compiler.

4. GDB (a debugger)
I used the debugger to manually shred the memory of the variable that was
claimed to be uninitialised. I observed that code was then executed to
initialise the variable to zero.

I'm aware that none of these tests are in any way mandated by any C
standard, but that's real life. ;)


Just for the interest of those that are reading this: the actual problem was
with my use of Valgrind. If I had paid a bit more attention to its output,
I would have seen that the errors are in fact detected in /lib/ld-2.3.6.so
and not in my executable. Actually using an uninitialised struct in my code
also causes it to be detected and reported as occurring in my code, so the
compiler output can be taken as "just a [stupid] warning".

Uli
 
user923005

[Not explicitly initialised fields of a struct that is partially
initialised are implicitly initialised to zero. ]
Now, I'm pretty sure about the rule with the additional fields, but I'm
wondering nonetheless. Can someone confirm or deny whether t.len above is
initialised or not?
The standard doesn't say anything about initializing the rest members
to 0.

Well, that's where a few others and I disagree with you. Also, the compiler
only warns but still does the initialisation.
Using gdb to determine whether a program is correct or not is bad
practice, as is running the program and drawing conclusions from its
output. Gdb is not a 'C99 tool'.

Well, surely it is not a way to prove that the program is correct C.
However, I have four different opinions on whether the program is buggy:

1. Compiler
The compiler said something was uninitialised, which often causes errant
runtime behaviour; in my case it would have led to calling free() with a
pointer to a string literal. Of course that is not proof, since
uninitialised storage can contain anything, including a very deterministic
value.

2. Running
Running the program didn't cause any runtime errors, which free()ing a
string literal usually does. Same uncertainty as above, though: glibc
otherwise correctly detects this error.

3. Valgrind
Typically valgrind doesn't care too much about C but operates directly on
the generated machine code. The fact that it detects use of uninitialised
memory was what first prompted me to wonder whether there was something
behind the warnings emitted by the compiler.

4. GDB (a debugger)
I used the debugger to manually shred the memory of the variable that was
claimed to be uninitialised. I observed that code was then executed to
initialise the variable to zero.

I'm aware that none of these tests are in any way mandated by any C
standard, but that's real life. ;)

Just for the interest of those that are reading this: the actual problem was
with my use of Valgrind. If I had paid a bit more attention to its output,
I would have seen that the errors are in fact detected in /lib/ld-2.3.6.so
and not in my executable. Actually using an uninitialised struct in my code
also causes it to be detected and reported as occurring in my code, so the
compiler output can be taken as "just a [stupid] warning".

You do not use C software tools to determine if a program is correct
or not, you use the C standard. If (for instance) one of your tools
has a bug, then what will you conclude?
 
cr88192

[Not explicitly initialised fields of a struct that is partially
initialised are implicitly initialised to zero. ]
Now, I'm pretty sure about the rule with the additional fields, but I'm
wondering nonetheless. Can someone confirm or deny whether t.len above is
initialised or not?
The standard doesn't say anything about initializing the rest members
to 0.

I'm aware that none of these tests are in any way mandated by any C
standard, but that's real life. ;)

Just for the interest of those that are reading this: the actual problem was
with my use of Valgrind. If I had paid a bit more attention to its output,
I would have seen that the errors are in fact detected in /lib/ld-2.3.6.so
and not in my executable. Actually using an uninitialised struct in my code
also causes it to be detected and reported as occurring in my code, so the
compiler output can be taken as "just a [stupid] warning".

You do not use C software tools to determine if a program is correct
or not, you use the C standard. If (for instance) one of your tools
has a bug, then what will you conclude?

Best practice is probably to find the least common denominator (of standard
and tools).

If the standard says something should work, but the tools are broken, then
maybe it makes sense to bend to the tools (not write that code, since the
tools are broken, and it is assumed here that the user is not one of the
tools' developers).

Now, if the tools allow something, but the standard does not, then one goes
with the standard.
 
Ulrich Eckhardt

user923005 said:
You do not use C software tools to determine if a program is correct
or not, you use the C standard. If (for instance) one of your tools
has a bug, then what will you conclude?

Yes, in theory you could do so, but in practice the correct functioning of a
program is determined by whether it does the right thing, not by some
standard, not even if it is written in a certain language. Sorry, but your
point of view is simply unrealistic. If you had at least said "correct
according to the C standard" I would have agreed, but as it stands your
statement is just rubbish to me. Try selling a program to a customer by
saying that it's correct C while it behaves erratically when executed.

Uli
 
user923005

Yes, in theory you could do so, but in practice the correct function of a
program is determined by whether it does the right thing and not by some
standard, not even if it is written in a certain language. Sorry, but your
point of view is simply unrealistic. If you had at least said "correct
according to the C standard" I would have agreed, but like that your
statement is just rubbish to me. Try to sell a program to a customer saying
that it's correct C while executing it behaves erratically.

If it is correct C and executes erratically, then your tools are
broken. Switch tools.

If it is incorrect C and yet it appears to execute correctly, then it
is (in fact) broken. When the compiler is repaired and maintenance
work is done on the code, then it may no longer function. Or the
undefined behavior may surface at the most inopportune time, such as
when an Ariane rocket takes off.

Why do you imagine that ISO bothers to write standards for languages?
For exactly the same reason that they write standards for bolts and
for oil and for just about everything else under the sun that might be
used to create products. If there is an accurate design specification
that describes exactly how something *must* behave then we can code to
that standard. It is this engineering approach that leads to accurate
and reliable systems. It is your "seat of the pants" approach that
leads to cowboy coding and people getting fried by x-ray machines.
 
Ulrich Eckhardt

user923005 said:
If it is correct C and executes erratically, then your tools are
broken. Switch tools.

This is ridiculous. Have you considered that there are only so many
resources for your goals and that switching tools might just not be an
option?

If it is incorrect C and yet it appears to execute correctly, then it
is (in fact) broken.

This obviously depends on the definition of broken, whether it simply
means "incorrect C" or "behaves faulty". In one case, you say "if it is
incorrect C, then it is incorrect C", duh. In the other case you say "if it
behaves correctly and is incorrect C, it behaves faulty". One statement is
obviously correct, the other obviously rubbish. Am I perhaps missing a
third definition of "broken"?

When the compiler is repaired and maintenance work is done on the
code, then it may no longer function. Or the undefined behavior
may surface at the most inopportune time, such as when an Ariane
rocket takes off.

So? Bugs happen. Testing helps against shipping broken code. What is it that
you do with the standard that helps you against shipping buggy programs?

Why do you imagine that ISO bothers to write standards for languages?
For exactly the same reason that they write standards for bolts and
for oil and for just about everything else under the sun that might be
used to create products. If there is an accurate design specification
that describes exactly how something *must* behave then we can code to
that standard. It is this engineering approach that leads to accurate
and reliable systems.

Dude, you don't get it: if the observable behaviour of a program is correct,
it ultimately doesn't matter to e.g. a customer whether this behaviour relies
on undefined behaviour or not. In fact pretty many programs rely at least on
implementation-defined behaviour. OTOH, if the behaviour is incorrect, it
similarly doesn't matter whether it complies with any standard or not; you
will not be able to sell it. This is reality, and I don't understand why you
keep arguing against that.

It is your "seat of the pants" approach that
leads to cowboy coding and people getting fried by x-ray machines.

Whining about standard compliance doesn't help if you get fried. Your
real-world program must behave, even if it is compiled by real-world,
non-perfect tools. Using real-world, non-perfect tools to find errors is
the only choice we (i.e. the people living in the real world) have. The C
standard is not a tool that can be efficiently used to find errors; you can
only use it to distinguish between a bug in your program and one in the
implementation, once you have found it.

Uli
 
user923005

This is ridiculous. Have you considered that there are only so many
resources for your goals and that switching tools might just not be an
option?

Do you really think that using broken tools is better than switching?

This obviously depends on the definition of broken, whether it simply
means "incorrect C" or "behaves faulty". In one case, you say "if it is
incorrect C, then it is incorrect C", duh. In the other case you say "if it
behaves correctly and is incorrect C, it behaves faulty". One statement is
obviously correct, the other obviously rubbish. Am I perhaps missing a
third definition of "broken"?

If my automobile is supposed to get water put into the cap marked
"radiator" but you really have to put it into the one marked "oil"
then something is amiss.
If the C standard says that a function should behave in a certain way,
but it does not, then that function is broken and you should send a
defect report to the compiler vendor.
So? Bugs happen. Testing helps against shipping broken code.

If you are using broken tools, testing can be the cause of broken code.

What is it that
you do with the standard that helps you against shipping buggy programs?

That is the fundamental intention of the standard. A conforming C
compiler *has* to behave exactly as the standards document says. If
it does not behave in that way, then the compiler vendor must fix it
in a timely manner or they are negligent.

Dude, you don't get it: if the observable behaviour of a program is correct,
it ultimately doesn't matter to e.g. a customer whether this behaviour relies
on undefined behaviour or not.

Are you able to test every possible input? If a function has 8 bytes
of input, that is 2^64 different inputs. Are you willing to trust a
compiler vendor who is not able to follow an extremely well designed
specification?

In fact pretty many programs rely at least on
implementation-defined behaviour.

There is nothing wrong with that. Possibly 90% of the programs I
write rely on implementation defined behavior (or at least behavior
defined by another standard such as POSIX) at some point.

OTOH, if the behaviour is incorrect, it
similarly doesn't matter whether it complies with any standard or not; you
will not be able to sell it.

Is it better to break your code so that it works according to a broken
compiler or to use a compiler that works?

This is reality, and I don't understand why you
keep arguing against that.

I think that your attitude is something scary and is a reason why
there are so many serious problems in the software industry. If I
were a bridge builder and failed to follow standards I would go to
jail.

Whining about standard compliance doesn't help if you get fried. Your
real-world program must behave, even if it is compiled by real-world,
non-perfect tools. Using real-world, non-perfect tools to find errors is
the only choice we (i.e. the people living in the real world) have. The C
standard is not a tool that can be efficiently used to find errors; you can
only use it to distinguish between a bug in your program and one in the
implementation, once you have found it.

The C standard does not diagnose errors. It teaches you how to code
correctly. However, it is a terrible mistake to imagine that running
a mass of tools against a software base and using some sort of vote
system as to what is right is going to determine correctness.

I use real world tools to help find defects also. But those tools are
not what determines what is correct. They only determine what is
observed. To imagine that this is how to build software is an
indication that software engineering has a long way to go in some
places before it can produce reliable software. If the blueprint says
I should use a 12mm bolt, I should not use a 10mm bolt. And if the
blueprint says that the bolt should have 3 diamonds on its head, then
it should have 3 diamonds on its head. If the C standard says that
something is correct, then that is what is correct. If someone's tool
does something different than what the C standard says it should, that
tells you your tool is broken. Do you know what happens when you
build something out of broken components? You get a broken product.
If you do not know what the C standard says some part of a C program
should do, then it is a hole in your education that should be filled.
It is simply part of being a responsible programmer.

IMO-YMMV.
 
Richard Bos

Ulrich Eckhardt said:
This obviously depends on the definition of broken, whether it simply
means "incorrect C" or "behaves faulty". In one case, you say "if it is
incorrect C, then it is incorrect C", duh. In the other case you say "if it
behaves correctly and is incorrect C, it behaves faulty". One statement is
obviously correct, the other obviously rubbish. Am I perhaps missing a
third definition of "broken"?

"Cannot be relied on to behave correctly." It often doesn't matter much
if you can demonstrate that your program works correctly on your
development machine, for your test data - if you can't guarantee that it
will work on the server, with the complete database which, e.g.,
includes personal names longer than 64 characters, your program is
broken for practical purposes.

If you want to be taken for an adult, don't call people "dude".

you don't get it: if the observable behaviour of a program is correct,
it ultimately doesn't matter to e.g. a customer whether this behaviour relies
on undefined behaviour or not.

However, I assure you that if it relies on undefined behaviour which
only "works" by accident[1], sooner or later the customer _will_ observe
behaviour which is not correct.

Richard

[1] As opposed to by design, as in, e.g., calling a POSIX function
 
James Kuyper

Ulrich said:
user923005 wrote: ....

This is ridiculous. Have you considered that there are only so many
resources for your goals and that switching tools might just not be an
option?

If you absolutely must use a compiler with a known defect, and have to
write code which in general has a different effect on a correctly
working implementation of C, then you should isolate that code as much
as possible, and clearly mark it as implementation-dependent. That way,
when and if it gets ported to a better implementation, it will be
easier to fix the code for the better implementation. It would be even
better if you could set up the code so that the work-around is
conditionally compiled only when using that particular compiler, and
that correct code is used on all other compilers.
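A minimal sketch of that isolation pattern (here BUGGY_CC is a stand-in for whatever predefined macro identifies the defective compiler, and the struct, function, and the defect imagined are made up purely for illustration):

```c
#include <stddef.h>

struct text {
    char  *s;
    size_t len;
};

struct text make_text(char *s)
{
#if defined(BUGGY_CC)
    /* Hypothetical work-around: suppose this one compiler mishandles
       partial initialization of automatic structs, so set every member
       explicitly. Still well-defined C, just more verbose. */
    struct text t;
    t.s = s;
    t.len = 0;
#else
    /* Correct code for all conforming implementations:
       t.len is implicitly zero per 6.7.8p21. */
    struct text t = {s};
#endif
    return t;
}
```

Keeping the two variants side by side, selected at compile time, makes the work-around easy to find and delete once the vendor fixes the defect.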
 
Serve Lau

Whining about standard compliance doesn't help if you get fried. Your
real-world program must behave, even if it is compiled by real-world,
non-perfect tools. Using real-world, non-perfect tools to find errors is
the only choice we (i.e. the people living in the real world) have. The C
standard is not a tool that can be efficiently used to find errors; you can
only use it to distinguish between a bug in your program and one in the
implementation, once you have found it.

I agree. I noticed over the years that when software engineers talk about
improving C code it's always about the UB aspects of C. "Use lint or some
better tool like that," they say. But in my experience the majority of bugs
doesn't come from somebody invoking UB, accidentally or not. It's almost
always when something unexpected happens in the field. We tend to put so
much energy into lint or trying to learn to write better standard-compliant
code, but bugs aren't reduced by it.

I remember in 11 years of coding in C and C++ only ONE example where somebody
forgot to allocate space for '\0' in a string. I remember ZERO times where I
put '=' instead of '==', and only one time that I forgot a 'break' in a
switch, and I corrected that mistake after one round of testing.
I don't dare think about how much time I spent on dynamic-memory-related
issues though :p
 
user923005

I agree. I noticed over the years that when software engineers talk about
improving C code it's always about the UB aspects of C. "Use lint or some
better tool like that," they say. But in my experience the majority of bugs
doesn't come from somebody invoking UB, accidentally or not. It's almost
always when something unexpected happens in the field. We tend to put so
much energy into lint or trying to learn to write better standard-compliant
code, but bugs aren't reduced by it.

I remember in 11 years of coding in C and C++ only ONE example where somebody
forgot to allocate space for '\0' in a string. I remember ZERO times where I
put '=' instead of '==', and only one time that I forgot a 'break' in a
switch, and I corrected that mistake after one round of testing.
I don't dare think about how much time I spent on dynamic-memory-related
issues though :p

These issues are separate issues.

First, there is determination of correctness:
If correctness is covered by a standard, then correctness is defined
that way.
If correctness is covered by a specification that is not a formal
standard, then we follow the specification.

Second, there is determination of robustness:
We must test our code against a known standard or expected outcome to
see if the outputs are correct.

Here, at CONNX Solutions, we have hundreds of test machines that are
occupied 24x7 testing our product.
They operate around the clock for 7 days before we even know if we
have a candidate for shipping.
By that time many millions of tests will have been performed.

I think it should be obvious what the clear separation is here. But
maybe it is only the way that I see things.
 
Flash Gordon

James Kuyper wrote, On 04/01/08 11:57:

If the tools are bad enough that you are tripping over bugs then it
might cost you more resources to work around them than to change tools!
I've had one instance where the bugs were so bad it worked out better to
not only change tools but change language as well! Admittedly that was a
change from a Pascal compiler for a DSP to a C compiler for the processor.

If you absolutely must use a compiler with a known defect, and have to
write code which in general has a different effect on a correctly
working implementation of C, then you should isolate that code as much
as possible, and clearly mark it as implementation-dependent. That way,
when and if it gets ported to a better implementation, it will be
easier to fix the code for the better implementation. It would be even
better if you could set up the code so that the work-around is
conditionally compiled only when using that particular compiler, and
that correct code is used on all other compilers.

When I have found compiler bugs that could be worked around I have
always found workarounds that kept the code well defined according to
the standard. This approach means that when the code is ported (or the
compiler updated) you have minimised the chances of nasty surprises.
 
user923005

James Kuyper wrote, On 04/01/08 11:57:


If the tools are bad enough that you are tripping over bugs then it
might cost you more resources to work around them than to change tools!
I've had one instance where the bugs were so bad it worked out better to
not only change tools but change language as well! Admittedly that was a
change from a Pascal compiler for a DSP to a C compiler for the processor.

I can also imagine a scenario where you are forced to use a single
compiler (e.g. some esoteric nearly one-off embedded chip for a highly
specialized application). I would certainly hate to be painted into
that corner.

When I have found compiler bugs that could be worked around I have
always found workarounds that kept the code well defined according to
the standard. This approach means that when the code is ported (or the
compiler updated) you have minimised the chances of nasty surprises.

I have found the same thing (although I have also actually abandoned
compilers -- at least temporarily -- until an important bug was
fixed). The key thing (in my view) is knowing when the compiler is
wrong. You cannot know if the compiler is right or wrong if you do
not know what the standard says. If something does not work the way
you expect, then there are several possibilities.
1. The logic is wrong.
2. The algorithm is wrong (algorithm logic coded correctly, but the
fundamental algorithm is flawed).
3. The compiler is broken.
4. Other problems.
All of them are probably fixable, but you have to be able to recognize
what category the problem is in to be able to take the first step.

IMO-YMMV.
 
christian.bau

Greetings!

I was recently surprised by the compiler's warning concerning this code:

  struct text {
    char* s;
    size_t len;
  };
  int main() {
    struct text t = {"hello world!"};
  }

The compiler actually claimed that t.len was uninitialised. Okay, I don't
explicitly initialise it, but I was under the impression that it should be
initialised to zero then (i.e. all fields after the last one are
initialised with zero). Okay, it's just a warning, so I tended to ignore
it. Now, when I ran the code through Valgrind, it also complained that an
uninitialised value was used, which got me thinking. Lastly, I used gdb to
step through the code and explicitly shredded the value of t.len before
that line and - lo and behold - it was correctly (IMHO) reset to zero!

Now, I'm pretty sure about the rule with the additional fields, but I'm
wondering nonetheless. Can someone confirm or deny whether t.len above is
initialised or not?

Some compilers tend to give this kind of warning if _you_ didn't
initialise a member. The C language states that with your code t.len
will be initialised to zero, but the compiler doesn't know whether
that is indeed what you wanted, or whether you just forgot to give a
value. So the compiler is asking you politely if that is what you
really wanted.

So it would be better to write

struct text t = {"hello world!", 0};

so that the compiler _knows_ you wanted to set t.len to 0, but both
variants will have the same effect.
 
christian.bau

If it is correct C and executes erratically, then your tools are
broken.  Switch tools.

Example:

/* Initialise i to 2 +/
int i = 1;

Perfectly standard conforming C, but completely broken.
 
user923005

Example:

/* Initialise i to 2 +/
int i = 1;

Perfectly standard conforming C, but completely broken.

That snippet won't compile. No closing comment.
At any rate, by correct C, I meant code that correctly implements the
algorithm in question without unnecessarily invoking undefined
behavior.
 
