Assigning values to char arrays

Ark Khasin · Nov 3, 2007

santosh said:
This is the most hilarious sentence I've read in c.l.c. this year.

Ty. But Richard offered a satisfactory explanation.

santosh · Nov 3, 2007

Ark said:
Is this "just a theory"? IMHO, 6.2.6.2 says *exactly nothing* about
unsigned char.

<quote n1256.pdf>

6.2.6.2 Integer types

1 For unsigned integer types other than unsigned char, the bits of the
object representation shall be divided into two groups: value bits and
padding bits (there need not be any of the latter).

<endquote>

Note closely the text within the parentheses. To me it _strongly_
implies, to say the least, that value bits are mandatory for objects of
all unsigned integer types. Since unsigned char is disallowed from
having padding bits, it must be composed only of value bits.

<quote n1256.pdf>

If there are N value bits, each bit shall represent a different power of
2 between 1 and 2^(N-1), so that objects of that type shall be capable of
representing values from 0 to 2^N - 1 using a pure binary representation;
this shall be known as the value representation. The values of any
padding bits are unspecified.44)

6.2.6.1

3 Values stored in unsigned bit-fields and objects of type unsigned char
shall be represented using a pure binary notation.40)

<endquote>

Again 6.2.6.1(3) in conjunction with 6.2.6.2(1) reinforces the
requirement that unsigned char may not have padding bits.

<quote n1256.pdf>

4 Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes. The value may be copied into an object of type unsigned
char [n] (e.g., by memcpy); the resulting set of bytes is
called the object representation of the value. Values stored in
bit-fields consist of m bits, where m is the size specified for the
bit-field. The object representation is the set of m bits the bit-field
comprises in the addressable storage unit holding it. Two values (other
than NaNs) with the same object representation compare equal, but values
that compare equal may have different object representations.

<endquote>

This answers the other issue that you raised concerning null pointers
not being all bits zero.
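
The "object representation" in the quoted 6.2.6.1 paragraph can be inspected directly. Below is a minimal sketch, assuming helper names of my own (copy_representation, all_bytes_zero and null_pointer_is_all_zero are hypothetical, not from the Standard). It copies an object into an unsigned char array with memcpy and checks whether every byte is zero. For a null pointer the Standard allows either answer, which is the point about null pointers made above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the object representation of *obj into out[].
   Returns the number of bytes copied, or 0 if out[] is too small. */
size_t copy_representation(const void *obj, size_t size,
                           unsigned char *out, size_t out_size)
{
    assert(obj != NULL && out != NULL);
    if (size > out_size)
        return 0;
    memcpy(out, obj, size);
    return size;
}

/* Return 1 if all n bytes are zero, 0 otherwise. */
int all_bytes_zero(const unsigned char *bytes, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (bytes[i] != 0)
            return 0;
    }
    return 1;
}

/* Return 1 if a null pointer to int is stored as all-bits-zero on this
   implementation. The Standard permits either answer. */
int null_pointer_is_all_zero(void)
{
    int *p = NULL;
    unsigned char bytes[sizeof p];
    copy_representation(&p, sizeof p, bytes, sizeof bytes);
    return all_bytes_zero(bytes, sizeof bytes);
}
```

On most current desktop systems null_pointer_is_all_zero() will probably return 1, but portable code should not rely on that.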

pete · Nov 3, 2007

Is this "just a theory"?

No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.
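
The quoted requirement can be checked on a given implementation. This is only a sketch with hypothetical function names. It builds 2^CHAR_BIT - 1 without overflow and compares it with UCHAR_MAX; if the two match, every bit of unsigned char is a value bit.

```c
#include <assert.h>
#include <limits.h>

/* Compute 2^CHAR_BIT - 1 without overflow. unsigned long can represent
   every unsigned char value, so it has at least CHAR_BIT value bits and
   shifting by CHAR_BIT - 1 is always defined. */
unsigned long all_ones_in_char_bit(void)
{
    unsigned long high = 1UL << (CHAR_BIT - 1);
    return (high - 1UL) * 2UL + 1UL;
}

/* Return 1 if UCHAR_MAX + 1 == 2^CHAR_BIT, i.e. no bit of unsigned char
   is left over for padding. */
int uchar_uses_every_bit(void)
{
    return (unsigned long)UCHAR_MAX == all_ones_in_char_bit();
}

/* Self-check; a conforming implementation should never trip these. */
void example_uchar_max(void)
{
    assert(CHAR_BIT >= 8);
    assert(uchar_uses_every_bit());
}
```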

Ark Khasin · Nov 3, 2007

pete said:
Is this "just a theory"?

No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks for adding to my confusion

So I have 11-bit machine bytes and UCHAR_MAX==8 and 3 padding most
significant bits. Anything wrong?
BTW, if I am not mistaken, in other integer types padding bits don't
have to be contiguous.

CBFalconer · Nov 3, 2007

Ark said:
.... snip ...

As a practitioner, I didn't think twice about clearing all bits with
memset clones including the likes of the code above. But now this
post scared me: if unsigned char has padding bits in its
representation (which I guess is allowed) then what do I get?
unsigned a;
memset_as_above(&a, 0, sizeof(a));
Will a necessarily compare equal to 0?

Can't happen. In C, char is expressly forbidden to have padding
bits.
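
A small sketch of the case being asked about, assuming a byte-wise memset clone of my own (memset_clone is a hypothetical name; the code from the earlier post is not shown here). Because unsigned char has no padding bits, the clone really does clear every bit of the object. As far as I can tell, n1256 6.2.6.2 paragraph 5 also says that all-bits-zero is a representation of zero for any integer type, so the comparison should succeed.

```c
#include <assert.h>
#include <stddef.h>

/* A byte-wise memset clone (hypothetical name), writing through
   unsigned char so that every bit of the object is touched. */
void *memset_clone(void *s, int c, size_t n)
{
    unsigned char *p = s;
    assert(s != NULL || n == 0);
    while (n-- > 0)
        *p++ = (unsigned char)c;
    return s;
}

/* Clear an unsigned int byte by byte, then compare it with 0. */
int cleared_unsigned_is_zero(void)
{
    unsigned a = 12345u;
    memset_clone(&a, 0, sizeof a);
    return a == 0;
}
```

The same reasoning does not carry over to pointers or floating-point objects, where all-bits-zero need not mean a null pointer or 0.0.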

santosh · Nov 3, 2007

Ark said:
pete said:

Ark said:

santosh wrote:
Ark Khasin wrote:
Ben Bacarisse wrote:
<snip>

No. unsigned char may not have padding bits.

Is this "just a theory"?

No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks for adding to my confusion
So I have 11-bit machine bytes and UCHAR_MAX==8 and 3 padding most
significant bits. Anything wrong?

What do you mean by UCHAR_MAX==8? Do you mean CHAR_BIT==8?

As far as the Standard is concerned a char i.e., a byte (as defined by
C) contains CHAR_BIT bits. Additionally unsigned char may not contain
padding bits.

I don't know what you mean by "machine bytes" above. Are they supposed
to be different from C bytes?

BTW, if I am not mistaken, in other integer types padding bits don't
have to be contiguous.

Yes. Padding bits need not be contiguous.

Chris Torek · Nov 3, 2007

pete said:
[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks for adding to my confusion
So I have 11-bit machine bytes and UCHAR_MAX==8 and 3 padding most
significant bits. Anything wrong?

Two.

First, I assume you meant for CHAR_BIT to be 8 here (so that
UCHAR_MAX is 255).

Second, if you have 11-bit machine bytes, CHAR_BIT must be 11
(or, alternatively, the implementation can make CHAR_BIT be 8
and completely hide the existence of the other 3 bits, by
emulating an 8-bit machine; but in this case, you do not have
11-bit machine bytes, you have 8-bit machine bytes in the
emulated machine on which the C system runs).

In other words, "unsigned char" has no padding bits.

BTW, if I am not mistaken, in other integer types padding bits don't
have to be contiguous.

Right. In practice, they tend to be clumped at one end (e.g., a
la Burroughs A-series machine where "integer" just means "floating
point value with carefully controlled exponent"). The most likely
candidate for "internal" padding bits would be a ones' complement
machine with no native at-least-32 and/or at-least-64 bit types,
where "long" and/or "long long" are made up of several machine
words glued together, with the sign bit unused (and always-0) in
the lower order words. One might do this when implementing C99 on
a Univac 11xx (Unisys? what *are* they called these days?) series
machine, for instance.

Chris Torek · Nov 3, 2007

(Given:

T **mem;
size_t size = n * sizeof *mem;
mem = malloc(size);
... check for success ...
memset(mem, 0, size);

)

... my understanding was that in pointer context 0 and NULL are converted
to a null pointer.

Assuming you mean the same thing I do by "in pointer context",
yes.

And converting to a null pointer is the compiler's responsibility.

Yes -- but providing "pointer context" is the programmer's.

So I thought 0 in memset would be converted to a null
pointer (which is system specific).

This is where you go astray: the call to memset() loses the "pointer
context" in question.

Remember that we *can* do this:

unsigned char table[100];
...
/* now zero out all 100 bytes in table[] */
memset(table, 0, 100);

as well as the example at the top of this article. How will memset()
"know" whether we passed the address of the first byte of 100
"unsigned char"s (i.e., "table"), or the first byte of n "pointer
to ..."s (i.e., "mem")?

The answer is that it does *not* know. Instead, it just *assumes*
that its first argument is a pointer to "ordinary integer" bytes,
i.e., in the style of memset(table). It thus sets all the bytes
to "integer zeros", not "pointer nulls".
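
A portable alternative to the memset() call is to assign a null pointer to each element, so that the compiler sees a pointer on the left of every assignment. The sketch below assumes T is int and uses hypothetical function names; it is one way to do it, not the only one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n pointers to int and make each one a null pointer by
   assignment, so the compiler supplies the right representation.
   Returns NULL if n is 0, too large, or the allocation fails. */
int **alloc_null_pointers(size_t n)
{
    int **mem;
    size_t i;

    if (n == 0 || n > SIZE_MAX / sizeof *mem)
        return NULL;
    mem = malloc(n * sizeof *mem);
    if (mem == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        mem[i] = NULL;   /* pointer context: 0 or NULL becomes a null pointer */
    return mem;
}

/* Self-check: every element compares equal to a null pointer. */
void example_null_pointers(void)
{
    size_t i;
    int **mem = alloc_null_pointers(8);
    if (mem == NULL)
        return;          /* allocation failed; nothing to check */
    for (i = 0; i < 8; i++)
        assert(mem[i] == NULL);
    free(mem);
}
```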

To put it another way, "pointer context" survives only across "very
short distances" in C, specifically, certain operators.

Given an operator that takes two operands -- such as the ordinary
assignment operator "=", or the comparison operators "==" and "!="
-- a C compiler will detect that one operand has some pointer type
while the other is the integer constant zero, and in those specific
cases, will convert the "integer constant zero" to "null pointer of
appropriate type". Thus, in:

T **mem;
mem = NULL;
mem = 0;

the two assignments have the exact same effect, because the compiler
can see that "mem" has a pointer type (specifically "pointer to
pointer to T", whatever type T may be). Or, after calling malloc()
successfully to set mem to something non-NULL:

mem[i] = 0;

again supplies a pointer type on the left (because mem[i] has type
"pointer to T") and an integer-constant-zero on the right, and the
compiler can -- indeed, must -- convert that zero to an appropriate
null pointer value.

These are examples of "pointer context". Arguments to prototyped
functions also provide a short-term "pointer context", because
parameter passing (when prototypes are used) is defined in terms
of ordinary assignment:

void zorg(double *evil);
...
zorg(0);

is very much like writing "evil = 0" (except that the assignment
is actually to whatever parameter-name zorg() uses, which may
differ:

void zorg(double *trouble) { ... }

The name in the prototype is optional and need not match the
actual formal parameter name; the "assignment" happens during
the subroutine call).

In the case of memset(), however, the one pointer parameter --
which I call "base" here -- has type "void *":

void *memset(void *base, int c, size_t n);

and "void *" points to no type at all. We cannot tell from the
call alone whether memset() will use "base" as a pointer to pointers,
or to integers, or to floating-point values, or indeed anything.
The only information we have is in whatever documentation we have
(in this case, the C standard itself!) describing the function.
It tells us that, internally, the mem*() functions convert their
pointer parameters to "unsigned char *", and treat the memory region
as an array of bytes ("unsigned char"s, an integral type).

(Note that in the absence of a prototype, or if the prototype ends
in ", ..." and we are in the "..." part of the call, parameters
are *not* passed as if by ordinary assignment, but rather with the
"default argument promotions". For most C programmers in most
situations, this little wrinkle can be ignored, since most of us
will use prototypes always. The exception occurs in calls to
variadic functions like printf(), where we have to be careful with
our parameters, especially with pointers and the "%p" directive.)
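
The wrinkle with variadic functions can be sketched as follows, using snprintf() so the result can be inspected. The function names are hypothetical. Since the "..." part gives the compiler no pointer context, the cast to void * has to be written out by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a pointer with %p. snprintf is variadic, so no prototype tells
   the compiler to convert the argument; the cast to void * is ours to write. */
int format_pointer(char *buf, size_t size, int *p)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%p", (void *)p);
}

/* Format a null pointer. Writing a bare 0 here would pass an int,
   not a pointer, and the behaviour would be undefined. */
int format_null_pointer(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%p", (void *)0);
}
```

The text produced by %p is implementation-defined, so the sketch only relies on the call succeeding, not on any particular output.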

Flash Gordon · Nov 3, 2007

Ark Khasin wrote, On 03/11/07 20:05:

pete said:
pete said:

Ark said:

santosh wrote:
Ark Khasin wrote:
Ben Bacarisse wrote:
<snip>

No. unsigned char may not have padding bits.

Is this "just a theory"?

No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1 shall equal 2 raised to the
power CHAR_BIT.

Thanks for adding to my confusion
So I have 11-bit machine bytes and UCHAR_MAX==8 and 3 padding most
significant bits. Anything wrong?

CHAR_BIT is the number of bits in a signed, unsigned and plain char.
Note, the number of bits, NOT the number of value bits. Therefore, as
UCHAR_MAX is 2 raised to the power of CHAR_BIT all of the bits must be
value bits.

BTW, if I am not mistaken, in other integer types padding bits don't
have to be contiguous.

The padding bits can be anywhere, but short of using an unsigned char
pointer to look at the representation they are hard to get at since the
bitwise operations are defined as operating on values.

James Kuyper · Nov 3, 2007

[>> Ben Bacarisse wrote:]
....
The parenthesized comment was not actually needed to make the statement
"scrupulously correct"; it would have been just as correct, and less
confusing, without it.

That's where I am lost and reading the standard doesn't help:
What's the difference between a value of an object and how it compares
equal? I mean, if a==b, whatever their representations, in what
context(s) does it make sense to say they may have different values?

There is no difference. Don't let the unnecessary "clarification"
confuse you. The issue isn't having different values with the same
representation in a single type - that can't happen. The issue is that
there can be multiple different representations of the same value in a
given type. However, the values of objects of that type containing those
different representations must compare equal.

You're tripping over a minor issue; the fact that there can be multiple
representations of a null pointer. However, you've lost track of the key
issue: that a pointer object with all of its bits set to 0 doesn't have
to be one of those representations. In fact, it doesn't have to
represent a valid pointer value of any kind.

[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not on
values?]

No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations. For instance, E>>1 is
defined as dividing the value of E by 2. The complicated exceptions all
involve sign bits, and most result in undefined behavior, which is why
it's strongly recommended that bitwise operations be restricted to
unsigned types, or at least restricted to values which are guaranteed to
be positive both before and after the operation.
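
For unsigned operands the value-based definitions are easy to check. A minimal sketch with hypothetical function names: a right shift by one is a division by 2, and a left shift by one is a multiplication by 2 reduced modulo UINT_MAX + 1, which is exactly what unsigned multiplication does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* E >> 1 is the integral part of E / 2 for unsigned E. */
int shift_right_is_halving(unsigned e)
{
    return (e >> 1) == e / 2u;
}

/* E << 1 is E * 2 reduced modulo UINT_MAX + 1 for unsigned E,
   the same wrap-around that unsigned multiplication performs. */
int shift_left_is_doubling(unsigned e)
{
    return (e << 1) == e * 2u;
}

/* Self-check over a few edge values. */
void example_shifts(void)
{
    unsigned samples[] = { 0u, 1u, 2u, 3u, 255u, 256u, UINT_MAX - 1u, UINT_MAX };
    size_t i;
    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(shift_right_is_halving(samples[i]));
        assert(shift_left_is_doubling(samples[i]));
    }
}
```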

There is, at this point, no guarantee that 'a' contains a valid pointer
representation. Therefore, the next line renders the behavior of your
entire program undefined:

//Which is correct but implies
{
void **pNULL = 0;
if(a==*pNULL) {
/* not guaranteed */

I'm not sure what your point was; but you've just attempted to
dereference a null pointer, again making the behavior undefined.

pete · Nov 3, 2007

Ark said:
Ark said:

santosh wrote:
Ark Khasin wrote:
Ben Bacarisse wrote:
<snip>

No. unsigned char may not have padding bits.

Is this "just a theory"?

No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks for adding to my confusion
So I have 11-bit machine bytes

That's what "CHAR_BIT equals eleven" means.

Ben Bacarisse · Nov 4, 2007

James Kuyper said:
[>> Ben Bacarisse wrote:]
...
The parenthesized comment was not actually needed to make the
statement "scrupulously correct"; it would have been just as correct,
and less confusing, without it.

Sorry if I've confused the issue. I was worried about suggesting that
there was only one such thing (one null pointer) but I can see that I
clearly don't. Maybe I did at some point as I was editing the text.

[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not
on values?]

No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations.

Is that true for &, |, ^ and ~? The definitions are very bland, but
they suggest (simply by saying so little) that the interpretation is
to be based on the representation. This is backed up by section
6.5p4.

Ark Khasin · Nov 4, 2007

pete said:
<snip>

No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks for adding to my confusion
So I have 11-bit machine bytes

That's what "CHAR_BIT equals eleven" means.

Sorry for being that stubborn, but:
Why?
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.
Is it postulated somewhere that
UINT_MAX+1==(UCHAR_MAX+1)*sizeof(unsigned)
?
I don't think so.

Ark Khasin · Nov 4, 2007

Ben said:
[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not
on values?]

No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations.

Is that true for &, |, ^ and ~? The definitions are very bland, but
they suggest (simply by saying so little) that the interpretation is
to be based on the representation. This is backed up by section
6.5p4.

Yes, I took a beating in this ng recently for proposing, as an academic
exercise,
int cmpneq(int a, int b){ return a^b; }
At the time, I agreed that the beating was well deserved. But as far as
I can tell, it depended on ^ operating on representations.
An authoritative and well-substantiated clarification would be more than
welcome!

Ian Collins · Nov 4, 2007

Ark said:
pete said:

<snip>

No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks for adding to my confusion
So I have 11-bit machine bytes

That's what "CHAR_BIT equals eleven" means.

Sorry for being that stubborn, but:
Why?
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.

Given sizeof(char) == 1 by definition, how would you express
sizeof(int)? 2.38 doesn't fit into size_t very well....

Ben Pfaff · Nov 4, 2007

Ark Khasin said:
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.

Because the individual bytes in an object must be able to be
inspected and modified. If I understand what you are proposing,
there would be 3 bits in the lower byte of your 19-bit int that
would not appear when that byte was inspected, because a char
would only be 8 bits wide.

Ark Khasin · Nov 4, 2007

Ark said:
pete said:

<snip>

No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks for adding to my confusion
So I have 11-bit machine bytes

That's what "CHAR_BIT equals eleven" means.

Sorry for being that stubborn, but:
Why?
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.
Is it postulated somewhere that
UINT_MAX+1==(UCHAR_MAX+1)*sizeof(unsigned)
?
I don't think so.

Sorry for posting nonsense contradicting 6.2.6.1 #4.
It appears indeed that I cannot have 11+9-bit int. While I can have
8+8=16-bit int etc, such a C machine would simply ignore the 3 of 11
bits. Or it can use them for padding, which demonstrates that padding of
unsigned char is possible e.g. for trap values (for instance,
uninitialized or truncated on assignment or whatever).
Would it be a legit implementation?

Ben Bacarisse · Nov 4, 2007

Ark Khasin said:
Ben said:

[NEGATIVE_ZERO comes to mind - and goes away. BTW, is it fair to say
that bitwise logic is a magic performed on representations, and not
on values?]
No. In general, the bitwise operations are defined in terms of their
actions on the values, not the representations.

Is that true for &, |, ^ and ~? The definitions are very bland, but
they suggest (simply by saying so little) that the interpretation is
to be based on the representation. This is backed up by section
6.5p4.

Yes, I took a beating in this ng recently for proposing, as an
academic exercise,
int cmpneq(int a, int b){ return a^b; }
At the time, I agreed that the beating was well deserved. But as far
as I can tell, it depended on ^ operating on representations.
An authoritative and well-substantiated clarification would be more
than welcome!

If you think about it, you can *always* define the meaning in terms of
values even if it is more natural to think of it in terms of
representations. However, that would be stretching a point. An
expression like '-1 | -2' does not invoke undefined behaviour and the
result is most easily explained in terms of the representation of the
operands. (Of course it is daft, but that is not really the point.)
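
To make the '-1 | -2' example concrete, here is a sketch with hypothetical function names. The comment lists what the bit patterns would give under the three signed representations C99 allows; the self-check only asserts the two's complement result, and only after confirming that int behaves that way here.

```c
#include <assert.h>

/* -1 | -2 depends on how negative numbers are stored:
     two's complement:   ...1111 | ...1110 = ...1111, i.e. -1
     ones' complement:   ...1110 | ...1101 = ...1111, a negative zero pattern
     sign and magnitude: 1..0001 | 1..0010 = 1..0011, i.e. -3 */
int or_of_negatives(void)
{
    return -1 | -2;
}

/* Return 1 if int looks like two's complement here: only in that
   representation does -1 have both of its two low bits set. */
int is_twos_complement(void)
{
    return (-1 & 3) == 3;
}

/* Self-check, guarded so it only asserts what two's complement promises. */
void example_or_of_negatives(void)
{
    if (is_twos_complement())
        assert(or_of_negatives() == -1);
}
```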

Ben Pfaff · Nov 4, 2007

Ark Khasin said:
It appears indeed that I cannot have 11+9-bit int. While I can have
8+8=16-bit int etc, such a C machine would simply ignore the 3 of 11
bits. Or it can use them for padding, which demonstrates that padding of
unsigned char is possible e.g. for trap values (for instance,
uninitialized or truncated on assignment or whatever).
Would it be a legit implementation?

You mean an implementation with CHAR_BIT == 11 but only 8 value
bits in an unsigned char? No, that would not be a legitimate
implementation because unsigned char may not have padding bits.

Ben Bacarisse · Nov 4, 2007

Ark Khasin said:
Ark said:

pete said:

<snip>

No. unsigned char may not have padding bits.
Is this "just a theory"?
No.

N869
5.2.4.2.1 Sizes of integer types <limits.h>

[#2] The value UCHAR_MAX+1
shall equal 2 raised to the power CHAR_BIT.

Thanks for adding to my confusion
So I have 11-bit machine bytes

That's what "CHAR_BIT equals eleven" means.

Sorry for being that stubborn, but:
Why?
Why can't I have CHAR_BIT==8 on a 11-bit machine?
E.g. my int would be something like say 11(lower)+8(upper)=19 bits.
Is it postulated somewhere that
UINT_MAX+1==(UCHAR_MAX+1)*sizeof(unsigned)
?
I don't think so.

Sorry for posting nonsense contradicting 6.2.6.1 #4.
It appears indeed that I cannot have 11+9-bit int. While I can have
8+8=16-bit int etc, such a C machine would simply ignore the 3 of 11
bits. Or it can use them for padding, which demonstrates that padding of
unsigned char is possible e.g. for trap values (for instance,
uninitialized or truncated on assignment or whatever).
Would it be a legit implementation?

No. Unsigned char can't have padding bits. It is not permitted.
Neither are trap representations.

If you choose to fake an 8-bit char on your 11-bit hardware you must
do so in such a way as to hide all evidence of the extra bits.

Padding bits are visible. You can tell they are there because the
number of representable values in type T is less than or equal to
2**(CHAR_BIT * sizeof(T) - 1). I.e. at least one bit does not contribute
to the set of values.
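
That visibility can be turned into a small test. A sketch for unsigned int, with hypothetical function names: count the value bits by shifting UINT_MAX down to zero, then subtract that from the number of bits in the object representation. Any difference is padding.

```c
#include <assert.h>
#include <limits.h>

/* Count the value bits of unsigned int by shifting UINT_MAX down to zero. */
int unsigned_value_bits(void)
{
    unsigned max = UINT_MAX;
    int bits = 0;
    while (max != 0) {
        max >>= 1;
        bits++;
    }
    return bits;
}

/* Bits in the object representation that are not value bits. */
int unsigned_padding_bits(void)
{
    return (int)(sizeof(unsigned) * CHAR_BIT) - unsigned_value_bits();
}

/* Self-check of what the Standard guarantees for any implementation. */
void example_padding(void)
{
    assert(unsigned_value_bits() >= 16);   /* UINT_MAX is at least 65535 */
    assert(unsigned_padding_bits() >= 0);
}
```

On most current hardware unsigned_padding_bits() will probably return 0; the same count applied to unsigned char must always give 0.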
