different struct sizes

  • Thread starter Borked Pseudo Mailed

Richard Bos

Harald van Dĳk said:
Well yeah, but the is* functions are never (counterexamples are
welcome) useful for binary data,

I can think of only one reason: a strings-like utility program. That's
system-level enough not to make this a problem.

Richard
 
Jordan Abel

Richard Bos said (2006-11-07):
I can think of only one reason: a strings-like utility program. That's
system-level enough not to make this a problem.

How is strings system-level?

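#include <ctype.h>
#include <stdio.h>

#define N 4   /* minimum run length worth printing */

int main(int argc, char **argv) {
    int c;
    int i = 0;   /* chars buffered so far; N+1 means "run already flushed" */
    char buf[N];
    FILE *myfile = ...;   /* elided in the original post */

    /* Parenthesize the assignment: != binds tighter than =. */
    while ((c = getc(myfile)) != EOF) {
        if (isprint(c) || isspace(c)) {
            if (i < N) {
                buf[i++] = c;
                if (i == N) { i++; fwrite(buf, 1, N, stdout); }
            } else {
                putchar(c);
            }
        } else {
            if (i == N + 1)
                putchar('\n');
            i = 0;   /* reset after short runs too, so runs do not merge */
        }
    }
    putchar('\n');
    return 0;
}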
 
Richard Tobin

Keith Thompson said:
In this newsgroup, I can and I do expect words defined in the C
standard to be used in accordance with the way the C standard defines
them.

Well, you're often going to be disappointed.

The newsgroup isn't the *only* context determining how a word should
be interpreted. Usenet isn't a standards document, it's a
conversation.

Anyone using the word "byte" here in some other sense needs to
say so.

The statement was:

sizeof(char) is 1 by definition, that doesn't mean that it's one byte

And he said it in the context of an up-thread statement

Certain platforms use 4 bytes memory for "char" and "short int"
variables

Why not start by assuming that the person is talking sense? (Of
course, you may have to change your assumption later.) Since "byte"
corresponds to "char" in C standards-speak, it would make no sense for
him to say "that doesn't mean that it's one byte" if he had meant the
C-standards-speak usage of "byte". Assuming that he is talking sense
removes the ambiguity, and your response therefore seems excessively
pedantic.

-- Richard
 
Richard Tobin

Even if stdio support is provided, I don't think it actually matters very
much about the EOF issue, in practical terms. Yes, 0xFFFF (or 0xFFFFFFFF,
or however many bits you're dealing with) can be seen as either EOF or a
character value, but (a) there's no problem when CHAR_BIT < 16, and (b)
when CHAR_BIT /is/ 16 or higher, the chances of real world data containing
a genuine character with the maximum possible value are pretty low. You'd
have to work pretty hard to find a counter-example, I think.

And anyone designing such a character set today would be nuts.
Unicode for example makes 0xFFFF be an explicit non-character,
because of its likely use for purposes such as EOF.
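The noncharacter rule mentioned above can be sketched in a few lines. This is an illustration only, not code from the thread; the function name is_noncharacter is hypothetical. It relies on Unicode's definition of noncharacters: U+FDD0 through U+FDEF, plus every code point whose low 16 bits are FFFE or FFFF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if cp is one of Unicode's noncharacters.
 * Such code points never encode text for interchange, so a 16-bit
 * value of 0xFFFF remains free to act as an in-band sentinel. */
bool is_noncharacter(unsigned long cp)
{
    assert(cp <= 0x10FFFFUL);               /* must lie in Unicode's code space */
    if (cp >= 0xFDD0UL && cp <= 0xFDEFUL)
        return true;
    return (cp & 0xFFFEUL) == 0xFFFEUL;     /* low 16 bits are FFFE or FFFF */
}
```

With this definition, is_noncharacter(0xFFFF) is true while is_noncharacter(0xFFFD), the replacement character, is false.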

-- Richard
 
pete

Richard said:
Well, you're often going to be disappointed.

The newsgroup isn't the *only* context determining how a word should
be interpreted. Usenet isn't a standards document, it's a
conversation.


And he said it in the context of an up-thread statement

Certain platforms use 4 bytes memory for "char" and "short int"
variables

Why not start by assuming that the person is talking sense?

Because it makes no sense.

N869
6.5.3.4 The sizeof operator
The sizeof operator yields the size (in bytes) of its operand
 
Keith Thompson

Well, you're often going to be disappointed.

I often am, but somehow I muddle through.

The newsgroup isn't the *only* context determining how a word should
be interpreted. Usenet isn't a standards document, it's a
conversation.


And he said it in the context of an up-thread statement

Certain platforms use 4 bytes memory for "char" and "short int"
variables

Why not start by assuming that the person is talking sense? (Of
course, you may have to change your assumption later.) Since "byte"
corresponds to "char" in C standards-speak, it would make no sense for
him to say "that doesn't mean that it's one byte" if he had meant the
C-standards-speak usage of "byte". Assuming that he is talking sense
removes the ambiguity, and your response therefore seems excessively
pedantic.

If we're not going to use the C standard's definition of "byte" in
comp.lang.c, where are we going to use it? The point of defining
terms is to have a common vocabulary so we can discuss things without
talking past each other.

I understand that the word "byte" is often used differently outside
the context of the C programming language. People these days often
use it as a synonym for "octet", though that's inconsistent with the
original meaning of the word (which predates C).

The statement was:

sizeof(char) is 1 by definition, that doesn't mean that it's one byte

but that's exactly what it means, since sizeof yields the size of its
operand *in bytes*. That's not pedantry, it's simple correctness.
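The distinction between the standard's byte and an octet can be made concrete with a small sketch. It is an illustration under stated assumptions, not code from the thread; the helper names bits_in and octets_needed are hypothetical. It uses only sizeof and CHAR_BIT from <limits.h>.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: width in bits of an object whose size, as
 * reported by sizeof, is nbytes. A byte here is the C standard's byte:
 * CHAR_BIT bits wide, which is at least 8 but may be more. */
size_t bits_in(size_t nbytes)
{
    assert(nbytes <= SIZE_MAX / CHAR_BIT);  /* guard against overflow */
    return nbytes * CHAR_BIT;
}

/* Hypothetical helper: number of 8-bit octets needed to hold nbytes
 * C bytes. Equals nbytes only when CHAR_BIT == 8. */
size_t octets_needed(size_t nbytes)
{
    return (bits_in(nbytes) + 7) / 8;
}
```

On an implementation with CHAR_BIT == 16, octets_needed(sizeof(char)) would be 2 even though sizeof(char) is still 1.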

The alternative is to explicitly qualify every usage of the word
"byte" with either "(meaning 8 bits)", or "(in the sense defined in
the C standard)", or whatever, and to do the same for every other
technical word that may have more than one meaning.
 
Mark L Pappin

If it is known that a program's input is not text but (possibly)
binary, the wise programmer doesn't use fgetc() in the first
place. He uses fread(),

Or, checks feof() after fgetc() returns something that equals EOF.
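A minimal sketch of that feof() check follows. It is not code from the thread, and the helper name count_bytes is hypothetical. The idea is that a return value equal to EOF only ends the loop when feof() or ferror() confirms it, which matters on implementations where a valid character value can compare equal to EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count every byte in a stream. On common
 * implementations (CHAR_BIT == 8 with a wider int) the extra check is
 * redundant; it matters only where a character can equal EOF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;

    assert(fp != NULL);
    for (;;) {
        c = fgetc(fp);
        if (c == EOF && (feof(fp) || ferror(fp)))
            break;      /* genuine end of file or read error */
        n++;            /* otherwise c is data, even if it equals EOF */
    }
    return n;
}
```

The caller can distinguish the two stopping conditions afterwards by calling ferror() on the same stream.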

mlp
 
Mark McIntyre

Important distinction. When I ported lcc-win32 to a DSP, each
character took two bytes (16 bits)

Strictly speaking, it took two OCTETS.

because the machine could not
address odd bytes.

octets...

Still, sizeof(char) was 1 of course.

And it was still one byte, I'm afraid.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Gordon Burditt

Even if stdio support is provided, I don't think it actually matters very
much about the EOF issue, in practical terms. Yes, 0xFFFF (or 0xFFFFFFFF,
or however many bits you're dealing with) can be seen as either EOF or a
character value, but (a) there's no problem when CHAR_BIT < 16, and (b)
when CHAR_BIT /is/ 16 or higher, the chances of real world data containing
a genuine character with the maximum possible value are pretty low. You'd
have to work pretty hard to find a counter-example, I think.

The chances of real-world data containing any particular unusual
combination of data that makes the program malfunction is pretty
much certainty if there is anything to be gained by writing MALICIOUS
code and the program accepts data from the real world.

Read some Microsoft security reports (viruses are not limited to
Microsoft but their code is the biggest target due to market share).
What are the chances of some of the exploits showing up BY ACCIDENT?
Pretty much zero. What are the chances of the exploits showing up
MALICIOUSLY? Nearly certainty.
 
Old Wolf

Keith said:
A C implementation *must* allow char objects to be stored at odd byte
addresses. It can choose to align all single declared char objects,
or even char struct members, at even addresses if that makes access
easier or faster, but there can be no padding between array elements:

char arr[2];
/* either arr[0] or arr[1] is at an odd byte address */

If the hardware doesn't allow this (or makes it too expensive), then
the implementation can make bytes bigger than 8 bits.

How do you define "odd byte address" ?

I don't see why you can't have 16-bit bytes, at addresses 0, 2, 4, ...
 
Richard Heathfield

Old Wolf said:
Keith said:
A C implementation *must* allow char objects to be stored at odd byte
addresses. It can choose to align all single declared char objects,
or even char struct members, at even addresses if that makes access
easier or faster, but there can be no padding between array elements:

char arr[2];
/* either arr[0] or arr[1] is at an odd byte address */

If the hardware doesn't allow this (or makes it too expensive), then
the implementation can make bytes bigger than 8 bits.

How do you define "odd byte address" ?

I don't see why you can't have 16-bit bytes, at addresses 0, 2, 4, ...

16-bit bytes are not a problem, but of course they will be at addresses 0,
1, 2, 3...

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.
 
Ben Pfaff

Richard Heathfield said:
Old Wolf said:


16-bit bytes are not a problem, but of course they will be at addresses 0,
1, 2, 3...

I don't think that a strictly conforming program could tell the
difference. The representation of pointers doesn't have to be
flat, and there's no reason that subtraction of pointers to char
can't divide by 2. (Is there?)
 
Richard Heathfield

Ben Pfaff said:
I don't think that a strictly conforming program could tell the
difference. The representation of pointers doesn't have to be
flat, and there's no reason that subtraction of pointers to char
can't divide by 2. (Is there?)

If you have a pointer to the first character in a string, and a pointer to
the second character in the same string, those pointers are required to
differ by 1 (because arrays are stored contiguously, and sizeof(char) is
guaranteed to be 1). In an architecture where the concepts of "even
address" and "odd address" are meaningful, it seems to me that one or other
of those pointers *must* store an odd address.

What am I missing?

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.
 
Ben Pfaff

Richard Heathfield said:
Ben Pfaff said:


If you have a pointer to the first character in a string, and a pointer to
the second character in the same string, those pointers are required to
differ by 1 (because arrays are stored contiguously, and sizeof(char) is
guaranteed to be 1).

We may be talking past one another. I agree that the result of
subtracting pointers must be 1 in this case. But the numbers
that represent the pointers involved in the subtraction could
differ by 2, or by 4, or by 623.

In an architecture where the concepts of "even
address" and "odd address" are meaningful, it seems to me that one or other
of those pointers *must* store an odd address.

I don't think the concepts of even and odd addresses would be
meaningful in the architecture that I'm envisioning.
Potentially, all pointers could be represented by odd numbers, or
by even numbers, or the bit with value 1 could be the parity of
the rest of the bits, or whatever.
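The two kinds of "difference" under discussion can be put side by side in a short sketch. It is an illustration, not code from the thread; the helper names element_gap and integer_gap are hypothetical, and uintptr_t is an optional type that an implementation need not provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Gap as C defines it: the number of elements between two pointers
 * into the same array. For adjacent chars this is required to be 1. */
ptrdiff_t element_gap(const char *first, const char *second)
{
    assert(first != NULL && second != NULL);
    return second - first;
}

/* Gap between the integers obtained by converting each pointer to
 * uintptr_t. The conversion is implementation-defined, so this value
 * need not be 1, although it is 1 on most flat-address machines. */
uintptr_t integer_gap(const char *first, const char *second)
{
    assert(first != NULL && second != NULL);
    return (uintptr_t)second - (uintptr_t)first;
}
```

For some char s[2], element_gap(&s[0], &s[1]) is 1 on every conforming implementation; only the result of integer_gap depends on how pointers are represented.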
 
Keith Thompson

Old Wolf said:
Keith said:
A C implementation *must* allow char objects to be stored at odd byte
addresses. It can choose to align all single declared char objects,
or even char struct members, at even addresses if that makes access
easier or faster, but there can be no padding between array elements:

char arr[2];
/* either arr[0] or arr[1] is at an odd byte address */

If the hardware doesn't allow this (or makes it too expensive), then
the implementation can make bytes bigger than 8 bits.

How do you define "odd byte address" ?

I don't see why you can't have 16-bit bytes, at addresses 0, 2, 4, ...

Actually, I don't define "odd byte address" -- but the standard does
implicitly use the concept without defining it. C99 3.2 defines the
term "alignment":

alignment

requirement that objects of a particular type be located on
storage boundaries with addresses that are particular multiples of
a byte address

Strictly speaking, 0, 2, 4, ... are not addresses; they're integers.
(void*)0, (void*)2, (void*)4, ... are addresses, but the standard
tells us nothing about their representation. On some systems I've
used (Cray vector systems), machine addresses point to 64-bit words,
but CHAR_BIT==8, and void* and char* have extra offset information.
A pointer object whose representation looks like an odd integer might
point to what I'd call an even byte address, and a pointer object
whose representation looks like an even integer might point to what
I'd call an odd byte address.

My assumption is that if p is an even address, then p+1 is an odd
address (assuming p is a character pointer), regardless of the
representation; I think that's the only model that's consistent with
the standard's concept of "alignment".
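Alignment and padding can be observed without looking at pointer representations at all. The following is a minimal sketch, not code from the thread; the struct and helper names are hypothetical. It uses the classic offsetof trick: the offset of a member placed after a single char shows where the implementation chose to align that member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe type: a char followed by an int. */
struct probe_int { char c; int value; };

/* Offset, in bytes, at which the implementation places the int. This
 * is a multiple of int's alignment, and typically equal to it. */
size_t int_offset(void)
{
    return offsetof(struct probe_int, value);
}

/* Padding bytes inserted between c and value. */
size_t padding_before_int(void)
{
    size_t offset = int_offset();
    assert(offset >= sizeof(char));     /* value cannot overlap c */
    return offset - sizeof(char);
}
```

Padding like this may appear between struct members, but never between array elements: sizeof(char[2]) is always 2.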
 
Richard Heathfield

Ben Pfaff said:

We may be talking past one another.

Always possible. But we seem to be converging now.

I agree that the result of
subtracting pointers must be 1 in this case. But the numbers
that represent the pointers involved in the subtraction could
differ by 2, or by 4, or by 623.

Pathologically, yes, I suppose you're correct. At least, my instinct says
it's complete nonsense, but I don't see any conforming way to establish a
mapping between the object representation of a pointer and the actual
memory address that it represents. And so my clc experience contradicts my
instincts, and tells me that it's time to fold.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.
 
Keith Thompson

Ben Pfaff said:
We may be talking past one another. I agree that the result of
subtracting pointers must be 1 in this case. But the numbers
that represent the pointers involved in the subtraction could
differ by 2, or by 4, or by 623.
[...]

What "numbers that represent the pointers" are you referring to?
Pointers aren't necessarily represented as numbers, and the results of
conversions between pointer and integer types are
implementation-defined.

But see my discussion of "alignment" elsethread.
 
Ben Pfaff

Keith Thompson said:
Ben Pfaff said:
We may be talking past one another. I agree that the result of
subtracting pointers must be 1 in this case. But the numbers
that represent the pointers involved in the subtraction could
differ by 2, or by 4, or by 623.
[...]

What "numbers that represent the pointers" are you referring to?
Pointers aren't necessarily represented as numbers, and the results of
conversions between pointer and integer types are
implementation-defined.

Every byte of memory represents a number. Because a pointer is
made out of bytes, it can also be said to be represented by a
number, formed by concatenating bits. For many implementations,
this number is meaningful.
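Reading a pointer as a sequence of bytes can be sketched as follows. This is an illustration, not code from the thread; the helper name pointer_bytes is hypothetical. Copying an object's representation into an array of unsigned char is well defined, but what the resulting bytes mean is up to the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of a char
 * pointer into out[], one unsigned char per byte, and return the
 * number of bytes copied. Byte order and any offset or tag bits are
 * implementation-specific. */
size_t pointer_bytes(const char *p, unsigned char *out, size_t out_size)
{
    assert(out != NULL && out_size >= sizeof p);
    memcpy(out, &p, sizeof p);
    return sizeof p;
}
```

Concatenating the bytes in out[] gives the kind of number described above; comparing the results for &s[0] and &s[1] shows how a particular implementation encodes adjacent char addresses.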
 
