Malloc and sizeof


gamehack

Hi all,

As I was reading, sizeof returns the size in units of char. So if
sizeof(int) == 4, it means int is 4 times as big as char =>
sizeof(char) == 1 (always). But malloc() is defined (from what I
googled) as allocating bytes. So malloc(1) will allocate one byte. But
on some systems a char can be 2/3/4 bytes. So really doing something
like:
char* p = malloc(1); for storing one character would not work. Or does
malloc actually allocate in units of char? And because char is 1 byte
on most systems, people just assume that malloc allocates bytes? So what
do I do if I want a pointer to a byte in a portable way? Is it
possible?

Thanks
 

tedu

gamehack said:
As I was reading, sizeof returns the size in units of char. So if
sizeof(int) == 4, it means int is 4 times as big as char =>
sizeof(char) == 1 (always). But malloc() is defined (from what I
googled) as allocating bytes. So malloc(1) will allocate one byte. But
on some systems a char can be 2/3/4 bytes.

a char is always 1 byte.
 

Christopher Hulbert

gamehack said:
Hi all,

As I was reading, sizeof returns the size in units of char.

6.5.3.4 paragraph 2. The sizeof operator yields the size (in bytes) of its
operand...

Paragraph 3. When applied to an operand that has type char, unsigned char, or
signed char, (...) the result is 1.

3.6 - A char, whether signed or unsigned, occupies exactly 1 byte.

It does not say how many bits make up a byte (but it must be at least 8 bits). A
byte could be 8, 9, 12, etc. bits.
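
For illustration only (this little program is mine, not from the standard; the
printed numbers vary by implementation), here is what those rules look like in
practice:

#include <stdio.h>

int main(void)
{
    /* sizeof yields a size_t, counted in bytes, i.e. in chars */
    printf("sizeof(char)   = %zu\n", sizeof(char));    /* always 1 */
    printf("sizeof(int)    = %zu\n", sizeof(int));     /* e.g. 4 */
    printf("sizeof(double) = %zu\n", sizeof(double));  /* e.g. 8 */
    return 0;
}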



Keith Thompson

gamehack said:
As I was reading, sizeof returns the size in units of char. So if
sizeof(int) == 4, it means int is 4 times as big as char =>
sizeof(char) == 1 (always). But malloc() is defined (from what I
googled) as allocating bytes. So malloc(1) will allocate one byte. But
on some systems a char can be 2/3/4 bytes.

No, a char is always exactly 1 byte; that's how the C language defines
the word "byte". The C standard's usage of the term may not match the
meaning in other contexts; in particular, a C byte is guaranteed to be
at least 8 bits, but can be larger.

The number of bits in a byte is CHAR_BIT, a macro defined in
<limits.h>.
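
For example (just a sketch of mine; the #error check is only an example policy
to show the macro being used, not something the standard requires):

#include <limits.h>
#include <stdio.h>

/* Example policy: refuse to build unless bytes are exactly 8 bits. */
#if CHAR_BIT != 8
#error "this example assumes CHAR_BIT == 8"
#endif

int main(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("an int is %zu bytes = %zu bits\n",
           sizeof(int), sizeof(int) * CHAR_BIT);
    return 0;
}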
 

Emmanuel Delahaye

gamehack wrote:
As I was reading, sizeof returns the size in units of char. So if
sizeof(int) == 4, it means int is 4 times as big as char =>
sizeof(char) == 1 (always).

Correct.

But malloc() is defined (from what I
googled) as allocating bytes. So malloc(1) will allocate one byte. But
on some systems a char can be 2/3/4 bytes. So really doing something
like:

Nope. In C, a char is always exactly one byte. What may change is the
number of bits in a char (CHAR_BIT), which is defined in <limits.h> for
your implementation. Don't confuse "byte" with "octet"; they are two
different things.

char* p = malloc(1); for storing one character would not work.

Yes, it will always work (unless there is no available memory and NULL
is returned).

Or does
malloc actually allocate in units of char?

It allocates a number of units. The unit is the char (a.k.a. byte).

And because char is 1 byte
on most systems, people just assume that malloc allocates bytes? So what
do I do if I want a pointer to a byte in a portable way? Is it
possible?

Yes, it is.
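
For instance, a minimal sketch (one way among several): unsigned char is the
conventional type for touching raw bytes, and malloc(1), when it succeeds, is
guaranteed to provide room for exactly one of them.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* one byte == one char, however many bits that char happens to have */
    unsigned char *p = malloc(1);
    if (p == NULL)
        return EXIT_FAILURE;   /* allocation failed */

    *p = 42;                   /* store something in that byte */
    printf("the byte holds %u\n", (unsigned)*p);
    free(p);
    return 0;
}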
 

Flash Gordon

tedu said:
a char is always 1 byte.

Correct in C terms but not as helpful as it could be.

In C by definition a byte is an "addressable unit of data storage large
enough to hold any member of the basic character set of the execution
environment". Note that this does *not* define how many bits are in a
byte. I've personally worked on systems where CHAR_BIT (the number of
bits in a byte) was 16 and there have been and continue to be systems
with other values.

In C the size of a char, signed char, and unsigned char are all defined
as 1 byte, where a byte is defined as above.

So if malloc(1) succeeds, it is guaranteed to provide enough space for 1
char.

However, C also defines a wide character type, and the standard also
refers to multibyte characters (at least, it does in C99). There is no
guarantee that a wide character or multibyte character will fit in a
single byte for obvious reasons.
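
A small sketch of that difference (mine, not from the thread; the UTF-8 locale
name is an assumption and may not exist on a given implementation):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* assumed locale so that mbtowc decodes UTF-8 multibyte sequences */
    if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
        return EXIT_FAILURE;

    const char *mb = "\xC3\xA9";          /* e-acute: two bytes in UTF-8 */
    wchar_t wc;
    int n = mbtowc(&wc, mb, MB_CUR_MAX);  /* bytes consumed, or -1 on error */

    printf("that multibyte character took %d bytes\n", n);
    printf("a wchar_t is %zu bytes\n", sizeof wc);
    return 0;
}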
 

Jalapeno

gamehack said:
Not really. Any source?
"On any C language implementation compliant with any C standard ever
written, sizeof(char) is exactly one, whether char is eight bits, 16,
60, or 64. If you use the GNU autoconf test for sizeof(char), you might
as well tattoo "I don't know what sizeof means" on your forehead."
That's from IBM developerWorks.
http://www-128.ibm.com/developerworks/power/library/pa-ctypes1/?ca=dgr-lnxwCTypesP1

You can start with the source you quoted. Further down in the article
Peter writes:

The built-in sizeof operator yields this size. This is often referred
to, quite correctly, as the "size in bytes," but it is important to
know that, in C, a /byte/ is "an object of type unsigned char." On a
system where unsigned char is larger than eight bits, sizeof(char) is
still always one, and everything else is counted in terms of the actual
number of bits in a char, not in terms of octets.
 

Keith Thompson

Jalapeno said:
You can start with the source you quoted. Further down in the article
Peter writes:

The built-in sizeof operator yields this size. This is often referred
to, quite correctly, as the "size in bytes," but it is important to
know that, in C, a /byte/ is "an object of type unsigned char." On a
system where unsigned char is larger than eight bits, sizeof(char) is
still always one, and everything else is counted in terms of the actual
number of bits in a char, not in terms of octets.

The standard's actual definition of "byte" is in C99 3.6:

byte

addressable unit of data storage large enough to hold any member
of the basic character set of the execution environment

NOTE 1 It is possible to express the address of each individual
byte of an object uniquely.

NOTE 2 A byte is composed of a contiguous sequence of bits, the
number of which is implementation-defined. The least significant
bit is called the _low-order bit_; the most significant bit is
called the _high-order bit_.

(The phrases "low-order bit" and "high-order bit" are in italics,
which means that these are the definitions of those terms -- but notes
are non-normative. It's a bit odd in a comp.std.c sense, but clear
enough in a comp.lang.c sense.)
 

Jalapeno

The standard's actual definition of "byte" is in C99 3.6:

Absolutely. My point, I guess, was that the OP was quoting an article
that would have cleared up his apparent misconception about a C byte
versus an octet, had he read a few more sentences.
 

Mark McIntyre

On 17 Jan 2006 11:26:57 -0800, in comp.lang.c, "gamehack" wrote:

Please don't top-post. It makes it extremely hard to read posts.
Not really. Any source?

Yes, really. See the C standard.
"On any C language implementation compliant with any C standard ever
written, sizeof(char) is exactly one, whether char is eight bits, 16,
60, or 64. If you use the GNU autoconf test for sizeof(char), you might
as well tattoo "I don't know what sizeof means" on your forehead."

Indeed.
Mark McIntyre
 

Martin Ambuhl

gamehack said:
Hi all,

As I was reading, sizeof returns the size in units of char. So if
sizeof(int) == 4, it means int is 4 times as big as char =>
sizeof(char) == 1 (always). But malloc() is defined (from what I
googled) as allocating bytes. So malloc(1) will allocate one byte. But
on some systems a char can be 2/3/4 bytes.

In C, the word 'byte' refers to the space occupied by a 'char' and must
be able to represent the values which would be represented by 8 bits on
a two's-complement binary machine. There is no such thing as a char of 2
or 3 or 4 bytes: if a char is 16 bits, it occupies 1 byte; if a char is
24 bits, it occupies 1 byte; if a char is 32 bits, it occupies 1 byte.

Remember that the word 'byte' refers to a collection of contiguous
bits. On the GE645, for which Multics was written, a 'byte' was 9 bits.
In the PDP-6 and PDP-10 series, there were byte pointers at the
assembly-language level which could handle any byte size from 1 to 36 bits.
What you are thinking of as 'bytes' are actually 'octets'. The world is
not defined by your platform.
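
If what you actually need is a count of octets (say, for a wire format), you
have to derive it from CHAR_BIT yourself. A rough sketch; the OCTETS_FOR macro
and its round-up formula are mine, not anything from the standard:

#include <limits.h>
#include <stdio.h>

/* octets are not a C concept, so compute them: round up to whole octets */
#define OCTETS_FOR(type) ((sizeof(type) * CHAR_BIT + 7) / 8)

int main(void)
{
    printf("octets needed for an int:  %zu\n", OCTETS_FOR(int));
    printf("octets needed for a long:  %zu\n", OCTETS_FOR(long));
    return 0;
}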
 

Malcolm

Keith Thompson said:
No, a char is always exactly 1 byte; that's how the C language defines
the word "byte". The C standard's usage of the term may not match the
meaning in other contexts; in particular, a C byte is guaranteed to be
at least 8 bits, but can be larger.

The number of bits in a byte is CHAR_BIT, a macro defined in
<limits.h>.
"byte" is a word that has become a bit vaguer in meaning. It used to be the
smallest addressible unit of memory, and in a C context a "char" is always
the smallest addressible unit of memory.
However the term has come to mean an ASCII character, or 8 bits. C chars may
not be 8 bits, even if the underlying hardware supports 8-bit addressing.
Nowadays the concept of the "smallest addressible unit" isn't as clear,
anyway, because modern systems tend to supports lots of operations on a
hierarchy of memory, with very different time penalties.
 

Jordan Abel

Correct in C terms but not as helpful as it could be.

In C by definition a byte is an "addressable unit of data storage large
enough to hold any member of the basic character set of the execution
environment".

By that definition, there could be 7 bits in a byte, and 14 in a char,
couldn't there?
 

Keith Thompson

Jordan Abel said:
By that definition, there could be 7 bits in a byte, and 14 in a char,
couldn't there?

Such a thing might or might not violate the definition of "byte" in
C99 3.6 (depending on how you read it), but it would violate the
requirement in 6.5.3.4:

When applied to an operand that has type char, unsigned char, or
signed char, (or a qualified version thereof) the result is 1.
 

Flash Gordon

Jordan said:
By that definition, there could be 7 bits in a byte, and 14 in a char,
couldn't there?

The first half of that comment is arguably reasonable criticism, since I
failed to quote the requirement (from a completely different part of the
standard) that it be at least 8 bits. However, the second part is not
reasonable, because I also stated, but you have snipped without noting
it, "In C the size of a char, signed char, and unsigned char are all
defined as 1 byte, where a byte is defined as above." Now, if you can
tell me how there can be more bits in a char than in a byte when I've
stated that a char is the size of 1 byte, I'll accept you are being
reasonable. Otherwise it looks like you are deliberately and without
comment cutting things to make it look like people are making errors
that they are not making.
 

Logan Shaw

Emmanuel said:
gamehack wrote:
So what do I do if I want a pointer to a byte in a portable way? Is it
possible?

Yes, it is.

Assuming that "byte" means the C definition of a byte. If byte means
octet (which may be what the original poster meant), then the answer
is that it's not always possible, right?

- Logan
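
To make that concrete (a purely illustrative sketch, not from the thread):
when CHAR_BIT > 8 no pointer can single out one octet inside a byte, but the
octets of a value can still be reached with shifts and masks.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int value = 0x12345678u;
    size_t octets = (sizeof value * CHAR_BIT + 7) / 8;  /* round up */

    /* no pointer reaches an individual octet, but masking does */
    for (size_t i = 0; i < octets; i++) {
        unsigned int octet = (value >> (8 * i)) & 0xFFu;
        printf("octet %zu: 0x%02X\n", i, octet);
    }
    return 0;
}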
 

Jordan Abel

The first half of that comment is arguably reasonable criticism, since I
failed to quote the requirement (from a completely different part of the
standard) that it be at least 8 bits. However, the second part is not
reasonable, because I also stated, but you have snipped without noting
it, "In C the size of a char, signed char, and unsigned char are all
defined as 1 byte, where a byte is defined as above." Now, if you can
tell me how there can be more bits in a char than in a byte when I've
stated that a char is the size of 1 byte, I'll accept you are being
reasonable. Otherwise it looks like you are deliberately and without
comment cutting things to make it look like people are making errors
that they are not making.

I didn't see it (I snipped without looking thoroughly); I didn't accuse
anyone of any errors, and as far as I can tell, my comment cannot even
be divided into a "first half" and a "second part". Did you, perhaps,
imagine the second part?
 

Keith Thompson

Logan Shaw said:
Assuming that "byte" means the C definition of a byte.

Yes, of course. This is comp.lang.c, after all.
If byte means
octet (which may be what the original poster meant), then the answer
is that it's not always possible, right?

Byte doesn't mean octet around here.
 

Logan Shaw

Keith said:
Yes, of course. This is comp.lang.c, after all.
Byte doesn't mean octet around here.

I find communication is usually more effective if you use the
meanings of words that the speaker understands and intends.
That is not to say that definitions, when they are wrong,
shouldn't be corrected. They should be. But they should be
understood for what they are as well, IMHO.

- Logan
 
