Magic structs

  • Thread starter Name and address withheld

Name and address withheld

I am trying to understand how ustr (http://www.and.org/ustr/design)
works. It's a string-handling library for C, and it stores strings in the
following struct, called a "magic struct":

struct Ustr
{
    unsigned char data[1];
    /* 0b_wxzy_nnoo =>
     *
     *   w  = allocated
     *   x  = has size
     *   y  = round up allocations (off == exact allocations)
     *   z  = memory error
     *   nn = refn
     *   oo = lenn
     *
     * 0b_0000_0000 = "", const, no alloc fail, no mem err, refn=0, lenn=0 */
};

It seems to me that a struct Ustr will only occupy 1 byte of memory, so
how can it contain even a simple pointer to char? What's happening
here?
 

santosh

Name said:
I am trying to understand how ustr (http://www.and.org/ustr/design)
works. It's a string-handling library for C, and it stores strings in the
following struct, called a "magic struct"

struct Ustr
{
unsigned char data[1];
/* 0b_wxzy_nnoo =>
*
* w = allocated
* x = has size
* y = round up allocations (off == exact
* allocations)
* z = memory error
* nn = refn
* oo = lenn
*
* 0b_0000_0000 = "", const, no alloc fail, no mem
* err, refn=0, lenn=0 */
};

It seems to me that a struct Ustr will only occupy 1-byte of memory, so
how can it contain even a simple pointer to char??? What's happening
here?

Search the Google Groups archive for this group on the topic 'struct hack'
 

Richard Heathfield

Name and address withheld said:
I am trying to understand how ustr (http://www.and.org/ustr/design)
works. It's a string-handling library for C, and it stores strings in
the following struct, called a "magic struct"

struct Ustr
{
unsigned char data[1];
};

It seems to me that a struct Ustr will only occupy 1-byte of memory,
so how can it contain even a simple pointer to char???

It can't, unless pointers are only one byte wide (which is possible but
rare).
What's happening here?

Broken code is happening here.
 

Malcolm McLean

Name and address withheld said:
I am trying to understand how ustr (http://www.and.org/ustr/design)
works. It's a string-handling library for C, and it stores strings in the
following struct, called a "magic struct"

struct Ustr
{
unsigned char data[1];
/* 0b_wxzy_nnoo =>
*
* w = allocated
* x = has size
* y = round up allocations (off == exact
* allocations)
* z = memory error
* nn = refn
* oo = lenn
*
* 0b_0000_0000 = "", const, no alloc fail, no mem
* err, refn=0, lenn=0 */
};

It seems to me that a struct Ustr will only occupy 1-byte of memory, so
how can it contain even a simple pointer to char??? What's happening
here?
struct Ustr x;

x.data is effectively a pointer to a single char. The struct doesn't need to
contain the address; the compiler can work out the address of the data array
from the address of the struct. Since there is only one member, it must in
fact be the same as the address of the struct.
We can now play silly games with the implementation to get more than one
character into the string. For instance, if the struct sits at the start of an
uninitialised block of memory, the last member, if it is an array, can be
extended. This is unwarranted chumminess with the implementation and
isn't to be recommended in your own code, but for something as fundamental as
a string library there is maybe a case for it. Just maybe.
 

Richard Heathfield

Malcolm McLean said:
struct Ustr
{
unsigned char data[1];
};

It seems to me that a struct Ustr will only occupy 1-byte of memory,
so how can it contain even a simple pointer to char??? What's
happening here?
struct Ustr x;

x.data is effectively a pointer to a single char.

No, it isn't. It's an array of a single char.
 

Malcolm McLean

Richard Heathfield said:
Malcolm McLean said:

No, it isn't. It's an array of a single char.
Effectively. That means it isn't, but it can be thought of as if it is for some
purposes.
 

Keith Thompson

Richard Heathfield said:
Malcolm McLean said:
struct Ustr
{
unsigned char data[1];
};

It seems to me that a struct Ustr will only occupy 1-byte of memory,
so how can it contain even a simple pointer to char??? What's
happening here?
struct Ustr x;

x.data is effectively a pointer to a single char.

No, it isn't. It's an array of a single char.

Well, yes and no.

x.data as an object is a member of the structure x, which is of type
struct Ustr. x.data is of type unsigned char[1], which is of course a
one-byte array.

But x.data as an expression, unless it appears as the operand of a
unary 'sizeof' or '&' operator, is implicitly converted to a value of
type 'unsigned char*', pointing to the first element of the array. If
you're willing to be unwarrantedly chummy with the implementation, you
can use this pointer to access memory beyond the bounds of the array
itself, taking advantage of the fact that most implementations don't
do the bounds checking that they're permitted to do (and that a
compiler writer would almost certainly be unwilling to break the
"struct hack").

Malcolm, IMHO, should have been much clearer on this point.

The comp.lang.c FAQ <http://c-faq.com/> discusses the struct hack, of
which this is a particularly odd example, in question 2.6. But I've
never seen a usage of the struct hack where the array is the only
declared member of the structure.

Code using 'struct Ustr' is likely to work, but that's certainly not
the way I would have implemented it. Instead, I'd probably just use
'unsigned char*' or perhaps 'void*'.
 

Keith Thompson

Malcolm McLean said:
Effectively. That means it isn't, but it can be thought of as if it is for
some purposes.

Huh? I'm afraid I have no idea what that's supposed to mean.

See my other followup for what it *should* mean.
 

Malcolm McLean

Keith Thompson said:
Huh? I'm afraid I have no idea what that's supposed to mean.
If we say "this free kick is from a position that makes it effectively a
corner", then we are saying it is not a corner (a kick awarded when the other
side puts the ball out of play behind their own line); it is a free kick (a
kick usually awarded for foul play). However, if the foul occurred in the
extreme corner of the pitch, then the kick will be from almost the same
spot, and so in terms of tactics it can be thought of as a corner. The words
"effectively a corner" mean "not a corner".
See my other followup for what it *should* mean.
I should maybe have been a bit clearer. Your post is better.
 

James Antill

Well as the author of Ustr, I should probably respond...

Richard Heathfield said:
Malcolm McLean said:
struct Ustr
{
unsigned char data[1];
};

It seems to me that a struct Ustr will only occupy 1-byte of memory,
so how can it contain even a simple pointer to char??? What's
happening here?

struct Ustr x;

x.data is effectively a pointer to a single char.

No, it isn't. It's an array of a single char.

Well, yes and no.

x.data as an object is a member of the structure x, which is of type
struct Ustr. x.data is of type unsigned char[1], which is of course a
one-byte array.

But x.data as an expression, unless it appears as the operand of a unary
'sizeof' or '&' operator, is implicitly converted to a value of type
'unsigned char*', pointing to the first element of the array. If you're
willing to be unwarrantedly chummy with the implementation, you can use
this pointer to access memory beyond the bounds of the array itself,
taking advantage of the fact that most implementations don't do the
bounds checking that they're permitted to do (and that a compiler writer
would almost certainly be unwilling to break the "struct hack").

Right, the struct hack is _very_ well known IMNSHO ... I guess you could
argue that limiting the code to C99 and using:

struct Ustr
{
unsigned char info;
unsigned char data[];
};

...is "better" from a stds. POV, although it seemed more natural to me
to represent it as one unit.

Personally I'd say that "unwarrantedly chummy with the implementation"
is pretty harsh considering how likely the assumption is to fail.
Malcolm, IMHO, should have been much clearer on this point.

The comp.lang.c FAQ <http://c-faq.com/> discusses the struct hack, of
which this is a particularly odd example, in question 2.6. But I've
never seen a usage of the struct hack where the array is the only
declared member of the structure.

Right, most people, when they want that, just go for "char *" as the
representation ... the main reason I didn't is that I wanted to make
the compiler complain if you did:

Ustr *s1 = "abcd";

...if C had a "decent" form of typedef, I'd have just used that.
Code using 'struct Ustr' is likely to work, but that's certainly not the
way I would have implemented it. Instead, I'd probably just use
'unsigned char*' or perhaps 'void*'.

Sure, you might want to _implement_ it that way ... but would you want
to _use_ a string API that took (unsigned char *) types? I certainly
wouldn't.

Atm. for the users of the library it looks like a "normal" string API
except that it's much more efficient than normal for small strings, and
you can easily create auto/const strings etc.
 

pete

santosh said:
I am trying to understand how ustr (http://www.and.org/ustr/design)
works. It's a string-handling library for C,
and it stores strings in the
following struct, called a "magic struct"

struct Ustr
{
unsigned char data[1];
/* 0b_wxzy_nnoo =>
*
* w = allocated
* x = has size
* y = round up allocations (off == exact
* allocations)
* z = memory error
* nn = refn
* oo = lenn
*
* 0b_0000_0000 = "", const, no alloc fail, no mem
* err, refn=0, lenn=0 */
};

It seems to me that a struct Ustr will
only occupy 1-byte of memory, so
how can it contain even a simple pointer to char??? What's happening
here?

Search the Google Groups archive
for this group on the topic 'struct hack'

I've never been able to understand the struct hack.
It looks like when the array expression is converted to a pointer,
that pointer will hold the address of
the lowest addressable byte of the structure.

So why not just use the address operator on the structure instead?
 

Chris Thomasson

pete said:
santosh wrote: [...]
I've never been able to understand the struct hack.

typedef struct foo_s {
char weird[];
} foo_t;


foo_t* const _this = malloc(foo_t * 10);

_this, if not NULL, is foo_t::weird[10]?


I am not sure about this crap either? Please correct me!

:^0
 

Keith Thompson

James Antill said:
Well as the author of Ustr, I should probably respond...
But x.data as an expression, unless it appears as the operand of a unary
'sizeof' or '&' operator, is implicitly converted to a value of type
'unsigned char*', pointing to the first element of the array. If you're
willing to be unwarrantedly chummy with the implementation, you can use
this pointer to access memory beyond the bounds of the array itself,
taking advantage of the fact that most implementations don't do the
bounds checking that they're permitted to do (and that a compiler writer
would almost certainly be unwilling to break the "struct hack").

Right, the struct hack is _very_ well known IMNSHO ... I guess you could
argue that limiting the code to C99 and using:

struct Ustr
{
unsigned char info;
unsigned char data[];
};

...is "better" from a stds. POV, although it seemed more natural to me
to represent it as one unit.

I certainly wouldn't argue for using C99-specific features unless you
want to limit yourself to the few compilers that implement those
features.
Personally I'd say that "unwarrantedly chummy with the implementation"
is pretty harsh considering how likely the assumption is to fail.

Quoting question 2.6 of the comp.lang.c FAQ:

Despite its popularity, the technique is also somewhat notorious:
Dennis Ritchie has called it ``unwarranted chumminess with the C
implementation,'' and an official interpretation has deemed that
it is not strictly conforming with the C Standard, although it
does seem to work under all known implementations. (Compilers
which check array bounds carefully might issue warnings.)

So you can take it up with dmr. 8-)}

There's little doubt that attempting to access data beyond the
declared bounds of an array invokes undefined behavior. There's also
little doubt that most existing compilers will happily let you get
away with it -- thus the popularity of the struct hack.
Right, most people, when they want that, just go for "char *" as the
representation ... the main reason I didn't is that I wanted to make
the compiler complain if you did:

Ustr *s1 = "abcd";

...if C had a "decent" form of typedef, I'd have just used that.

And using void* would have the same problem, due to implicit
conversions.

[snip]
 

James Antill

santosh said:
I am trying to understand how ustr (http://www.and.org/ustr/design)
works. It's a string-handling library for C, and it stores strings in
the
following struct, called a "magic struct"

struct Ustr
{
    unsigned char data[1];
    /* 0b_wxzy_nnoo =>
     *
     *   w  = allocated
     *   x  = has size
     *   y  = round up allocations (off == exact allocations)
     *   z  = memory error
     *   nn = refn
     *   oo = lenn
     *
     * 0b_0000_0000 = "", const, no alloc fail, no mem err, refn=0, lenn=0 */
};

It seems to me that a struct Ustr will only occupy 1-byte of memory,
so
how can it contain even a simple pointer to char??? What's happening
here?

Search the Google Groups archive
for this group on the topic 'struct hack'

I've never been able to understand the struct hack. It looks like when
the array expression is converted to a pointer, that pointer will hold
the address of the lowest addressable byte of the structure.

So why not just use the address operator on the structure instead?

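struct Foo1
{
    int foo1;
    int foo2;
    Bar blah[0]; /* zero-length array: a GNU extension; C99 spells it blah[] */
};

struct Foo2
{
    int foo1;
    int foo2;
};

/* option 1: the trailing array gives the extra storage a name and a type */
struct Foo1 *foo1 = malloc(sizeof(struct Foo1) + (sizeof(Bar) * 10));

foo1->blah[9];

/* option 2: the same access done by hand from the struct's address */
struct Foo2 *foo2 = malloc(sizeof(struct Foo2) + (sizeof(Bar) * 10));

*((Bar *)((char *)foo2 + sizeof(struct Foo2)) + 9);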
 

Chris Thomasson

foo_t* const _this = malloc(foo_t * 10);

foo_t* const _this = malloc(foo_t * (sizeof(char) * 10));



?
 

Chris Thomasson

Chris Thomasson said:
foo_t* const _this = malloc(foo_t * (sizeof(char) * 10));

CRAP!

foo_t* const _this = malloc(foo_t + (sizeof(char) * 10));

? shitfire!
 

Chris Thomasson

Chris Thomasson said:
CRAP!

foo_t* const _this = malloc(foo_t + (sizeof(char) * 10));

:^0 holy crap.

foo_t* const _this = malloc(sizeof(*_this) + (sizeof(char) * 10));

WTF is going on with my crappy brain!
 
