Semi-circular definitions (plus circular references)

K

Kiuhnm

Is there an elegant way to deal with semi-circular definitions?

Semi-circular definition:
A { B };
B { *A };
Circular reference:
A { *B };
B { *A };

The problems arise when there are more semi-circular definitions and
circular references.

Consider the following sample code:

--- inc1.h ---
#ifndef INC1_INCLUDE
#define INC1_INCLUDE

typedef struct _struc2
{
struct _struc1 *s1;
} struc2;


typedef struct _struc3
{
struct _struc1 *s1;
} struc3;

typedef struct _struc1
{
struct _struc2 s2;
struct _struc3 s3;
} struc1;

typedef struct _struc4
{
struct _struc1 s1;
} struc4;

#endif

--- test.cpp ---

#include "inc1.h"

int main()
{
struc1 s1;
s1.s2;

return 1;
}

------

The previous code is obviously correct.
Now I want to split the include file in two separated files.
The original include file contains 4 definitions:
A
B
C
D
The problem arises when the splitting is such that the original include
file cannot be obtained by reattaching the 2 include files.
For Example, inc1.h could contain A and D, and inc2.h B and C.

--- inc1.h ---
#ifndef INC1_INCLUDE
#define INC1_INCLUDE

#include "inc2.h"

typedef struct _struc2
{
struct _struc1 *s1;
} struc2;

typedef struct _struc4
{
struct _struc1 s1;
} struc4;

#endif

--- inc2.h ---
#ifndef INC2_INCLUDE
#define INC2_INCLUDE

#include "inc1.h"

typedef struct _struc3
{
struct _struc1 *s1;
} struc3;

typedef struct _struc1
{
struct _struc2 s2;
struct _struc3 s3;
} struc1;

#endif

-------

Someone affirms that this situation is index of bad organization, but I
disagree, then I do not seek an obvious workaround, but a solution to
the *original* problem.
Yes, I could use a pointer instead of the struct itself, but I do not
want to use an unneeded indirection.
I could even use a dummy structure but that solution is very inelegant.

Is there a particular directive which instructs the compiler to analyze
the entire file before giving up?

Is there a better solution than using a bunch of #if..[...]..#endif? :)

Thanks, Kiuhnm
 
J

Jonathan Bartlett

Kiuhnm said:
Is there an elegant way to deal with semi-circular definitions?

You just need to restructure it a bit. Here's my solution which seems
to work:

inc1.h:
---------

#ifndef INC1_INCLUDE
#define INC1_INCLUDE




typedef struct _struc2
{
struct _struc1 *s1;
} struc2;


#include "inc2.h"


typedef struct _struc4
{
struct _struc1 s1;
} struc4;


#endif

----------------


inc2.h:
----------------#include "inc1.h"


#ifndef INC2_INCLUDE
#define INC2_INCLUDE




typedef struct _struc3
{
struct _struc1 *s1;
} struc3;




typedef struct _struc1
{
struct _struc2 s2;
struct _struc3 s3;
} struc1;






#endif
 
K

Kiuhnm

Jonathan said:
You just need to restructure it a bit. Here's my solution which seems
to work:
[cut]

You recomposed the original file!
Is it always possible?
No, unfortunately.

--- inc1.h ---
s1 {s3}
s2 {s4}
--- inc2.h ---
s3 {s2}
s4 {}
--------------

Let "a < b" mean "a must appear before b".

s3 < s1,
s4 < s2
s2 < s3

=>

s4 < s2 < s3 < s1

then that's the only correct arrangement.

By partially reordering the structures in each include file, we obtain:
--- inc1.h ---
s2 {s4}
s1 {s3}
--- inc2.h ---
s4 {}
s3 {s2}
--------------

s2 s1
s4 s3

How can we obtain that arrangement without splitting any file?
Why are the compilers so stupid?
The compiler could simply perform a first pass and compile a table of
the sizes of all the structures and then use that table in the
successive passes.

Perhaps the documentation of the preprocessor could help...

Thank you anyway!

Kiuhnm
 
D

Dave Rahardja

Kiuhnm said:
You recomposed the original file!
Is it always possible?
No, unfortunately.

Yes, it is *always* possible to rearrange the way data structures are
declared so that entities are declared before they are needed. It may
entail declaring each data structure in its own header file, but it is
always possible.

Consider what happens when a truly circular dependency is declared:

struct A { B b };
struct B ( A a };

This implies that B includes a copy of A, which includes a copy of B,
which includes a copy of A, ad infinitum. This struct has infinite size
and is not possible to instantiate.

In other words, in order for a data structure to have finite size, its
composition graph must have no loops. Thus, all finite data structures
can be described by a sequence of declarations with no "forward dependency".

Sometimes it is not possible to properly declare data structures if
multiple data structures are declared within one or more header files.
Your problem in this case in not with the language definition or the
compiler, but with the way that your modules are organized.

-dr
 
K

Kiuhnm

Dave said:
Yes, it is *always* possible to rearrange the way data structures are
declared so that entities are declared before they are needed. It may
entail declaring each data structure in its own header file, but it is
always possible.

I've just *proved* that (with the restrictions I imposed) it is impossible.
Sometimes it is not possible to properly declare data structures if
multiple data structures are declared within one or more header files.
Your problem in this case in not with the language definition or the
compiler, but with the way that your modules are organized.

No, the problem resides in the c/c++ compilers/language.
If something could have been done better, then it should have been done
better.
A smart compiler would scan the entire preprocessed file and build a
graph with the dependencies in order to quickly determine the structure
sizes.
I can understand that you consider the compiler something untouchable,
but, IMHO, a compiler is a tool, not a religion.

Do you know C#? In this language, the forward definitions are handled
seamlessly.
I think I'll pass to C#... or maybe I could write a little preprocessor
for the C++ compiler (portability is not an issue to me).

Kiuhnm
 
J

Jonathan Mcdougall

Kiuhnm said:
I've just *proved* that (with the restrictions I imposed) it is impossible.

If you cannot change the headers, C++ cannot do
anything for you.
No, the problem resides in the c/c++ compilers/language.

It is because of the language, but it is not a
problem, it is called "type checking". To use a
name, you need to declare it and to instanciate a
class, you need its definition.
If something could have been done better, then it should have been done
better.
A smart compiler would scan the entire preprocessed file and build a
graph with the dependencies in order to quickly determine the structure
sizes.

If you think compilers are dumb, write one.
I can understand that you consider the compiler something untouchable,
but, IMHO, a compiler is a tool, not a religion.
Do you know C#?

No one is forcing you to use C++. If a language
cannot do what you want, change it! Languages are
tools, not religions.
In this language, the forward definitions are handled
seamlessly.

If I recall correctly, C# only deals with
references, not with stack-based objects. You
never need the size of the classes in this case.
It's like replacing automatic objects with
pointers in your example, which solves your
problem. Before falling in love with this, read
about the problems (and advantages) associated
with this behavior.


Jonathan
 
K

Kiuhnm

Jonathan said:
If you cannot change the headers, C++ cannot do anything for you.

The headers are thematically divided, then I don't want to move a
definition from a file to another.
It is because of the language, but it is not a problem, it is called
"type checking". To use a name, you need to declare it and to
instanciate a class, you need its definition.

I know, but you omitted a detail: C++ uses a "chronological type
checking". The mathematical language is ABSOLUTELY UNAMBIGUOUS and
doesn't have such a limitation.
Besides, why should the programmer tolerate such a restriction when a
simple algorithm could completely resolve the problem?
If you think compilers are dumb, write one.

Very humorous, but I prefer to write a preprocessor.
If I recall correctly, C# only deals with references, not with
stack-based objects. You never need the size of the classes in this
case. It's like replacing automatic objects with pointers in your
example, which solves your problem. Before falling in love with this,
read about the problems (and advantages) associated with this behavior.

Then C# is not a language for me (I'm developing a real-time OS).

Kiuhnm
 
J

Jonathan Mcdougall

Kiuhnm said:
The headers are thematically divided, then I don't want to move a
definition from a file to another.

There are situations where you must sacrifice a bit of design in order to
make the program work. It seems to be your case, but you only can tell.
I know, but you omitted a detail: C++ uses a "chronological type
checking". The mathematical language is ABSOLUTELY UNAMBIGUOUS and doesn't
have such a limitation.

I don't understand that.
Besides, why should the programmer tolerate such a restriction when a
simple algorithm could completely resolve the problem?

Because that's how the language works.

A a;

class A
{
};

does not work, not because the compiler cannot make it work (postponing the
checking of names until the translation unit is completly processed would be
a solution to this example but has many drawbacks for others, more
complicated constructs), but because the language mandates it.


Jonathan
 
K

Kiuhnm

Jonathan said:
There are situations where you must sacrifice a bit of design in order to
make the program work. It seems to be your case, but you only can tell.

I moved the most referenced structure to a separated file.
I don't understand that.

|u+v| <= |u|+|v| for each u,v in C(T, R).

u and v are used before the definition.

Another example:
for each x > 0, there exists K in N : d(u_k,u_h) < eps, for each k,h in
N, h,k > K.
A a;

class A
{
};

does not work, not because the compiler cannot make it work (postponing the
checking of names until the translation unit is completly processed would be
a solution to this example but has many drawbacks for others, more
complicated constructs), but because the language mandates it.

The languages C and C++ are a bit incoherent: it is possible to define a
pointer to an undefined object, because you will need the object
definition only when you will use the pointer, but it is impossible to
define an object which contains an undefined object, even if the object
definition will not be used before the first instantiation.
You call that a feature? IMHO, that is a flaw.

Kiuhnm
 
J

Jonathan Mcdougall

Kiuhnm said:
The languages C and C++ are a bit incoherent: it is possible to define a
pointer to an undefined object,

Undefined, but declared.
because you will need the object definition only when you will use the
pointer,

Not when you use it, only when you need the definition of the class.

class A;

void f(A *a);

int main()
{
A *a=0;
f(a);
}

This example does not need the definition of A, only its declaration.
but it is impossible to define an object which contains an undefined
object,

Of course, because its size is needed. If you have a pointer or reference,
you don't need the class' definition to define one.
You call that a feature? IMHO, that is a flaw.

Requiring the class's definition for everything would not only be
unnecessary but also a pain. It is the base for a common technique of
reducing compilation dependencies.

I wouldn't say C++ does a good job at separating modules, but I think the
language *is* consistent. When the compiler does not need the class's
definition to operate, you don't have to specify it.


Jonathan
 
D

Dave Rahardja

Kiuhnm said:
|u+v| <= |u|+|v| for each u,v in C(T, R).

u and v are used before the definition.

Another example:
for each x > 0, there exists K in N : d(u_k,u_h) < eps, for each k,h in
N, h,k > K.

There is a need for a balance between mathematical purity,
comprehensibility, economy of compile time, and ease of implementation
that one needs to consider while writing a language. Allowing the "use
before definition" paradigm would necessitate a multiple-pass
compilation (beyond the preprocessor stage), increase compile times, and
would lead to confusing paradoxes like this:

class A { B b; };
class B { A a; };

Which the compiler must detect.

The benefits of allowing such behavior is dubious--certainly nothing
that can't be achieved by the language as it stands today.
The languages C and C++ are a bit incoherent: it is possible to define a
pointer to an undefined object, because you will need the object
definition only when you will use the pointer, but it is impossible to
define an object which contains an undefined object, even if the object
definition will not be used before the first instantiation.
You call that a feature? IMHO, that is a flaw.

In C and C++, you can define a pointer (or a reference) to an undefined
(but declared) entity because the size of a pointer is known. You cannot
use an undefined data structure as a member of another because the
compiler needs to know how much space to allocate for the data storage.

That doesn't make the language "incoherent". The rules of the language
are coherent and sensible once you understand the needed compromise.

The definition-before-use requirement of C++ doesn't give me any
headaches, and I've worked on embedded applications that have upward of
100 compilation units. I'm afraid you'd either have to change the way
you analyze your problem, or look for another language that more closely
matches your expectations.

My advice is not to get too hung up on a peculiarity of the language and
try to work around any limitations that you may find. Sure beats halting
your development schedule.
 
K

Kiuhnm

Jonathan said:
class A;

void f(A *a);

int main()
{
A *a=0;
f(a);
}

This example does not need the definition of A, only its declaration.

I meant when you use the pointer to access the members of the pointed
object. My mistake.
Of course, because its size is needed. If you have a pointer or reference,
you don't need the class' definition to define one.

When is it needed? A definition does not require any allocation. If I
instantiate the object after all the definitions, there should be no
problem.
I wouldn't say C++ does a good job at separating modules, but I think the
language *is* consistent. When the compiler does not need the class's
definition to operate, you don't have to specify it.

But the compiler requires the definitions when these are not
intrinsically required. A definition is a model, not a physical object
and thus does not require space.

Kiuhnm
 
K

Kiuhnm

Dave Rahardja wrote:
[cut]
compilation (beyond the preprocessor stage), increase compile times, and
would lead to confusing paradoxes like this:

class A { B b; };
class B { A a; };

Which the compiler must detect.

There are many methods to minimize the increase in compilation time. The
compiler could use a graph only when needed or in the presence of a hint
(e.g. a special forward declaration).
In C and C++, you can define a pointer (or a reference) to an undefined
(but declared) entity because the size of a pointer is known. You cannot
use an undefined data structure as a member of another because the
compiler needs to know how much space to allocate for the data storage.

This is not completely exact: a definition does not require space.
And if all the definitions appear before the first instantiation...
My advice is not to get too hung up on a peculiarity of the language and
try to work around any limitations that you may find. Sure beats halting
your development schedule.

Don't worry. The major difficulty is devising the algorithms! The last
took me 6 months!!!

Kiuhnm
 
J

Jonathan Mcdougall

Kiuhnm said:
When is it needed? A definition does not require any allocation. If I
instantiate the object after all the definitions, there should be no
problem.

I don't understand that. A class definition

class A
{
};

does not require space in itself, but it is required to instantiate objects
because the compiler needs the size of the class.

class A;

int main()
{
A *a = new A; // how many bytes?

delete a; // ditto
}

Since all pointers are of the same size (excluding member function
pointers), you only need to declare the name you are using.

class A;

A *p;

That's ok, because the compiler knows the size of 'A*' (typically 4 bytes),
but

class A;
A a;

is not not, because the compiler needs the size of A, which depends on its
members.
But the compiler requires the definitions when these are not intrinsically
required. A definition is a model, not a physical object and thus does not
require space.

The compiler needs the space for the object, not for the class.


Jonathan
 
J

Jonathan Mcdougall

Kiuhnm said:
Dave Rahardja wrote:
[cut]
compilation (beyond the preprocessor stage), increase compile times, and
would lead to confusing paradoxes like this:

class A { B b; };
class B { A a; };

Which the compiler must detect.

There are many methods to minimize the increase in compilation time. The
compiler could use a graph only when needed or in the presence of a hint
(e.g. a special forward declaration).

The language forbids that and the compiler would be non compliant.,
This is not completely exact: a definition does not require space.

You understanding of classes and objects seems to be flawed. A class does
not take memory, only objects do.

class A
{
}; // no memory

int main()
{
A a; // 1 byte of memory
}
And if all the definitions appear before the first instantiation...

What?


Jonathan
 
K

Kiuhnm

Jonathan said:
You understanding of classes and objects seems to be flawed. A class does
not take memory, only objects do.

I prefer to distinguish between object definitions and object instances.
"Class" is a specific term, while "object" is more generic, then I
usually use the terms "object" and "instance" (for example in assembly
or in other languages).

Let's use the common terminology.

In the previous 100 posts, I simply said that, since a definition does
not take memory, the following code should compile:

struct s1
{
struct s2 s; (*)
};

struct s2
{
int ok;
};

int main()
{
s1 s; (**)
s.s.ok = 0;

return 1;
}

The definition of s2 is required only in (**) and not before, then any
error in (*) is unjustified.
Since s2 has been defined before (**), the code in (**) can be generated
without problems.

struct s1
{
struct s2 *ptr; (*)
};

int main()
{
s1 s;
s.ptr = 0;
s.ptr++; (**)

return 1;
}

It obviously does not compile, but where is the error? It is in (**),
not in (*)! If we delete the line (**), the code compiles.
The compiler shows off two different behaviours: in the first case it
generates an error "a priori", and, in the latter, only when it actually
cannot generate the code.
Why?
It seems that this exception to the rule was expressly introduced in
order to permit circular referencing between two different objects.

If you cannot see any incoherence in that, AMEN.
I am tired of this thread, and my bad English does not help!

Kiuhnm
 
J

Jonathan Mcdougall

Kiuhnm said:
I prefer to distinguish between object definitions and object instances.
"Class" is a specific term, while "object" is more generic, then I usually
use the terms "object" and "instance" (for example in assembly or in other
languages).

That is not the common use of these terms and this is not how they are used
in the C++ standard. A class is a "user-defined type" which takes no memory
nor has any runtime impacts. It could be considered as syntactic sugar.

An object or an instance of a class (usually) has a physical address in
memory and takes at least one byte. You should try to use the same
nomenclature as everybody to make these discussion simpler.
Let's use the common terminology.

Which one?
In the previous 100 posts, I simply said that, since a definition does not
take memory, the following code should compile:

And I said to you many times that it is not how the language works.
struct s1
{
struct s2 s; (*)

Illegal, since s2 has not been defined. A forward declaration such as

class s2;

would still be illegal. The compiler needs the full class definition to
determine the exact size of s1.
};

struct s2
{
int ok;
};

Of course, since int is an intrinsic type. The compiler knows its size.
int main()
{
s1 s; (**)

Illegal since s1 has not been correctly defined.
s.s.ok = 0;

Illegal since s has not been correctly defined.
return 1;
}

The definition of s2 is required only in (**) and not before, then any
error in (*) is unjustified.

Wrong. The definition of s2 is required each time an object is allocated,
whether on the stack or on the heap. The class s1 allocates such object. It
does not matter whether an object of type s1 is actually instanciated in
your code. This is how the language works. If you compile this exact code

class C
{
D d;
};

you will get an error because D has not been defined. If you append its
definition, or put it in a different file or whatever, the code still won't
compile. The definition of D must appear before the definition of C.
Since s2 has been defined before (**), the code in (**) can be generated
without problems.

Again, this is not how the language works. If you think this is a defect,
fill a report.
struct s1
{
struct s2 *ptr; (*)
};

That's different because the object is not a s2 anymore, it is a s2*, that
is, a pointer. Pointers are always the same size on an implementation (and
that size is implementation defined). Pointers to member functions may be
of different size, but work the same way.
int main()
{
s1 s;
s.ptr = 0;
Ok.

s.ptr++; (**)

That's an error, because for the compiler to calculate the address, it must
have the size of the class s2. Take this:

int *pi = ...;
++pi;

On an implementation where ints are 4 bytes, pi is moved forward by 4 bytes.

s2 *p = ...;
++p;

If your class s2 has 30 bytes of size, p is moved forward by 30 bytes. How
would the compiler know how many bytes to move the pointer if it does not
have access to the class' definition?
return 1;
}

It obviously does not compile, but where is the error? It is in (**), not
in (*)! If we delete the line (**), the code compiles.
The compiler shows off two different behaviours: in the first case it
generates an error "a priori", and, in the latter, only when it actually
cannot generate the code.
Why?

Your reasoning is wrong. Every time the compiler needs a size, you must
provide it. In case of builtin types, the compiler knows them already (and
that includes int, char, etc. and pointers). In case of user defined types,
*you* must provide their size by showing the class' definition. From that
point, the language mandates that this definition must come before the first
use, which you fail to do in your examples.

It may be, in your examples, that the compiler could process the whole
translation unit, building whatever table you want. But your examples are
very simple cases.

The language is very consistent because each time a name is used, its
definition must be provided.
It seems that this exception to the rule was expressly introduced in order
to permit circular referencing between two different objects.

What exception?
If you cannot see any incoherence in that, AMEN.
I am tired of this thread, and my bad English does not help!

Fine.


Jonathan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,733
Messages
2,569,439
Members
44,829
Latest member
PIXThurman

Latest Threads

Top