#pragmas for portability

sandeep

One of the key issues I see with C is that unlike say Java, it is not
fully specified but leaves a lot of decisions up to implementations. In
other words it runs on an actual machine not in a virtual machine.

A great way to mitigate the problem would be to allow Standard C programs
to declare explicitly when they are making implementation-specific
assumptions. A good way to do this would be with compiler #pragma
declarations. Currently these are all defined differently by different
compilers, but why not have a suite of #pragmas defined for everyone to
use in the ISO Standard?

For example, a programmer could use
#pragma int32
....
#end pragma
to indicate that ... assumes that int is 32-bits. Then if it was compiled
on a system with 16-bit ints (or 64-bit ints or 13-bit ints...) then that
compiler would be able to deal with the code successfully. Perhaps it
could emulate a 32-bit system by replacing all ints in the code with a 32-
bit type. Or it could issue a meaningful diagnostic like "Can't compile
file.c: #pragma int32 in force and no 32-bit type available on this
architecture: please amend code".

There are many other #pragmas that might be useful like
#pragma utf16
#pragma signed_char
#pragma void*_can_represent_function_ptrs
#pragma ieee_floating_point
#pragma 8.3_filenames
etc.
 
Ben Pfaff

sandeep said:
A great way to mitigate the problem would be to allow Standard C programs
to declare explicitly when they are making implementation-specific
assumptions. A good way to do this would be with compiler #pragma
declarations. Currently these are all defined differently by different
compilers, but why not have a suite of #pragmas defined for everyone to
use in the ISO Standard?

There are already some standard pragmas:
#pragma STDC FP_CONTRACT on-off-switch
#pragma STDC FENV_ACCESS on-off-switch
#pragma STDC CX_LIMITED_RANGE on-off-switch
For example, a programmer could use
#pragma int32
...
#end pragma
to indicate that ... assumes that int is 32-bits. Then if it was compiled
on a system with 16-bit ints (or 64-bit ints or 13-bit ints...) then that
compiler would be able to deal with the code successfully. Perhaps it
could emulate a 32-bit system by replacing all ints in the code with a 32-
bit type. Or it could issue a meaningful diagnostic like "Can't compile
file.c: #pragma int32 in force and no 32-bit type available on this
architecture: please amend code".

Beyond the reasons that this is not going to be implemented
regardless, I think that there are practical reasons why this
wouldn't work. For example, it would require there to be
multiple versions of standard library functions: a new version
for every function that has an "int" parameter for each possible
size of "int".

But if you just want to make sure that your program is compiling
on a system that has, say, 2's complement 32-bit ints, you can
already do that:

#include <limits.h>
#include <stdint.h>

#if INT_MAX != INT32_MAX || INT_MIN != INT32_MIN
#error "This program requires 2's complement 32-bit ints"
#endif

(This is C99-specific, but your proposal would require
implementors to go *beyond* C99 anyhow.)
 
Thomas Jollans

#pragma int32

If you require a 32-bit signed integer, use the standard C type int32_t,
defined in <stdint.h>.

#pragma utf16

What's wrong with using wchar_t and the corresponding standard library
functions?

#pragma signed_char

If you require a signed 8-bit integer, you can specify that by saying
"signed char" or "int8_t".

#pragma void*_can_represent_function_ptrs

What kind of compiler prevents nonsensical pointer casts?

#pragma ieee_floating_point

On a machine that doesn't have IEEE floating-point arithmetic, what
should this do?

#pragma 8.3_filenames

Erm, what?

If you want DOS, go hide in your cave and use DOS. A language like C,
with historical ties to UNIX but highly portable, is clearly in no way
for you.
 
Keith Thompson

sandeep said:
One of the key issues I see with C is that unlike say Java, it is not
fully specified but leaves a lot of decisions up to implementations. In
other words it runs on an actual machine not in a virtual machine.

A great way to mitigate the problem would be to allow Standard C programs
to declare explicitly when they are making implementation-specific
assumptions. A good way to do this would be with compiler #pragma
declarations. Currently these are all defined differently by different
compilers, but why not have a suite of #pragmas defined for everyone to
use in the ISO Standard?

For example, a programmer could use
#pragma int32
...
#end pragma
to indicate that ... assumes that int is 32-bits. Then if it was compiled
on a system with 16-bit ints (or 64-bit ints or 13-bit ints...) then that

(13-bit ints would be illegal; int must be at least 16 bits wide.)
compiler would be able to deal with the code successfully. Perhaps it
could emulate a 32-bit system by replacing all ints in the code with a 32-
bit type. Or it could issue a meaningful diagnostic like "Can't compile
file.c: #pragma int32 in force and no 32-bit type available on this
architecture: please amend code".

You're talking about two very different things: forcing the compiler
to make int 32 bits (a bad idea IMHO) or causing it to reject your
program if int isn't 32 bits (the latter is a kind of compile-time
assertion).

If you want a 32-bit integer type, use int32_t, not int. Your code
shouldn't need to depend on one particular type, int, being 32 bits;
if it does, fix it.

If you want to have a compile-time assertion that int is 32 bits, you
can already use a combination of #if and #error directives (see
elsethread).
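Both suggestions can be sketched together. This is only an illustration, assuming a C99 implementation; the helper name checked_add32 is made up here and does not come from the thread. The #ifndef test relies on the fact that <stdint.h> defines INT32_MAX only when int32_t exists.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time check: refuse to build where no exact-width
   32-bit type is available. */
#ifndef INT32_MAX
#error "This program requires an exact-width 32-bit integer type"
#endif

/* Hypothetical helper: add two 32-bit values without depending on
   the width of plain int. Returns 1 and stores the sum on success,
   0 if the mathematical result does not fit in int32_t. */
int checked_add32(int32_t a, int32_t b, int32_t *sum)
{
    assert(sum != NULL);

    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        return 0;               /* would overflow */

    *sum = (int32_t)(a + b);    /* in range, so the addition is safe */
    return 1;
}
```

The code names the type it needs instead of asserting something about int, so it builds unchanged whether int is 16, 32, or 64 bits wide.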
There are many other #pragmas that might be useful like
#pragma utf16

Not sure about this one.
#pragma signed_char

If you want a signed character type, use signed char. Using plain char
implies that you don't care whether it's signed or not.
#pragma void*_can_represent_function_ptrs

Code that assumes this is non-portable.
#pragma ieee_floating_point

An implementation can predefine __STDC_IEC_559__ if it supports
Annex F (IEC 60559 floating-point arithmetic).
#pragma 8.3_filenames

And what exactly would this mean? I know what 8.3 filenames are;
what would the #pragma do?
 
Nick

sandeep said:
There are many other #pragmas that might be useful like
#pragma 8.3_filenames

And would that mean that the system /has/ to support 8.3 filenames, or
that the code doesn't use anything other than 8.3 filenames, or what?

Would fopen("INPUT","r"); be valid on a machine where INPUT wasn't a
file, but had been associated with one in a DD statement in the JCL that
called the C program? If not, what pragma should I use for that?

You really still haven't grasped the range of things C can work very
successfully on.
 
robertwessel2

But if you just want to make sure that your program is compiling
on a system that has, say, 2's complement 32-bit ints, you can
already do that:

        #include <limits.h>
        #include <stdint.h>

        #if INT_MAX != INT32_MAX || INT_MIN != INT32_MIN
        #error "This program requires 2's complement 32-bit ints"
        #endif

(This is C99-specific, but your proposal would require
implementors to go *beyond* C99 anyhow.)


Since int32_t is already required to be two's complement, he could
just use that. Admittedly, in addition to being the exact width
specified, the (u)intNN_t types also require no padding, so this might
exclude a case the OP was interested in supporting: an implementation
that had exact width 32 bit two's complement ints, but *with* padding.
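For anyone curious whether their implementation has padding bits, a rough probe is possible in portable C. The sketch below is an illustration only (the function names are invented here), and it inspects unsigned int; a signed int could in principle have different padding.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* UINT_MAX has the form 2^N - 1, so counting how many times it can be
   shifted right before reaching zero gives N, the number of value bits. */
int unsigned_value_bits(void)
{
    unsigned int v = UINT_MAX;
    int bits = 0;

    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Storage bits minus value bits = padding bits in unsigned int. */
int unsigned_padding_bits(void)
{
    int storage = (int)(sizeof(unsigned int) * CHAR_BIT);
    int padding = storage - unsigned_value_bits();

    assert(padding >= 0);
    return padding;
}
```

On an implementation where int32_t exists and is the same type as int, the padding count is necessarily zero; the case described above is exactly the one where it is not.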
 
sandeep

Nick said:
And would that mean that the system /has/ to support 8.3 filenames, or
that the code doesn't use anything other than 8.3 filenames or what.

All that #pragma would say is that between the #pragma statement and the
corresponding #end pragma, the code would be entitled to assume that
filenames were in 8.3 format. For example, this code:

int convert_file(char *filename)
{
#pragma 8.3_filenames
char bak[13];
*strchr(filename, ".") = '\0';
strcpy(bak, filename);
strcat(bak, ".BAK");
#end pragma
file_copy(filename, bak);
....
}
Would fopen("INPUT","r"); be valid on a machine where INPUT wasn't a
file, but had been associated with one in a DD statement in the JCL that
called the C. If not, what pragma should I use for that?

You really still haven't grasped the range of things C can work very
successfully on.

Look I don't think anyone has understood what I was suggesting.

The problem I was addressing is this: there is Standard C, great. But
many programs make assumptions beyond what is guaranteed by Standard C:
for example they have to do this to implement network functions or read
directories etc. At the moment the only way for a program to record the
non-Standard/non-portable assumptions it is making is in external
documentation, either comments in the source code or README files with
the code etc. Now someone on a different system compiles their code,
maybe it doesn't compile or maybe it seems to compile fine, but there are
strange runtime errors. The reason is a non-portable assumption in the
original code, and the second person didn't read the documentation or it
wasn't even documented at all.

Isn't it better if Standard C provides a common mechanism for all
programs to record what non-Standard assumptions they are making? Then
instead of successful compilation and mysterious runtime errors, instead
the compiler can say

Warning: code operates under #pragma IEEE_floating_point but this is not
supported in hardware on this platform. Switching to software emulation
for floating point - possible performance loss

A completely clear and full explanation! Of course developers may choose
not to use the #pragmas, but having a standard way of documenting non-
portable assumptions can't be a bad thing surely.
 
Eric Sosman

Nick said:
And would that mean that the system /has/ to support 8.3 filenames, or
that the code doesn't use anything other than 8.3 filenames or what.

All that #pragma would say is that between the #pragma statement and the
corresponding #end pragma, the code would be entitled to assume that
filenames were in 8.3 format. For example, this code:

int convert_file(char *filename)
{
#pragma 8.3_filenames
char bak[13];
*strchr(filename, ".") = '\0';
strcpy(bak, filename);
strcat(bak, ".BAK");
#end pragma
file_copy(filename, bak);
...
}

What should happen if this code is compiled on a system that
does not limit file names to 8-dot-3? What should happen if this
code is compiled on a system able to mount multiple different
file systems, some that use only 8-dot-3 and some that use other
forms? What if some of the 8-dot-3 file systems insist on 8-dot
and others allow a bare 8 when there's no extension? What about
case sensitivity? In short, don't just exhibit one half-baked use
of one form of your #pragma, *define* it.
Isn't it better if Standard C provides a common mechanism for all
programs to record what non-Standard assumptions they are making? Then
instead of successful compilation and mysterious runtime errors, instead
the compiler can say

Warning: code operates under #pragma IEEE_floating_point but this is not
supported in hardware on this platform. Switching to software emulation
for floating point - possible performance loss

A completely clear and full explanation! Of course developers may choose
not to use the #pragmas, but having a standard way of documenting non-
portable assumptions can't be a bad thing surely.

As a practical matter, I think you'll find that the number of
non-portable assumptions that can be made is very large indeed.
As an exercise, propose #pragma's that cover just a few of the
characteristics of integer types: Widths, endiannesses, encoding
for negative values, behavior when right-shifting a negative,
behavior when left-shifting an opposite-sense sign bit, number and
arrangement of padding bits, alignment requirements, ... Think up
the necessary #pragma's, write a short description (one or two
paragraphs, say) of each, realize that when carefully nailed down
in full-fledged Standardese your descriptions will inflate by a
factor of four or five, and ponder the point that you have only
scratched the surface. (Hint: Study the POSIX sysconf() call and
the machinery that surrounds it; I think you'll agree that it covers
only a small part of what you suggest, and is already daunting.)
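To give a feel for the scale of the exercise, here is a sketch of probes for just two of the integer characteristics listed above: the encoding of negative values and the behavior of right-shifting a negative. The names are invented for illustration, and the shift probe is only meaningful as written on a two's complement machine.

```c
#include <assert.h>

/* The values are chosen to equal (-1 & 3) under each representation
   that C99 permits for signed integers. */
enum neg_encoding {
    SIGN_MAGNITUDE  = 1,   /* -1 is sign bit + ...01 */
    ONES_COMPLEMENT = 2,   /* -1 is ...1110          */
    TWOS_COMPLEMENT = 3    /* -1 is ...1111          */
};

/* Classify the encoding by looking at the low two bits of -1. */
enum neg_encoding negative_encoding(void)
{
    return (enum neg_encoding)(-1 & 3);
}

/* Right-shifting a negative value is implementation-defined.
   Returns 1 if -1 >> 1 is still -1 (the sign bit was copied in),
   0 otherwise. */
int right_shift_is_arithmetic(void)
{
    return (-1 >> 1) == -1;
}
```

Each further characteristic (padding, alignment, shifts into the sign bit, and so on) would need its own probe or its own pragma, which is the point of the exercise.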
 
Keith Thompson

sandeep said:
All that #pragma would say is that between the #pragma statement and the
corresponding #end pragma, the code would be entitled to assume that
filenames were in 8.3 format.

Saying that "the code would be entitled to assume" doesn't really mean
anything.

Do you really mean that the *code* is entitled to make this assumption,
or is the *implementation* allowed to do so? And what *exactly* are the
consequences if the assumption is violated?

Usually when an implementation is allowed to assume something,
the consequence of violating the assumption is undefined behavior.
Permitting a program's behavior to be undefined if it uses a filename
like "ninechars.text" doesn't seem particularly useful. Typically
the benefit is that the assumption permits some optimization;
where is the benefit here?
For example, this code:

int convert_file(char *filename)
{
#pragma 8.3_filenames
char bak[13];
*strchr(filename, ".") = '\0';
strcpy(bak, filename);
strcat(bak, ".BAK");
#end pragma
file_copy(filename, bak);
...
}

For starters, "8.3_filenames" is a poor choice of argument to #pragma.
A #pragma is followed by an optional sequence of preprocessing-tokens;
``8.3_filenames'' lexes as a single pp-number rather than as an
identifier, and ``void*_can_represent_function_ptrs'' is three
separate tokens. If you're going to propose a change to the language,
you have to think about that kind of detail.

You've shown us an example, but I still don't have a clue what this
#pragma is actually supposed to do. Is the behavior of convert_file
going to be different with your #pragma than without it? If so,
how? Please describe the difference in terms of how the function
behaves, or is permitted to behave, not in terms of what either
the implementation or the program may or may not assume.

How on Earth is the implementation or the program supposed to know
that either filename or bak is a file name? Or does your #pragma
affect all strings?

Incidentally, your example is broken. Once you change the second
argument to strchr from "." to '.' (please compile your code before
posting it), this:
char name[] = "hello.txt";
convert_file(name);
will invoke
file_copy("hello", "hello.BAK");
which I don't think is what you had in mind. Also, convert_file()
modifies the string to which its first parameter points, so the above
convert_file(name);
changes name from "hello.txt" to just "hello", and
convert_file("hello.txt");
invokes undefined behavior.
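For comparison, here is a sketch of a version that avoids those problems, assuming the intent was to build "<stem>.BAK" from a bare file name. The name make_bak_name is hypothetical, and the sketch deliberately swaps strchr for strrchr so that only the last extension is replaced. It does not modify its input, and it checks the buffer size instead of assuming 8.3 names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write "<stem>.BAK" into bak without modifying
   filename. Treats filename as a bare file name, not a path.
   Returns 0 on success, -1 if the result does not fit in baksize bytes. */
int make_bak_name(const char *filename, char *bak, size_t baksize)
{
    static const char ext[] = ".BAK";
    const char *dot;
    size_t stem;

    assert(filename != NULL && bak != NULL);

    dot = strrchr(filename, '.');      /* last dot, if any */
    stem = dot ? (size_t)(dot - filename) : strlen(filename);

    if (stem + sizeof ext > baksize)   /* sizeof ext includes the '\0' */
        return -1;

    memcpy(bak, filename, stem);
    memcpy(bak + stem, ext, sizeof ext);
    return 0;
}
```

With this version, passing a string literal such as "hello.txt" is fine, and an over-long name is reported to the caller rather than overrunning the 13-byte buffer.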

[...]
Look I don't think anyone has understood what I was suggesting.

I agree -- and I suggest that that includes you. We don't understand
what you're suggesting because you haven't described it clearly
enough. I speculate that you haven't described it clearly because
you don't have a clear or consistent idea of just what your proposal
really is.
The problem I was addressing is this: there is Standard C, great. But
many programs make assumptions beyond what is guaranteed by Standard C:
for example they have to do this to implement network functions or read
directories etc. At the moment the only way for a program to record the
non-Standard/non-portable assumptions it is making is in external
documentation, either comments in the source code or README files with
the code etc. Now someone on a different system compiles their code,
maybe it doesn't compile or maybe it seems to compile fine, but there are
strange runtime errors. The reason is a non-portable assumption in the
original code, and the second person didn't read the documentation or it
wasn't even documented at all.

Network functions and directory operations are generally covered by
secondary standards such as POSIX. If your program has, for example,
#include <sys/types.h>
#include <sys/socket.h>
#include <dirent.h>
then it won't compile on a system that doesn't provide those headers.

There may be better examples of what you're talking about. I suggest
you think of some.
Isn't it better if Standard C provides a common mechanism for all
programs to record what non-Standard assumptions they are making? Then
instead of successful compilation and mysterious runtime errors, instead
the compiler can say

Warning: code operates under #pragma IEEE_floating_point but this is not
supported in hardware on this platform. Switching to software emulation
for floating point - possible performance loss

Ah, so your #pragma IEEE_floating_point indicates that IEEE floating
point is supported *in hardware*, and you're assuming that if it's
not supported in hardware then it must be supported in software.
I don't think you mentioned that before.

The C standard doesn't generally concern itself with performance
issues. On some systems, *some* FP operations might be supported in
hardware, and others in software -- and some software implementations
might be faster than some hardware implementations. Other systems
might use entirely different floating-point representations and
not support IEEE at all. Others might use the IEEE floating-point
format, but not support all the semantics. And so forth. How many
kinds of #pragma would it take to cover all those possibilities,
plus the ones I haven't thought of?

I think I already mentioned the optional __STDC_IEC_559__ predefined
macro. Think about how it relates to what you're trying to do.
(I can't help you with that, since I don't know what you're trying
to do.)
A completely clear and full explanation! Of course developers may choose
not to use the #pragmas, but having a standard way of documenting non-
portable assumptions can't be a bad thing surely.

It might not be a bad thing if it could be defined properly.
 
sandeep

Keith said:
Saying that "the code would be entitled to assume" doesn't really mean
anything.

Do you really mean that the *code* is entitled to make this assumption,
or is the *implementation* allowed to do so? And what *exactly* are the
consequences if the assumption is violated?

Usually when an implementation is allowed to assume something, the
consequence of violating the assumption is undefined behavior.

OK, I'm not an expert on Standardese wording, but here is a try.

Define: a "hanging statement" is a statement involving the terms defined
by the ISO C Standard, such that there are (in principle) conforming C
implementations where that statement is true and also conforming C
implementations where it is false.

Examples of hanging statements:
* sizeof(int) == 2.
* examining the bytes of an int via an unsigned char pointer reveals that
the int is stored in 2s complement big endian format with no padding.
* any function pointer can be typecast to void* and back without loss.

Intuitively there should be a close correspondence between hanging
statements, and assumptions that can invoke an undefined/implementation
defined behavior.

Now define: a "pragma" is an identifier signifying a hanging statement. Eg
the three examples above might be called

* #pragma int16
* #pragma bigendian
* #pragma function_ptr_castable

Now define: a pragma P is "in force" at a point of code if that code is
contained within a #pragma P ... #end pragma block.

When a compiler is compiling code with various pragmas in force it must:
* either reject the code, or
* compile the code according to the ISO Standard PLUS the hanging
statements resolved as specified in each pragma. It can issue an optional
diagnostic if it wants: for example, this would be good QOI if the only
way to satisfy the pragma was with software emulation that would probably
be much slower than a hardware implementation.
Permitting a program's behavior to be undefined if it uses a filename
like "ninechars.text" doesn't seem particularly useful. Typically the
benefit is that the assumption permits some optimization; where is the
benefit here?

Yes, it was just an example of the concept. The idea would be to provide
a Standard way of treating essential non-portable parts of the code. A
bit like what Autoconf tries to do, except right within C itself.
Ah, so your #pragma IEEE_floating_point indicates that IEEE floating
point is supported *in hardware*, and you're assuming that if it's not
supported in hardware then it must be supported in software. I don't
think you mentioned that before.

No! - see above.
 
Eric Sosman

OK, I'm not an expert on Standardese wording, but here is a try.

Define: a "hanging statement" is a statement involving the terms defined
by the ISO C Standard, such that there are (in principle) conforming C
implementations where that statement is true and also conforming C
implementations where it is false.

Examples of hanging statements:
* sizeof(int) == 2.
* examining the bytes of an int via an unsigned char pointer reveals that
the int is stored in 2s complement big endian format with no padding.
* any function pointer can be typecast to void* and back without loss.

More examples:

* Converting a value to a signed integer type for which the
value is out of range raises the SIGBAD signal
* Converting a value to a signed integer type for which the
value is out of range yields the most negative or most
positive value of the destination type, depending on the
sign of the original value
* Converting a value to a signed integer type for which the
value is out of range yields a pseudo-random value
* Converting a value to a signed integer type for which the
value is out of range yields forty-two
* ...

That is, the number of hanging statements is very large, and you
will have great difficulty enumerating them and giving each a pragma.
Intuitively there should be a close correspondence between hanging
statements, and assumptions that can invoke an undefined/implementation
defined behavior.

Now define: a "pragma" is an identifier signifying a hanging statement. Eg
the three examples above might be called

* #pragma int16

Wouldn't "int2" be a better name for the condition you've stated?
* #pragma bigendian

Would this apply only to int, or would it also affect long, short,
long long, ...? (On the PDP-11, int was little-endian but long was
not.)
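The per-type nature of byte order can be seen with a small probe. This is a sketch only, with invented names, and it assumes the unsigned types have no padding bits (otherwise the shifts could exceed the type's width).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* bytes[] holds the object representation of a value whose i-th least
   significant byte contains i + 1. Classify the layout from that. */
enum byte_order classify_bytes(const unsigned char *bytes, size_t n)
{
    size_t i;
    int little = 1, big = 1;

    for (i = 0; i < n; i++) {
        if (bytes[i] != i + 1) little = 0;
        if (bytes[i] != n - i) big = 0;
    }
    if (little) return ORDER_LITTLE;
    if (big)    return ORDER_BIG;
    return ORDER_OTHER;             /* e.g. a mixed, PDP-11-style layout */
}

/* Probe unsigned int. */
enum byte_order uint_byte_order(void)
{
    unsigned int v = 0;
    size_t i;

    for (i = 0; i < sizeof v; i++)
        v |= (unsigned int)(i + 1) << (i * CHAR_BIT);
    return classify_bytes((const unsigned char *)&v, sizeof v);
}

/* Probe unsigned long separately; the answer need not match. */
enum byte_order ulong_byte_order(void)
{
    unsigned long v = 0;
    size_t i;

    for (i = 0; i < sizeof v; i++)
        v |= (unsigned long)(i + 1) << (i * CHAR_BIT);
    return classify_bytes((const unsigned char *)&v, sizeof v);
}
```

Nothing forces the two probes to agree, so a single "bigendian" assertion would have to say which types it covers.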
* #pragma function_ptr_castable

Now define: a pragma P is "in force" at a point of code if that code is
contained within a #pragma P ... #end pragma block.

Note that this notion of a "pragma block" is a new one. You
could (I suppose) invent a "#pragma revert bigendian" or some such
thing to simulate it, but the scope of a #pragma is somewhat uncertain.
 
Keith Thompson

sandeep said:
OK, I'm not an expert on Standardese wording, but here is a try.

Define: a "hanging statement" is a statement involving the terms defined
by the ISO C Standard, such that there are (in principle) conforming C
implementations where that statement is true and also conforming C
implementations where it is false.

Examples of hanging statements:
* sizeof(int) == 2.
* examining the bytes of an int via an unsigned char pointer reveals that
the int is stored in 2s complement big endian format with no padding.
* any function pointer can be typecast to void* and back without loss.

Intuitively there should be a close correspondence between hanging
statements, and assumptions that can invoke an undefined/implementation
defined behavior.

Now define: a "pragma" is an identifier signifying a hanging statement. Eg
the three examples above might be called

* #pragma int16
* #pragma bigendian
* #pragma function_ptr_castable

Now define: a pragma P is "in force" at a point of code if that code is
contained within a #pragma P ... #end pragma block.

When a compiler is compiling code with various pragmas in force it must:
* either reject the code, or
* compile the code according to the ISO Standard PLUS the hanging
statements resolved as specified in each pragma. It can issue an optional
diagnostic if it wants: for example, this would be good QOI if the only
way to satisfy the pragma was with software emulation that would probably
be much slower than a hardware implementation.

So your proposed #pragma is a compile-time assertion, where the
implementation is encouraged but not required to take some action
to make the asserted condition true.
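The pure assertion half of that (reject the code, take no corrective action) can be written directly in later revisions of the standard: C11 added _Static_assert. The sketch below assumes a C11 compiler; the three conditions are illustrative stand-ins, not a definition of the proposed pragmas, and the function-pointer check is a size comparison only, which is necessary but not sufficient for a lossless round trip.

```c
#include <assert.h>
#include <limits.h>

/* Each line is a compile-time assertion: if the condition is false,
   the translation unit is rejected with the given message. */
_Static_assert(CHAR_BIT == 8,
               "this code assumes 8-bit bytes");
_Static_assert(sizeof(int) * CHAR_BIT >= 32,
               "this code assumes int occupies at least 32 bits");
_Static_assert(sizeof(void (*)(void)) <= sizeof(void *),
               "this code assumes a function pointer fits in a void *");

/* Illustrative helper: the storage width the second assertion checks. */
int int_storage_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

Unlike the proposed #pragma, this gives the implementation no licence to change anything; it only refuses to translate code whose stated assumptions do not hold.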
Yes, it was just an example of the concept. The idea would be to provide
a Standard way of treating essential non-portable parts of the code. A
bit like what Autoconf tries to do, except right within C itself.

I think that "#pragma 8.3_filenames" was a very poor example.
I still haven't a clue what actual effect it would have. I suggest
we ignore that example.
No! - see above.

Ok, so it would mean one of the following things:

If the implementation already supports IEEE floating-point, it would
have no effect.

If the implementation doesn't support IEEE floating-point by default,
but is able to do so, it would enable it by some unspecified means.

If the implementation doesn't support IEEE floating-point at all, it
would reject the translation unit. Note that the only case where a
translation unit *must* be rejected is the #error directive; this would
be a second such case.

In practice, most implementations either support IEEE FP by default
or don't support it at all. For such implementations, this:

#ifndef __STDC_IEC_559__
#error "IEEE floating-point is not supported"
#endif

will already do the job.
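Where a hard #error is too blunt, the same macro can feed a run-time or build-time decision instead. This is a sketch with invented function names; the second check is only a heuristic on the format of double (radix, precision, exponent range) and says nothing about whether the arithmetic follows IEEE semantics.

```c
#include <assert.h>
#include <float.h>

/* 1 if the implementation claims Annex F (IEC 60559) conformance. */
int claims_iec_559(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Weaker heuristic: does double have the shape of IEEE binary64?
   This checks the format parameters only, not the semantics. */
int double_looks_like_binary64(void)
{
    return FLT_RADIX == 2
        && DBL_MANT_DIG == 53
        && DBL_MAX_EXP == 1024
        && DBL_MIN_EXP == -1021;
}
```

An implementation may use the IEEE format without defining the macro, which is why the two answers can differ.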

The language currently defines 3 standard #pragmas (see C99 6.10.6):

#pragma STDC FP_CONTRACT on-off-switch
#pragma STDC FENV_ACCESS on-off-switch
#pragma STDC CX_LIMITED_RANGE on-off-switch

I suggest that, if your proposal were to be added to the standard,
a better syntax would be:

#pragma STDC ASSERT identifier

where the identifier is defined either by the standard or by the
implementation.

In my opinion, the set of possible conditions that could be tested is so
vast that standardizing them is impractical. See Eric's response for
some examples.
 
sandeep

Keith said:
Ok, so it would mean one of the following things:

If the implementation already supports IEEE floating-point, it would
have no effect.

If the implementation doesn't support IEEE floating-point by default,
but is able to do so, it would enable it by some unspecified means.

If the implementation doesn't support IEEE floating-point at all, it
would reject the translation unit. Note that the only case where a
translation unit *must* be rejected is the #error directive; this would
be a second such case.

I think that should be the effect of it - yes.
In practice, most implementations either support IEEE FP by default or
don't support it at all. For such implementations, this:

#ifndef __STDC_IEC_559__
#error "IEEE floating-point is not supported"
#endif

will already do the job.

However, not all non-portable features have macro definitions!

For example, in a
#pragma bigendian
section of code, a high-QOI compiler on a little-endian platform could
insert htonl() and similar functions automatically at each access of a
16-bit or wider type. This has the potential to simplify lots of network
code - moving responsibility for dealing with non-portability out of the
user code and into the compiler.
I suggest that, if your proposal were to be added to the standard, a
better syntax would be:

#pragma STDC ASSERT identifier

where the identifier is defined either by the standard or by the
implementation.

This seems like an ugly syntax, but if it conforms better with existing
pragmas then maybe you're right and it would be best. Thanks.
In my opinion, the set of possible conditions that could be tested is so
vast that standardizing them is impractical. See Eric's response for
some examples.

I see what point you and Mr Sosman are making. However, there is a
saying... "The perfect is the enemy of the good". And there is a simple
way to get a list of the most common portability concerns in current C
code - download a large repository of software using the GNU autotools (eg
any Linux distribution) and then search the configure scripts of all the
software to find which autoconf macros are invoked the most. Of course
this could evolve over time as the ISO Standard evolves too.
 
Eric Sosman

I think that should be the effect of it - yes.


However, not all non-portable features have macro definitions!

For example, in a
#pragma bigendian
section of code, a high-QOI compiler on a little-endian platform could
insert htonl() and similar functions automatically at each access of a
16-bit or wider type. This has the potential to simplify lots of network
code - moving responsibility for dealing with non-portability out of the
user code and into the compiler.

How do you move values between the bigendian region and the
unknownendian native environment? From the way you describe the
effect of the #pragma, I don't see how it's possible.
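The conventional answer in today's C is to make the boundary explicit: values live in native form inside the program and are converted byte by byte where they enter or leave a buffer. A minimal sketch, with invented helper names and assuming uint32_t is available:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into buf[0..3] in big-endian (network) order.
   Shifts operate on values, so this works on any host byte order. */
void put_be32(unsigned char *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)((v >> 24) & 0xFF);
    buf[1] = (unsigned char)((v >> 16) & 0xFF);
    buf[2] = (unsigned char)((v >> 8) & 0xFF);
    buf[3] = (unsigned char)(v & 0xFF);
}

/* Read a big-endian 32-bit value from buf[0..3] into native form. */
uint32_t get_be32(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24)
         | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)
         |  (uint32_t)buf[3];
}
```

Here the crossing point between the two representations is a function call the programmer can see, which is exactly what a region-based #pragma leaves undefined.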
I see what point you and Mr Sosman are making. However, there is a
saying... "The perfect is the enemy of the good".

So are the not good enough and the half-baked.
And there is a simple
way to get a list of the most common portability concerns in current C
code - download a large repository of software using the GNU autotools (eg
any Linux distribution) and then search the configure scripts of all the
software to find which autoconf macros are invoked the most. Of course
this could evolve over time as the ISO Standard evolves too.

Note that this approach completely ignores several platforms,
including one platform family that is fairly widely used. As long as
you don't care about portability to the platform that dominates the
desktop universe and much of the server universe, too, you're golden.
 
