How often do you have to work around implementations?

Seebs · Apr 15, 2010

Obviously, in general, you ought to write clean, portable code.

Something that's bitten me a few times recently is cases in which
implementations were buggy -- rarely, to be fair, in the core C language,
but "standard" system extensions like POSIX conformance.

Does this happen to other people? (The recent example of MSVC++ having a
buggy preprocessor is presumably one example.) What do you do about it?
Do you use #ifdefs? Do you include both the correct code and the code
which works on a particular target? If you have to support multiple
targets, only some of which are broken in a given way, do you try to
handle determination by testing things in your code, or outside the code
in a build system?

The example that recently came up involves a UNIX extension. While the
details are unportable, the underlying issue is something you could get
wrong in any environment.

There's a function, which takes function pointers as arguments. (For the
UNIX weenies: scandir().) One of the function pointers is declared
differently on different machines I have access to. So in essence, I
have one system which declares:

extern int foo(int (*compare)(struct foomagic **a, struct foomagic **b));

and another which declares

extern int foo(int (*compare)(void *a, void *b));

Imagine that you needed to interact with this function, across these two
systems. How would you do it? Assume for the sake of argument that you
can't compel the vendor to fix the broken implementation, and that "we
don't support that" is not one of your options.

-s

Ben Pfaff · Apr 15, 2010

Seebs said:
Something that's bitten me a few times recently is cases in which
implementations were buggy -- rarely, to be fair, in the core C language,
but "standard" system extensions like POSIX conformance.

Does this happen to other people? (The recent example of MSVC++ having a
buggy preprocessor is presumably one example.)

Sure. I've run into a few GCC and glibc bugs over the years, for
example. Usually I submit bug reports, and usually the bugs get
fixed, but in the meantime I have to work around it somehow.

What do you do about it? Do you use #ifdefs? Do you include
both the correct code and the code which works on a particular
target? If you have to support multiple targets, only some of
which are broken in a given way, do you try to handle
determination by testing things in your code, or outside the
code in a build system?

These days, I usually contribute a fix to the "gnulib" library,
which is specifically designed to work around bugs and missing
features on Unix-like platforms.

There's a function, which takes function pointers as arguments. (For the
UNIX weenies: scandir().) One of the function pointers is declared
differently on different machines I have access to. So in essence, I
have one system which declares:

extern int foo(int (*compare)(struct foomagic **a, struct foomagic **b));

and another which declares

extern int foo(int (*compare)(void *a, void *b));

This is the sort of problem that gnulib commonly works around.
It doesn't have a fix for this particular issue. It's even
documented not to fix this problem. From
http://www.gnu.org/software/gnulib/manual/html_node/scandir.html:

Portability problems fixed by Gnulib:

* This function is missing on some platforms: Solaris 9,
mingw, BeOS.

Portability problems not fixed by Gnulib:

* The fourth parameter of this function is declared as
int (*) (const void *, const void *) on some platforms:
glibc 2.3.6, MacOS X 10.3, FreeBSD 6.0, NetBSD 3.0,
OpenBSD 3.8, Interix 3.5.

* The fourth parameter of this function is declared as
int (*) (void *, void *) on some platforms: AIX 5.1.

Seebs · Apr 16, 2010

These days, I usually contribute a fix to the "gnulib" library,
which is specifically designed to work around bugs and missing
features on Unix-like platforms.

Good point!

I hadn't thought of that, but yes, that's the sort of thing gnulib tends to
cover... But in this case, omits.

In my case, since I'm just doing wrappers, it's enough to just omit
the argument types from the function pointer declaration.
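
For code that calls the function rather than wrapping it, one alternative is a thin adapter chosen at build time. The following is only a sketch under stated assumptions: the `name` member of `struct foomagic`, both comparator names, and the `FOO_COMPARE_TAKES_VOIDP` macro are hypothetical, and the macro would have to be set by the build system or a hand-written config header. Note also that a declarator without parameter types stopped meaning "unspecified parameters" in C23, so the omit-the-types trick is tied to older language modes.

```c
#include <string.h>

/* Hypothetical stand-in for the structure the library hands back. */
struct foomagic {
    const char *name;
};

/* The real comparison logic, written once with the precise types. */
int compare_foomagic(struct foomagic **a, struct foomagic **b)
{
    return strcmp((*a)->name, (*b)->name);
}

/* Adapter for the implementation that declares (void *, void *). */
int compare_foomagic_voidp(void *a, void *b)
{
    return compare_foomagic((struct foomagic **)a, (struct foomagic **)b);
}

/*
 * FOO_COMPARE_TAKES_VOIDP is a hypothetical macro; nothing defines it
 * unless the build system (or a config header) does.
 */
#ifdef FOO_COMPARE_TAKES_VOIDP
#define FOO_COMPARE compare_foomagic_voidp
#else
#define FOO_COMPARE compare_foomagic
#endif
```

A caller would then write `foo(FOO_COMPARE)` on either system, and each call goes through a pointer whose type matches that platform's declaration.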

-s

Ersek, Laszlo · Apr 16, 2010

Obviously, in general, you ought to write clean, portable code.

Something that's bitten me a few times recently is cases in which
implementations were buggy -- rarely, to be fair, in the core C
language, but "standard" system extensions like POSIX conformance.

Does this happen to other people? (The recent example of MSVC++ having
a buggy preprocessor is presumably one example.)

(You'll regret this question.) Yes. I seem to remember the following cases
(all free software (C) by me):

1)

/*
I know about the "%N$*M$lu" conversion specification, but the Tru64 system
I tested on chokes on it, even though it is certified UNIX 98 (I believe):

$ uname -s -r -v -m
OSF1 V5.1 2650 alpha
$ c89 -V
Compaq C V6.5-011 on Compaq Tru64 UNIX V5.1B (Rev. 2650)
Compiler Driver V6.5-003 (sys) cc Driver

http://www.opengroup.org/openbrand/register/brand2700.htm
*/

2)

# Under SUSv2, the word "time" is not a reserved word in the shell. (Or this
# may not be prohibited, but then the reserved word "time" has to support
# option "-p", and it is strange for a reserved word to take an option (think
# "if", "for")). I'm sure SUSv2 doesn't allow a reserved word to shadow a
# standard utility in an incompatible way. Well, some systems certified UNIX 98
# don't seem to care.
#
# ----------------------------------------------------------------------
# Standards, Environments, and Macros standards(5)
#
# SUSv2 superset of SUS extended to sup- Solaris 7
# port POSIX.1b-1993, POSIX.1c-
# 1996, and ISO/IEC 9899 (C Stan-
# dard) Amendment 1
#
# http://www.opengroup.org/openbrand/register/xx.htm
# http://www.opengroup.org/openbrand/register/xw.htm

[snip SUSv2 env setup]

# $ time -p true
# sh: -p: not found
# [...]
# $ command -V time
# time is a reserved shell keyword
# $ 'time' -p true
# real 0.00
# user 0.00
# sys 0.00
# $ command time -p true
# Segmentation Fault (core dumped)
# ----------------------------------------------------------------------
#
# So I have to suppress reserved word recognition by quoting at least one
# character of the string "time". 28-JAN-2009 lacos

3)

/*
Albeit we didn't install any signal handlers, on "cygwin-1.5.19-4" these
primitives still get interrupted by (SIGTSTP, SIGCONT). We have to handle
"EINTR" - the semaphore implementation above is compatible, and the
workaround doesn't disturb us on platforms where "EINTR" is not possible in
this context.

Note that we treat "EINTR" specially only in the "*_reel()"'s.
*/

4)

/*
Close "stdout", ignore return value. Don't do this with "Compaq C V6.2-003
on OpenVMS Alpha V7.3-2", because if "stdin" is redirected from a file (not
a terminal), then the following sequence leads, for some reason, to
"%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual
address=000000000000001C, PC=FFFFFFFF809CE448, PS=0000001B":

(1) fclose(stdout);
(2) STDOUT_FILENO == socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
(3) read(STDIN_FILENO, &x, 1);

(1) was executed here, (2) was executed in "sock_init()", (3) triggered the
crash in "pkt_ckack()".
*/

5) (This comment talks about a SIGSTOP being delivered to the process
while it is blocked in select(), and then continued with SIGCONT. SIGCONT
is caught and handled by the process, thus the correct behavior would most
likely be to return -1/EINTR.)

/*
Otherwise, repeat. On "OSF1 V4.0 1091 alpha", if the process is
stopped inside "select()", and is continued only after the specified
timeout has passed (in real time), "select()" returns 0. In this
case, similar to the one with EINTR, we need to re-"select()".
*/

6)

/*
Originally, the second argument to "recvfrom()" and "sendto()" was "0"
(NULL), but using "Compaq C V6.2-003 on OpenVMS Alpha V7.3-2", the system
insists on a valid address, even if the size of the buffer is zero. No
conditional preprocessing is done, since the current solution doesn't
hurt on UNIX either.

This workaround is required also for "cygwin-1.5.19-4 + winsock".
Moreover, specifying a non-NULL buffer doesn't suffice, I have to pass a
nonzero buffer size too: "cygwin-1.5.19-4 + winsock" seem unable to
send/receive empty UDP packets.
*/

7) This is about cygwin and Debian/kFreeBSD not knowing some errno macros.
(Not that the related functionality should be supported, for example,
STREAMS with Debian/kFreeBSD, but the symbolic names were mandated by the
SUSv2.)

----------------------------
revision 1.66
date: 2009/12/01 11:09:04; author: lacos; state: Exp; lines: +155 -4; kopt: kv; ...
conditionally compile in errno macros, part1 (lbzip2-0.18-2 doesn't compile
on kfreebsd)
----------------------------
revision 1.65
date: 2009/11/29 12:47:35; author: lacos; state: Exp; lines: +11 -6; kopt: kv; ...
errstr[]:
- beautify error strings for EBUSY, ECHILD, ENOSPC, ERANGE, EXDEV
- ECANCELED is conditional because Cygwin doesn't seem to know it
----------------------------

8) A "getconf" command line bug. After I worked it around for glibc and
also reported it, it hit me again when Debian switched to EGLIBC, and the
fix was not yet ported over.

# http://sources.redhat.com/bugzilla/show_bug.cgi?id=7095
# Fixed by Ulrich Drepper on 07-FEB-2009.
if ! getconf --version 2>&1 | grep -E -q 'getconf \((GNU libc|EGLIBC)\)' \
|| ! getconf $SPEC"_$1" 2>/dev/null
then
getconf -v $SPEC $SPEC"_$1"
fi

9)

/*
SUSv1 doesn't exclude a mode which was later called LP64. An
example for such an implementation is "OSF1 V4.0 1091 alpha",
where "8 == sizeof(size_t)", and still, socket functions
modifying address lengths take pointers to "size_t" objects. This
shows that it's possible to satisfy SUSv1 and still have a 64-bit
"size_t".

Unfortunately, the following components:
- "Sun C 5.8 2005/10/13"
- "SunOS 5.10 Generic_118822-25 sun4u sparc
SUNW,Ultra-Enterprise-10000"
- "@(#)socket.h 1.74 05/08/02 SMI"
render "socklen_t" visible and different from "size_t" when in
LP64 mode ("-xarch=generic64"), even while in SUSv1. So let's use
this beautiful generic workaround below and elsewhere where
pointers to socket addresses must be passed.
*/
struct msghdr hdr;

hdr.msg_namelen = sizeof local;
if (-1 == getsockname(sock, (struct sockaddr *)&local,
&hdr.msg_namelen))

What do you do about it?

If the system is FLOSS, I report a bug, at least when I know that the
developers originally intended to support what I was doing. (For example,
FreeBSD, AFAICT, doesn't intend to support SUSv2 explicitly, which is a
completely logical decision, SUSv2 going back to 1997-1998.)

Do you use #ifdefs? Do you include both the correct code and the code
which works on a particular target?

I try to work it around in a way that is also standards conformant and
does no harm on conformant systems.

1) Replace "%N$*M$lu" with more "normal" format specifiers and repeat
printf arguments explicitly that were initially re-used by the original
format specifiers.

2) Write "'time'" instead of "time".

3) Handle -1/EINTR.

4) #ifndef __VMS

5) Don't just believe a select() timeout, check for a delivered SIGCONT
first.

6) Increase UDP packet payload to 1 octet (all bits zero).

7) #ifdef EXXXXXX

8) Make a concession towards the known implementation. The default should
remain to complain loudly on non-conformant platforms.

9) Abuse "struct msghdr / msg_namelen" having type size_t or socklen_t
exactly in synch with all relevant socket functions taking pointers to
size_t or socklen_t.
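
Workarounds 3) and 7) above share a useful property: they cost nothing on conformant systems. A minimal sketch of both follows, with assumptions stated: `read()` stands in for the semaphore primitives mentioned in the original comment, and the names `read_retry` and `errno_name` are invented for illustration (they are not taken from lbzip2 or any library).

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <unistd.h>

/*
 * Workaround 3: retry a call that failed with EINTR. On platforms where
 * EINTR cannot occur in this context, the loop body simply runs once.
 */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}

/*
 * Workaround 7: compile in only the errno names this platform defines.
 * An if-chain is used instead of a switch so that two macros sharing one
 * value on some system cannot produce a duplicate-case error.
 */
const char *errno_name(int err)
{
    if (err == EINTR)
        return "EINTR";
    if (err == ENOSPC)
        return "ENOSPC";
#ifdef ECANCELED
    if (err == ECANCELED)
        return "ECANCELED";
#endif
#ifdef ENOSTR
    if (err == ENOSTR)
        return "ENOSTR";
#endif
    return "(unknown)";
}
```

Neither function needs to know which platform it is built on; the preprocessor test relies only on the macro names themselves.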

If you have to support multiple targets, only some of which are broken
in a given way, do you try to handle determination by testing things in
your code, or outside the code in a build system?

I'd respond to this more wrt. correct but implementation-dependent
behavior. If I can genuinely do a test with the preprocessor, relying only
on standardized macros, I do it that way. Otherwise, I do it in code,
eliciting swaths of "condition always true" and "condition always false /
code will never be executed" warnings from gcc. (They mean "condition
always true *on this platform*", in fact.)

For example, how do you check if time_t is an integer type? The
mathematical value of 1/FLT_RADIX can be represented exactly by all
floating point types, and its value is in (0, 1).

(time_t)(1.0 / FLT_RADIX) == (time_t)0

(Or perhaps scalb(1.0, -1.0) should be used instead of the division.)

I allow my "build systems" to rely only on getconf, make, and sh, all of
which are standard. I don't use autoconf/automake or similar tools. I test
for known bugs only exceptionally, if writing a common solution
(simultaneously for buggy and correct) is not possible. See the getconf
example.

The example that recently came up involves a UNIX extension. While the
details are unportable, the underlying issue is something you could get
wrong in any environment.

There's a function, which takes function pointers as arguments. (For the
UNIX weenies: scandir().) One of the function pointers is declared
differently on different machines I have access to. So in essence, I
have one system which declares:

extern int foo(int (*compare)(struct foomagic **a, struct foomagic **b));

and another which declares

extern int foo(int (*compare)(void *a, void *b));

scandir() has been standardized by the SUSv4.

http://www.opengroup.org/onlinepubs/9699919799/functions/scandir.html

If you code for SUSvX, X <= 3, I don't know where the idea of using
scandir() came from at all.

SUSv4 dictates the prototype precisely.

(Of course, the GNU project has a completely different stance, as
described by Ben, and it is only logical for them, maintaining a huge code
base, to develop entire libraries of workarounds.)

Imagine that you needed to interact with this function, across these two
systems. How would you do it? Assume for the sake of argument that you
can't compel the vendor to fix the broken implementation, and that "we
don't support that" is not one of your options.

What do you mean by interact? Does your code call scandir(), or does some
external library call back into your_compare()? In the former case, you're
free to implement your own scandir() function with opendir(), readdir(),
closedir(), malloc(), strcoll() or strxfrm(), and qsort().
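
A rough sketch of such a replacement, using only the functions listed above plus realloc(), might look like the following. It is not a drop-in scandir(): the names `list_dir_sorted`, `free_names`, and `compare_names` are hypothetical, it returns plain name strings rather than `struct dirent` copies, it has no filter callback, and it does not distinguish a readdir() error from end-of-directory.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator over an array of char *; ISO C fixes its prototype. */
static int compare_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/* Release an array produced by list_dir_sorted(). */
void free_names(char **names, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        free(names[i]);
    free(names);
}

/*
 * Store a malloc()ed array of entry names, sorted with strcoll(), in *out.
 * Returns the number of entries, or -1 if the directory cannot be opened
 * or memory runs out. Entries such as "." and ".." are not filtered.
 */
long list_dir_sorted(const char *path, char ***out)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    char **names = NULL;
    size_t count = 0, cap = 0;

    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        char *copy;

        if (count == cap) {
            /* Grow the pointer array geometrically. */
            size_t new_cap = cap ? cap * 2 : 16;
            char **tmp = realloc(names, new_cap * sizeof *tmp);

            if (tmp == NULL)
                goto fail;
            names = tmp;
            cap = new_cap;
        }
        copy = malloc(strlen(ent->d_name) + 1);
        if (copy == NULL)
            goto fail;
        strcpy(copy, ent->d_name);
        names[count++] = copy;
    }
    closedir(dir);

    if (count > 0)
        qsort(names, count, sizeof *names, compare_names);
    *out = names;
    return (long)count;

fail:
    closedir(dir);
    free_names(names, count);
    return -1;
}
```

Because the only comparator involved is the one handed to qsort(), the disagreement over scandir()'s fourth parameter never comes up.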

Of course, if you're paid to meet a deadline, you'll reach for gnulib, if
the license permits, or you'll write an autoconf test.

Cheers,
lacos

Keith Thompson · Apr 16, 2010

Ersek said:
For example, how do you check if time_t is an integer type? The
mathematical value of 1/FLT_RADIX can be represented exactly by all
floating point types, and its value is in (0, 1).

(time_t)(1.0 / FLT_RADIX) == (time_t)0

(Or perhaps scalb(1.0, -1.0) should be used instead of the division.)

[...]

Wouldn't this be simpler and just as reliable?

(time_t)1 / 2 == 0
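
Both expressions can be wrapped in small functions and compared side by side. This is just a sketch; the function names are made up for illustration. Each returns 1 when time_t behaves as an integer type and 0 when it is a floating type.

```c
#include <float.h>
#include <time.h>

/*
 * Ersek's form: 1/FLT_RADIX is exactly representable and lies in (0, 1),
 * so converting it to an integer type truncates to 0, while a floating
 * time_t keeps the nonzero value.
 */
int time_t_is_integer_radix(void)
{
    return (time_t)(1.0 / FLT_RADIX) == (time_t)0;
}

/*
 * Thompson's form: 1 / 2 is 0 under integer division and 0.5 under
 * floating division (the int operand 2 is converted to time_t first).
 */
int time_t_is_integer_div(void)
{
    return (time_t)1 / 2 == 0;
}
```

As noted earlier in the thread, compilers tend to flag such tests as "condition always true" on any given platform, which is expected.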

Ersek, Laszlo · Apr 16, 2010

Keith Thompson said:
Ersek said:

For example, how do you check if time_t is an integer type? The
mathematical value of 1/FLT_RADIX can be represented exactly by all
floating point types, and its value is in (0, 1).

(time_t)(1.0 / FLT_RADIX) == (time_t)0

(Or perhaps scalb(1.0, -1.0) should be used instead of the division.)

[...]

Wouldn't this be simpler and just as reliable?

(time_t)1 / 2 == 0

It is simpler to write, and probably just as reliable (even with an
FLT_RADIX greater than two), but it makes me think longer about implicit
conversions. We could even test (for the opposite) with

if ((time_t)0.5) { ... }

but it seems to involve more reasoning for me.

if (!(time_t)0.5) { ... }

looks plain crazy.

Thanks,
lacos

Michael Tsang · Apr 19, 2010

Richard said:
A couple of Borland examples spring to mind. Firstly, I had no end of
trouble getting Borland to compile anything that uses errno (see below).
Secondly, the math library is a bit screwed.

I deal with the first problem by simply not using errno if there is even
the slightest possibility that the code will be compiled under Borland
at some point (and, for most of the code I write, that's a distinct
possibility, as a result of which I have basically stuffed errno onto
the same shelf as goto and gets).

The second problem is rather more interesting, in that it can be fixed
easily as follows:

double pointless_double = 1.0;

or, if I feel so inclined:

double Borland_really_should_fix_this_stupid_bug = 3.14159;

<snip>

Is the compiler outdated!?

Michael Foukarakis · Apr 19, 2010

Obviously, in general, you ought to write clean, portable code.

Something that's bitten me a few times recently is cases in which
implementations were buggy -- rarely, to be fair, in the core C language,
but "standard" system extensions like POSIX conformance.

Does this happen to other people? (The recent example of MSVC++ having a
buggy preprocessor is presumably one example.) What do you do about it?
Do you use #ifdefs? Do you include both the correct code and the code
which works on a particular target? If you have to support multiple
targets, only some of which are broken in a given way, do you try to
handle determination by testing things in your code, or outside the code
in a build system?

(Oh the EINTR vs ERESTART nightmares...they're back!)

I'm working with the Linux kernel all the time. It gets ugly quite
fast; if you see my code you'll find it littered with preprocessor
checks about the kernel version. There's no mechanism to get around
that, and the kernel (not the developers...no, where did you get
that?) won't export a concise API for device drivers, ever.
If I want to support multiple targets, I usually end up creating an
svn branch and porting the code over, but that's not really
undesirable, especially for device drivers as opposed to
applications.

On Windows, it's a totally different nightmare. Windows doesn't conform
to POSIX; I assumed it did, and had to find out otherwise in a lot of
different little ways over the years.

There's a function, which takes function pointers as arguments. (For the
UNIX weenies: scandir().) One of the function pointers is declared
differently on different machines I have access to. So in essence, I
have one system which declares:

extern int foo(int (*compare)(struct foomagic **a, struct foomagic **b));

and another which declares

extern int foo(int (*compare)(void *a, void *b));

Imagine that you needed to interact with this function, across these two
systems. How would you do it? Assume for the sake of argument that you
can't compel the vendor to fix the broken implementation, and that "we
don't support that" is not one of your options.

Look, all wrapper modules/functions are ugly. It's in their nature.
The solution depends on a lot of factors, too - what I would do with
the C preprocessor someone else would've done with automake/autoconf
(which is a better choice, imo) because they're more familiar with
them. My sister would've reimplemented scandir() from scratch, but
she's a student, she's got loads of spare time.

Whatever the choice though, I'd make sure I'd document it REALLY well.

Seebs · Apr 19, 2010

Look, all wrapper modules/functions are ugly. It's in their nature.
The solution depends on a lot of factors, too - what I would do with
the C preprocessor someone else would've done with automake/autoconf
(which is a better choice, imo)

I beg to differ.

Seriously, autoconf is not the right solution for most problems; it used to
be sort of plausible, but over time it's become a nightmare. You cannot make
me believe that a program which checks for and tells you about sizeof(char)
is a good choice for working on C portability.

More generally, it's got IMMENSE amounts of time, effort, and code, devoted
to making sure that it will still work on Ultrix and SunOS 4 and other systems
which are largely irrelevant to me; a framework that assumed a basically
POSIX-like system and looked only for quirks might be useful to me, autoconf
isn't.

because they're more familiar with
them. My sister would've reimplemented scandir() from scratch, but
she's a student, she's got loads of spare time. Whatever the
choice though, I'd make sure I'd document it REALLY well.

I've actually had to reimplement a few things from scratch, but haven't
liked it.

-s

Mark · Apr 19, 2010

Michael Foukarakis said:
I'm working with the Linux kernel all the time. It gets ugly quite
fast; if you see my code you'll find it littered with preprocessor
checks about the kernel version.

The kernel is written in something which Torvalds has made clear is not
close to "standard C" and only related to "C". Once you get into OS
development, there is no high-level language which can offer the
guarantees required. One simple example: getting guarantees about
atomicity across multiple cores is *tough*, particularly if you want to
do it without crippling performance.

This varies hugely between processors of the same family let alone
across architectures. The Linux kernel has a lot of code which must
work correctly on different architectures, and the only solution
(sometimes) is to vary the solution.

Even for the portions of code which *look* like standard C, the target
is "gcc" not "C"*. Yes, you can normally build it with the Intel C
compiler suite, but that's not guaranteed.

* And even the version or versions targeted are specified.

In short, the Linux kernel isn't (and has never been intended to be) a
conformant C executable.

Hosted or freestanding. ;-)

There's no mechanism to get around
that, and the kernel (not the developers...no, where did you get
that?) won't export a concise API for device drivers, ever.

Not quite true. They are not keen on developing interfaces which can be
used to develop kernel code outside of the kernel - that would mean
freezing development of the API for fear of breaking code they have no
knowledge of. It could also create problems with destabilising the
kernel itself (as things like NDISwrapper have done).

Work to provide this sort of functionality without these downsides
(e.g. FUSE - Filesystem in Userspace) hasn't been stopped.

Michael Foukarakis · Apr 20, 2010

The kernel is written in something which Torvalds has made clear is not
close to "standard C" and only related to "C". Once you get into OS
development, there is no high-level language which can offer the
guarantees required. One simple example: getting guarantees about
atomicity across multiple cores is *tough*, particularly if you want to
do it without crippling performance.

This varies hugely between processors of the same family let alone
across architectures. The Linux kernel has a lot of code which must
work correctly on different architectures, and the only solution
(sometimes) is to vary the solution.

Even for the portions of code which *look* like standard C, the target
is "gcc" not "C"*. Yes, you can normally build it with the Intel C
compiler suite, but that's not guaranteed.

Nonono, I'm not talking about the kernel here - of course it has to be
architecture dependent, etc. I'm talking about things like LKMs -
sure, they don't fall in the standard C domain either, but that's not
really relevant, it'd be the same if it was written in any other
language, because there's deeper things wrong with it.

* And even the version or versions targeted are specified.

In short, the Linux kernel isn't (and has never been intended to be) a
conformant C executable.

Thank Gawd for that. ;-)

Hosted or freestanding. ;-)

Not quite true. They are not keen on developing interfaces which can be
used to develop kernel code outside of the kernel - that would mean
freezing development of the API for fear of breaking code they have no
knowledge of. It could also create problems with destabilising the
kernel itself (as things like NDISwrapper have done).

Please. EXPORT(fname) is already an interface. Instead of providing
1000 different functions that change with every major version (usually
because the Linux kernel is like a sieve, full of security holes),
they should develop at least a clear way of obtaining available
functions; that way we'd avoid the infinity of regressions that
appear each week.

People are not going to stop writing kernel code outside of the kernel
until the kernel stops exporting functions to the userland and LKMs -
and that ain't going to happen. Of course, "standardizing" an API is
difficult, I'm not saying otherwise. It's better to formalize one than
not, though.

Michael Foukarakis · Apr 20, 2010

I beg to differ.

Wow, you're right. I mistyped that, I think the preprocessor is much
better than auto[make|conf], which are the work of something worse
than all variations of hell.
