non-standard functions in libc -- bad design?

blmblm

I've been involved, in another context, in a long and contentious
discussion about whether functions that are part of the POSIX
standard but not the C standard -- getpid() in particular --
should be regarded as "third-party". One of my arguments against
so regarding them is that gcc packages them in libc along with
C-standard functions such as printf(). The other side responded
by saying that this just shows how "broken" UNIX is -- that libc
should include only functions that are part of the C standard.
Both sides in this debate, however, seem to have their biases :),
so I thought I'd try to get a wider range of opinions .... :

(*) Is "third-party" an accurate term to apply to functions that
are not part of the C standard (but might be part of some other
standard supported by a compiler/library, such as POSIX)?

(*) Is there a compelling reason to avoid including anything
that's not part of the C standard library in libc (and instead
put it in a separate library)? If there is, does anyone know
why gcc doesn't do it that way?

I *think* this is mostly on-topic, despite the references to POSIX
and gcc, since the point is mostly to explore the definition of
"third-party" and the packaging of non-standard library functions.
 
Gene

I've been involved, in another context, in a long and contentious
discussion about whether functions that are part of the POSIX
standard but not the C standard -- getpid() in particular --
should be regarded as "third-party".  One of my arguments against
so regarding them is that gcc packages them in libc along with
C-standard functions such as printf().  The other side responded
by saying that this just shows how "broken" UNIX is -- that libc
should include only functions that are part of the C standard.
Both sides in this debate, however, seem to have their biases :),
so I thought I'd try to get a wider range of opinions .... :

(*) Is "third-party" an accurate term to apply to functions that
are not part of the C standard (but might be part of some other
standard supported by a compiler/library, such as POSIX)?

(*) Is there a compelling reason to avoid including anything
that's not part of the C standard library in libc (and instead
put it in a separate library)?  If there is, does anyone know
why gcc doesn't do it that way?

I *think* this is mostly on-topic, despite the references to POSIX
and gcc, since the point is mostly to explore the definition of
"third-party" and the packaging of non-standard library functions.

Seems like a meaningless distinction to quibble over. The only issues
that make a difference are platforms and licenses. If the current and
future platforms support POSIX, and the license costs are manageable, then
use POSIX. Else don't. POSIX has stuff you can't get in clib. That's
one of the reasons it was created.

The clib/POSIX split makes a fair amount of sense if you consider that
C runs on many systems far, far away from Unix, such as embedded ones.

Worrying about whether POSIX functions are in the same library module
as clib is truly counting the angels dancing on a pinhead. People
need to get a life.
 
Peter Nilsson

I've been involved, in another context, in a long and
contentious discussion about whether functions that are
part of the POSIX standard but not the C standard --
getpid() in particular -- should be regarded as "third-
party".  One of my arguments against so regarding them
is that gcc packages them in libc along with C-standard
functions such as printf().  The other side responded
by saying that this just shows how "broken" UNIX is --
that libc should include only functions that are part
of the C standard. Both sides in this debate, however,
seem to have their biases :), so I thought I'd try to
get a wider range of opinions .... :

(*) Is "third-party" an accurate term to apply to
functions that are not part of the C standard (but might
be part of some other standard supported by a compiler/
library, such as POSIX)?

Non-standard is how they're referred to here. How about
"non C standard"? Note that it's not only functions. There
are typedefs and macros as well.

Perhaps a better view is non-implementation namespace
identifiers!
(*) Is there a compelling reason to avoid including
anything that's not part of the C standard library in
libc (and instead put it in a separate library)?  If
there is, does anyone know why gcc doesn't do it that
way?

That would depend on how gcc links. If a user supplied
function takes precedence over libc, then all is well
and there's no reason not to package the function inside
libc.

What is more critical is whether the headers declare
non standard functions when gcc is invoked in conforming
mode.
I *think* this is mostly on-topic, despite the
references to POSIX and gcc, since the point is mostly
to explore the definition of "third-party" and the
packaging of non-standard library functions.

The standard doesn't say how linking is to be implemented,
only that it must be done. How individual implementations
choose to do so is largely going to be off-topic beyond
answering whether the implementation is conforming or not.
 
Seebs

(*) Is "third-party" an accurate term to apply to functions that
are not part of the C standard (but might be part of some other
standard supported by a compiler/library, such as POSIX)?

I wouldn't call them that, I'd call them extensions.
(*) Is there a compelling reason to avoid including anything
that's not part of the C standard library in libc (and instead
put it in a separate library)? If there is, does anyone know
why gcc doesn't do it that way?

gcc doesn't get a vote. You might be thinking of glibc, uclibc, Berkeley
libc, klibc... gcc is just a compiler, not an implementation.
I *think* this is mostly on-topic, despite the references to POSIX
and gcc, since the point is mostly to explore the definition of
"third-party" and the packaging of non-standard library functions.

A conforming implementation ought to make sure that they don't trample user
namespace.

Example from my own experience: I wrote a program once which worked fine if
and only if I did not break it into two modules. I had a global variable
named "end". Some nutjob had chosen the name "end" as a magic symbol marking
the end of the address space... As long as the variable was used only in
one file, I got the intended behavior.

Solution: The linker has a flag for making something a "weak reference",
meaning that it's linked only if nothing else with that name shows up. The
extensions are carefully made into weak references. In general, this means
that you get the expected behavior for portable code, but if you want to
request special local extensions, you can and then you have them.

I do not think there is any particular advantage to not including local
extensions as part of "the C library", as long as it's possible to build
programs with those extensions disabled in some way. (This is not entirely
consistently true in Linux these days, but it is certainly becoming more
true over time in most of the Unix-like systems I use.)

-s
 
Kenny McCormack

Gordon Burditt said:
Do you know of a court case where you need the legal definition of
"third-party" in this context? It seems like trying to nail down
what the definition of "is" is when you don't have a President
trying to weasel out of something.

This is CLC. Nitpicking over definitions and trivia and arguing about
how many angels can dance on a pin is what we do.

You've been around long enough to know that, haven't you?

Seriously. If it weren't for this angels-on-pins stuff, you could hear
crickets in here... Well, that and all the character assassination and
bullying.
 
Kenny McCormack

On Tue, 29 Sep 2009 21:27:49 -0700 (PDT), Peter Nilsson

[snip]
Non-standard is how they're referred to here. How about
"non C standard"? Note that it's not only functions. There
are typedefs and macros as well.

I like "non C standard" - saying something is non standard when
one really means non C standard sounds as though the speaker
thinks the only "real" standard is one of the C standards.

And we don't know any people like that now, do we?
 
blmblm

I wouldn't call them that, I'd call them extensions.

That would be my preferred term too. Just sayin', maybe.
gcc doesn't get a vote. You might be thinking of glibc, uclibc, Berkeley
libc, klibc... gcc is just a compiler, not an implementation.

Ah. Thanks for pointing out this distinction -- obvious once you've
done so, but not something I had thought about.

(So, maybe I should try this question in comp.unix.programmer .... )
A conforming implementation ought to make sure that they don't trample user
namespace.

For the record, if I write a C program that contains functions
with the same names as some POSIX-extension functions (I chose
getpid() and write()), gcc seems happy to compile and link it
without warnings or errors, and the resulting executable uses my
functions rather than the POSIX ones. That's what you mean here,
right? And those POSIX-extension functions -- the declarations
are in unistd.h rather than one of the C-standard headers, which
I'm thinking is also a sign of not-broken-ness.
Example from my own experience: I wrote a program once which worked fine if
and only if I did not break it into two modules. I had a global variable
named "end". Some nutjob had chosen the name "end" as a magic symbol marking
the end of the address space... As long as the variable was used only in
one file, I got the intended behavior.

Good story ....
 
blmblm

If it comes with the OS, and it's used by the C implementation, I
don't think "third-party" is the right term to use for it.

The C standard doesn't use that term, and I don't think it
appears in licenses referring to functions in a library.
Does it matter?

Not really; it's a terminology quibble, but sometimes those are
hard to resist. Sort of a :).
If you take out all the functions not defined by the C standard,
the functions defined by the C standard on many systems won't work.
(I think Windows would have much the same problem, just with
differently-named functions.) C implementations often use POSIX
functions, on an OS that supports it, to do I/O. That doesn't mean
they have to invade the user namespace, so defining your own function
write() shouldn't mess up the library and your calls to your write()
should end up calling your write().

Which seems to happen, based on a quick experiment.

But, um, the "C library" appears to be packaged in at least two
distinct parts already, one with the math functions and one with
the rest. So why not further separate out the not-C-standard
extensions?
I don't think that's an accurate term to apply to OS functions
that are needed to make the C implementation work.


The compelling reason to NOT do that is that it would break a lot
of stuff, including C.

So, replacing libc.{so,a} on an existing system with one that
doesn't include the POSIX extensions would break a lot of things.
But that doesn't really address the question of whether putting
them together in the first place was a good design, right?

I'm thinking now that comp.unix.programmer may be the right place
to ask about that.
gcc doesn't supply C libraries. (It may supply some compiler support
routines like doing a 64-bit by 64-bit multiplication on machines
whose hardware doesn't do that natively.) OS vendors supply libraries.
gcc isn't a complete implementation.

Yes, quite -- obvious once it's pointed out, but not something I had
thought of ....
Do you know of a court case where you need the legal definition of
"third-party" in this context? It seems like trying to nail down
what the definition of "is" is when you don't have a President
trying to weasel out of something.

Eh. It's nitpicking and quibbling about terminology and like that,
but really, isn't that a big part of what people *DO* here? Sort
of a :).
 
Keith Thompson

For the record, if I write a C program that contains functions
with the same names as some POSIX-extension functions (I chose
getpid() and write()), gcc seems happy to compile and link it
without warnings or errors, and the resulting executable uses my
functions rather than the POSIX ones. That's what you mean here,
right? And those POSIX-extension functions -- the declarations
are in unistd.h rather than one of the C-standard headers, which
I'm thinking is also a sign of not-broken-ness.
[...]

Some of them are in unistd.h, but some functions defined by POSIX but
not by C are required (by POSIX) to be declared in <stdio.h>. popen()
and pclose() are two examples.

POSIX defines macros that you have to define before #include <stdio.h>
to make these functions visible. Consult comp.unix.programmer for
details.
 
Keith Thompson

But, um, the "C library" appears to be packaged in at least two
distinct parts already, one with the math functions and one with
the rest. So why not further separate out the not-C-standard
extensions?
[...]

For one thing, in many cases the not-C-standard extensions might be
used to implement the C standard functions.

For example, if the implementation of fopen() calls open(), there's
not much advantage in putting open() in a separate library (except
maybe for programs that use open() but not fopen()).
 
Richard Tobin

The other side responded
by saying that this just shows how "broken" UNIX is -- that libc
should include only functions that are part of the C standard.

You would really have to be amazingly ignorant of the history of
C to suggest that Unix is broken because its libraries aren't
organised according to the C standard.

-- Richard
 
blmblm

For one thing, it would break the compiler, which generally needs
to do I/O to function.

Well, only if the compiler doesn't look in more than one place
for library functions, right? Is there some reason it couldn't
be modified to do that? (And "the compiler" is maybe a misnomer
anyway, since there could be more than one, no?)
If it doesn't work, it seems like a pretty bad design decision. It
should not be necessary for someone who wishes to compile a C program
to have to know which non-standard libraries have to be included
in order to make it function. These will almost certainly vary
from one OS to another, making it rather difficult to come up with
a build script that works on multiple platforms that C is supposed
to be portable to.

But doesn't this -- where to find library functions -- already vary
from platform to platform, with the differences hidden from users?
I mean, Windows doesn't have a file called /lib/libc.so.something,
does it? The link phase of a compiler already has some predefined
list of where to find library functions, no? Why couldn't this
list have more than one entry (libc and something else)?
C compiler vendors may not really have much choice in the matter.
It's the OS vendor that decides on what gets included in their OS,
and even if a C compiler is not included, the C library is (on OSs
written primarily in C, and especially those using shared libraries)
needed to get much farther than booting.

Yes .... So splitting libc into more than one piece would require
changes there too.

I'm not saying it would be trivial to do, just that it seems to me
to be possible.
Usually the nitpicking is about *C* terminology, although there
is occasionally nitpicking about non-C terms like "Heathfield",
"Bullschildt", and "troll".

Well, I thought at least part of my question *was* about C
terminology, sort of. If I was wrong about that, then I apologize
to the group! but this has been very interesting and educational,
so my thanks to those who've responded.
 
Nobody

But, um, the "C library" appears to be packaged in at least two
distinct parts already, one with the math functions and one with
the rest. So why not further separate out the not-C-standard
extensions?
[...]

For one thing, in many cases the not-C-standard extensions might be
used to implement the C standard functions.

For example, if the implementation of fopen() calls open(), there's
not much advantage in putting open() in a separate library (except
maybe for programs that use open() but not fopen()).

Also, while the above case wouldn't make much sense, it would at least be
straightforward to implement. It gets harder when the POSIX functions are
more tightly integrated. E.g. POSIX defines popen(), which returns a FILE*
which can be used anywhere that a FILE* returned from fopen() could be
used.
 
Nick Keighley

I don't think "third-party" is a very useful/helpful term.

Not really; it's a terminology quibble, but sometimes those are
hard to resist.  Sort of a :).

but the litmus test for correct vocabulary is "is it useful?".
I don't think "third-party" is useful enough in this context.



I'm pretty sure *any* OS has this "problem". In the end stdio
has to write things to devices. The Standard doesn't explain
how this occurs. Then there's, off the top of my head, malloc,
signals etc. Most of the Standard Library can't be implemented
without underlying support from the implementation.

But, um, the "C library" appears to be packaged in at least two
distinct parts already, one with the math functions and one with
the rest.  So why not further separate out the not-C-standard
extensions?  

You are drifting a bit here. You started talking about header files
now you are talking about library files. P.J.Plauger wrote an
excellent book about the implementation of a Standard C library (C89
only).
As I remember his standard headers were pretty clean in that they
only had stuff from the standard in them; except he had some magic
implementation defined header files which he included. But to
make the library actually work you had to link with something
implementation specific.

How will dividing the library up help the developer? I confess
I wish POSIX were cleaner but in practice it doesn't seem to make much
odds.
Wishing things were different probably isn't going to make much
difference.

So, replacing libc.{so,a} on an existing system with one that
doesn't include the POSIX extensions would break a lot of things.
But that doesn't really address the question of whether putting
them together in the first place was a good design, right?

I'm thinking now that comp.unix.programmer may be the right place
to ask about that.

You'll get even less sympathy there! "Why would we want to break
our Posix compliant code to make clc happy?".

Yes, quite -- obvious once it's pointed out, but not something I had
thought of ....

I just think "third-party" is a bad term. I'd use "Non-standard"
or "Not C Standard".
Eh.  It's nitpicking and quibbling about terminology and like that,
but really, isn't that a big part of what people *DO* here?  Sort
of a :).


A lot of the c.l.c. verbiage seems to be devoted to the numerical
density of cavorting nubile seraphim upon pinheads.
CBFalconer
 
Nobody

I just think "third-party" is a bad term. I'd use "Non-standard"
or "Not C Standard".

"Non-standard" isn't any more helpful when more than one standard may be
involved. POSIX, XPG/* et al are all standards, not to mention /de facto/
standards (which are frequently more important than the actual /de jure/
standards).
 
Nick Keighley

"Non-standard" isn't any more helpful when more than one standard may be
involved. POSIX, XPG/* et al are all standards, not to mention /de facto/
standards (which are frequently more important than the actual /de jure/
standards).

if I'm talking about C then I'm talking about the C standard
 
Stephen Sprunk

[ My apologies for the flood of delayed responses; my old news server
was silently eating all my posts for the last week or so.]

Gordon Burditt wrote:

Please do not remove attribution lines when you're quoting other people.
If you take out all the functions not defined by the C standard,
the functions defined by the C standard on many systems won't work.
(I think Windows would have much the same problem, just with
differently-named functions.) C implementations often use POSIX
functions, on an OS that supports it, to do I/O. That doesn't mean
they have to invade the user namespace, so defining your own function
write() shouldn't mess up the library and your calls to your write()
should end up calling your write().

On a system that supports "weak" symbols, this simply isn't an issue.
Since libc is linked together independently, functions such as fwrite()
that call write() would always get the POSIX write() since that's the
only write() that exists at the time. When you linked your program to
libc, any calls you make to write() would only use the weak symbol for
the POSIX write() if you did not provide your own write(), which would
have a strong symbol. (i.e. strong symbols are searched first when
linking, then weak symbols.)

On a system that doesn't support weak symbols, the implementor could put
the "extra" functions into the implementation namespace, e.g.
__posix_write(), and provide a POSIX header with macros that changed
calls to write() into calls to __posix_write(). libc would use this
header internally so fwrite()'s call to write() worked, as would any
other library or user program that needed the POSIX write(). Without
that header, though, calls to write() would NOT get the POSIX write().

(The only limitation on this tactic is that it can only be done by the
implementor, but that's what we're discussing at the moment.)
The compelling reason to NOT do that is that it would break a lot
of stuff, including C.

It wouldn't break anything, as there are examples of implementations
which do _not_ include all that cruft in libc.

The POSIX world is full of disagreements over which libraries should
contain which "standard" functions, which can make porting a pain if you
don't use tools like GNU's autoconf. Most of these disagreements
predate POSIX itself, otherwise logically they'd all be put in a single
libposix...

S
 
blmblm

I don't think "third-party" is a very useful/helpful term.



but the litmus test for correct vocabulary is "is it useful?".
I don't think "third-party" is useful enough in this context.

Could be. I'm inclined to think that the other party to the
original dispute used "third party" mostly as a way to disparage --
something, I'm not sure what -- but I agree that arguing about
the definition doesn't really advance a discussion of relevant
technical issues.

[ snip ]
You are drifting a bit here. You started talking about header files
now you are talking about library files.

Well, one of the questions in my initial post was about what
should be included in a particular library file (see below).
P.J.Plauger wrote an
excellent book about the implementation of a Standard C library (C89
only).
As I remember his standard headers were pretty clean in that they
only had stuff from the standard in them; except he had some magic
implementation defined header files which he included. But to
make the library actually work you had to link with something
implementation specific.

How will dividing the library up help the developer? I confess
I wish POSIX were cleaner but in practice it doesn't seem to make much
odds.
Wishing things were different probably isn't going to make much
difference.

I'm not sure it's a question so much of what would help the developer
as of what can be defended as "good design" -- or not.

[ snip ]
You'll get even less sympathy there! "Why would we want to break
our Posix compliant code to make clc happy?".

Maybe it's not clear (though maybe it doesn't need to be) .... :

I'm not the one who said that putting C-standard and not-C-standard
functions in the same library file was "broken"; I started
this thread rather hoping someone would help me come up with
convincing arguments to the contrary, though I'm also willing to
hear arguments supporting the other party to the initial dispute.
As best I can tell, the majority view has been "this is how
things are, and to change it would be a lot of disruption for
no particular payoff", which isn't quite the same thing as
"this was a brilliant design, and here's why .... " <shrug>
It's been educational anyway.

[ snip ]
A lot of the c.l.c. verbiage seems to be devoted to the numerical
density of cavorting nubile seraphim upon pinheads.
CBFalconer

I like it. :)
 
James Kuyper

Could be. I'm inclined to think that the other party to the
original dispute used "third party" mostly as a way to disparage --
something, I'm not sure what -- but I agree that arguing about
the definition doesn't really advance a discussion of relevant
technical issues.

I doubt it. I suspect that the issue was about conformance. The C
standard imposes some requirements that an implementation of C must
satisfy to be considered conforming, and other requirements that source
code must satisfy in order to produce defined behavior when compiled by
such implementations.

In principle, a standard like POSIX could have taken either of two
routes: it could have defined its library to meet the requirements
imposed on conforming extensions to the C standard library, or it could
have defined its library to be a third-party library independent of the
C standard library, to be linked to by user code. The most important
distinction between those two options is the naming conventions: to
qualify as a conforming extension to the C standard library, POSIX
identifiers would have had to use exclusively names reserved to the
implementation; as a third-party library it would have been prohibited
from doing so.

POSIX didn't follow either of those approaches; it's both part of and in
addition to the C standard library. It imposes additional requirements
on functions in the C standard library, but also adds additional
functions with names that are not reserved by the C standard to the
implementation. Historically, this is because both standards were
standardizing existing practice that pre-dated either library, and I'm
not sure there were any significantly better options available to the
standardizers - but it makes describing the relationship between the
libraries defined by the two standards somewhat confusing.
 
blmblm

I doubt it. I suspect that the issue was about conformance.

With all due respect, I based my comment above on the overall tenor
of the initial dispute, which has been contentious and sometimes
more about scoring points than actual technical discussion.

Still, it's possible that I misread tone .... And in any case,
the question of conformance is an interesting one!
The C
standard imposes some requirements that an implementation of C must
satisfy to be considered conforming, and other requirements that source
code must satisfy in order to produce defined behavior when compiled by
such implementations.

In principle, a standard like POSIX could have taken either of two
routes: it could have defined its library to meet the requirements
imposed on conforming extensions to the C standard library, or it could
have defined its library to be a third-party library independent of the
C standard library, to be linked to by user code. The most important
distinction between those two options is the naming conventions: to
qualify as a conforming extension to the C standard library, POSIX
identifiers would have had to use exclusively names reserved to the
implementation; as a third-party library it would have been prohibited
from doing so.

POSIX didn't follow either of those approaches; it's both part of and in
addition to the C standard library. It imposes additional requirements
on functions in the C standard library, but also adds additional
functions with names that are not reserved by the C standard to the
implementation. Historically, this is because both standards were
standardizing existing practice that pre-dated either library, and I'm
not sure there were any significantly better options available to the
standardizers - but it makes describing the relationship between the
libraries defined by the two standards somewhat confusing.

I'll say. I've just made an attempt to read, in the C standard
at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf,
what it means to be a conforming implementation .... Have I got
this straight:

A conforming implementation can include functions that are not
part of the standard library, but if it does, the names of these
functions can't be ones that could appear in strictly-conforming
programs. ?

And .... POSIX defines some functions that are not C-standard but
also don't follow this rule?
 
