Which lean C compiler for 32-bit OS development


jacob navia

On 13/12/11 07:32, Robert Wessel wrote:
While the hardware supports it, the standard tools for Windows don't
expose the 80-bit format, except in assembler. So the majority of
people may be running on machines with native 80 bit floats, but
mostly don't get to see anything longer than 64.

Unless they use lcc-win, of course, which provides a true 80-bit
floating point type and prints that data correctly with printf.
 

James Harris

Then there are two possibilities:
1.  You are DRAMATICALLY smarter than everyone who has ever worked on this.
2.  What you are doing is absolutely doomed to irrelevance from the start.

I like the dramatic way you express things. :) It does help me
understand your point of view. Are you thinking that the calling
conventions are an end in themselves?

In fact, the calling conventions we are discussing are designed to
support the tasking model which is somewhat different from the norm.
For example, individual functions should be replaceable while they are
running. Yes, really. I'm not going to go into the details here as the
OS design, so far, has many parts which interrelate and one part in
isolation would likely lead to a long and unsatisfactory discussion.

I will, however, mention something more conventional to illustrate
some of the reasons for the design of a specific calling convention.
As well as the dynamic replacement, the calling convention supports
non-mandatory dynamic linking and lazy linking. I had to find a way to make these
very efficient. And, for various reasons, I needed a way for user-mode
code to make system calls without knowing whether the called routine
was privileged or not. Don't worry about the details. The point is
that the preferred calling convention is specific for very good
reasons.

I know I could have worked with a shell round an existing calling
convention but whatever I considered would not have been even remotely
as efficient. In this case the calling convention does influence the
design because I wouldn't have been able to come up with the design
that I have if I couldn't make the calling efficient.

....
There can be *at least* three distinct classes of pointers.  Real systems
have existed on which pointer-to-function, pointer-to-struct, and
pointer-to-void were three different kinds of objects, and not
interchangeable.  I suspect at least a few are still in use.

What sort of systems? Do you mean, "compilers" or "operating systems"
or something else?

James
 

James Harris

[...]
That's OK. I was saying C was the most suitable language I know
because the source can use the simple concepts necessary to carry out
what I want to do. Other languages generally have programmer-friendly
features that, while convenient, could be translated into something
very inefficient or practically impossible to interface with. A C
optimiser rearranging things doesn't change the essential simplicity
of the correspondence between the C source and the CPU the machine
code will run on.

I wonder if C-- would suit your purposes better.
<http://www.goosee.com/cmm/>.

Thanks for the suggestion. I'll certainly take a look.

James
 

Seebs

I like the dramatic way you express things. :) It does help me
understand your point of view. Are you thinking that the calling
conventions are an end in themselves?

No, rather I was thinking the opposite; unless your purpose is specifically
to research calling conventions, it's highly likely that the penalties
from replacing calling conventions will trump the benefits.
And, for various reasons, I needed a way for user-mode
code to make system calls without knowing whether the called routine
was privileged or not. Don't worry about the details. The point is
that the preferred calling convention is specific for very good
reasons.

I am not comprehending this part. Existing calling conventions are like
that too. You don't have to know anything about the called routine; that's
why it's a convention.

In systems where there's a distinction between "library calls" and
"system calls", the practical reality is that nearly always, the library
contains callable functions (using standard calling conventions)
which wrap the "system calls".
I know I could have worked with a shell round an existing calling
convention but whatever I considered would not have been even remotely
as efficient.

It doesn't seem to be hurting performance in any of the systems I know
currently doing it.
What sort of systems? Do you mean, "compilers" or "operating systems"
or something else?

"Systems" meaning "implementations", which tends to be the trio of compiler,
operating system, and hardware. So for instance, given an x86 system,
you could have different operating systems, and different compilers running
under those operating systems. And you could have the same compiler running
under two of those operating systems. And each of these might give you
different rules in some cases; although we've mostly left the days of having
to care about "near" and "far" pointers, there were certainly cases where
there were distinct pointer types in living memory, and there likely will
be again.

-s
 

Ian Collins

On 12/15/11 10:00 AM, James Harris wrote:

I know I could have worked with a shell round an existing calling
convention but whatever I considered would not have been even remotely
as efficient.

If a wrapper is present (some compilers are smart/OS-aware enough to
inline them), its overhead is trivial compared to the
actual call into kernel space.
In this case the calling convention does influence the
design because I wouldn't have been able to come up with the design
that I have if I couldn't make the calling efficient.

You appear to be hung up on potential efficiency, have you actually made
some real world measurements?
What sort of systems? Do you mean, "compilers" or "operating systems"
or something else?

I've used systems with separate data and programme memory (with
different bus widths), so data pointers were a different beast from
function pointers. Google "Harvard architecture".
 

James Harris

On 12/15/11 10:00 AM, James Harris wrote:



The wrapper, if present (some compilers are smart/OS aware enough to
inline them), the overhead of a wrapper is trivial compared to the
actual call into kernel space.

There's a lot of truth in this but kernel calls on some machines are
much faster than they used to be. Also: 1) especially for machines
that need it, I have come up with a number of ways to reduce the number
of transitions; 2) at least in the bit you quoted I was talking about
calling between functions in general, not just privilege transitions.
You appear to be hung up on potential efficiency, have you actually made
some real world measurements?

Guilty as charged. I am highly focussed on efficiency for the OS both
at the micro and, more importantly, the macro levels. Yes, I have made
a lot of real-world measurements and studied CPU internals.

James
 

James Harris

[...]
There's very little mandated by the CPU that I can think of. What did
you have in mind? A given CPU may have an instruction pointer
register, a stack pointer register and/or dedicated registers for use
in call and return instructions but the definitions of how parameters
are passed, which registers are caller-save and which are callee-save
and how return addresses are stored when a sub-sub-routine is called
etc. are generally just conventions, aren't they?

     You're probably right that the CPU doesn't "mandate" subroutine
linkage conventions.  You're free to ignore that tempting stack, those
automatically-turned register windows, and whatever other doodads the
CPU provides for your convenience.  If you feel like it, you can write
all the arguments to a disk file and have the subroutine read them
back again.

Well, I could, but I am looking for slightly better performance than
that. :^)
     If you plan to write everything yourself, and to make no use of
tools or components developed in the outside world, this is viable.
But if you want to leverage independent work, you must give thought to
doing things in a way that will ease, or at least not impede, the
integration of the parts.

     Real-world example: In a former life I worked for the late, great
Sun Microsystems.  One continuing and exasperating problem was getting
good device drivers for desirable gadgets dreamed up by third parties.
Company X has a super-duper-speed fibre channel adapter, Company Y has
a video card that is all the rage, and so on.  The first thing the
companies do is provide device drivers for a couple flavors of Windows,
then maybe Linux -- but because Solaris had its own device driver
framework, we often wound up with no driver, or paid Companies X,Y,Z
for slapdash inferior drivers ... because we were "better."

Principle understood and agreed with. For the specifics of device
drivers any OS developer is in a no-win situation. It is quite
impossible to develop for the vast number of devices out there. There
are simply far too many of them. At my level all one can do is control
standard devices and provide an environment for other device drivers
to run in. Even that is a daunting task and I've hardly scratched the
surface.
     Arthur Clarke wrote a story called "Superiority."  Read it, if
you can find it somewhere.

I remember it. I'm a big fan of Arthur C. Clarke's stories.

James
 

Seebs

Guilty as charged. I am highly focussed on efficiency for the OS both
at the micro and, more importantly, the macro levels. Yes, I have made
a lot of real-world measurements and studied CPU internals.

I'd be really interested in seeing what kinds of numbers you're seeing
for real-world code. I've seen far too many benchmarks wherein people
establish that they are able to trim 10% off a particular hunk of syscall
operations... without considering that when actual data are involved, that
hunk of syscall operations is maybe 2% of the total time spent processing
the syscall. :)

-s
 

James Harris

....



As with the VAX example I used, SPARC's register windows offer some
pretty strong nudges but don't quite mandate the calling
convention.  It's entirely possible (although not necessarily a good
idea) to program SPARC while ignoring the register windows (IOW, avoid
SAVE/RESTORE and treat the machine as having ~32 conventional
registers), at least up to the point where you need to call/be called
by code using the standard conventions.

Sure. So far I have looked at SPARC, MIPS and ARM but worked out the
most detail on x86, so I can't tell you what I would do on SPARC, but
since its rotating register windows are at least reputed to be very
efficient they may well figure in the OS-internal calling convention
for that machine.

James
 

James Harris

I'd be really interested in seeing what kinds of numbers you're seeing
for real-world code.  I've seen far too many benchmarks wherein people
establish that they are able to trim 10% off a particular hunk of syscall
operations... without considering that when actual data are involved, that
hunk of syscall operations is maybe 2% of the total time spent processing
the syscall.  :)

I understand the point and it's a good one. I don't normally approach
it that way. To say something is x% faster than something else might
be valid for an app but it is irrelevant at the per-instruction level
I have been looking at. My interest is principally in two things:

1. Which options are as fast as expected and are any unexpectedly slow
(and if so, why)?

2. Can I learn to predict performance on a given CPU?

The second is especially relevant. For example, on an AMD CPU that I
have I can clearly see effects showing the CPU completing three single-
cycle-type instructions per clock and can determine which instructions
are that fast.

Once one knows which instructions are single-cycle-type (there must be
a better term) and identifies dependencies, it is not too hard to write
very efficient code sequences.

As an approach it's not perfect but it does demystify what is
otherwise a black art.

There are helpful optimisation guides from AMD and Intel but they tend
to be lists of rules without too much explanation. The best info I
have found to help understand what's going on is Agner Fog's,
especially the microarchitecture manual.

http://www.agner.org/optimize/

He has a manual for a HLL too but it's C++. :-(

James
 

Malcolm McLean

I'd be really interested in seeing what kinds of numbers you're seeing
for real-world code.  I've seen far too many benchmarks wherein people
establish that they are able to trim 10% off a particular hunk of syscall
operations... without considering that when actual data are involved, that
hunk of syscall operations is maybe 2% of the total time spent processing
the syscall.  :)

You write an applications-level program to, say, dynamically update a
list of references in a word-processing document. It updates 2%
faster. Not much has been achieved.

But at the systems level, if you make all programs on the computer 2%
faster, that's worth a great deal. Not by itself, but cumulatively
with other improvements, 2% here and there has a real impact on
performance.
 

Seebs

But at the systems level, if you make all programs on the computer 2%
faster, that's worth a great deal. Not by itself, but cumulatively
with other improvements, 2% here and there has a real impact on
performance.

Maybe. But we weren't talking about 2%. We were talking about 10% of 2%.

And at that point, it's quite possible that the side-effect costs ("we
can't upgrade to a newer gcc yet, no one's ported it to support this")
could easily exceed the nominal benefits.

-s
 

James Harris

....
I'm not saying this discussion doesn't have some interesting posts here
and there, but I don't understand why everyone is trying so passionately
to convince the OP not to try out his ideas about different calling
conventions in his own operating system.

It has been fun responding to the challenges. I think like many of us
I don't mind at all dealing with objections as long as I feel people
are hearing what I say, er, what I write..., er reading what I ...,
well, you know what I mean. Fortunately, the objections and comments,
while negative, have all been on topic and making intelligent points.

The passionate nature of the objections is a bit of a mystery
especially since no-one knows any of the details of what I have
planned. I deliberately haven't explained them - mainly because they
cannot be taken in isolation. However, I have mentioned *why* I have a
need for something new and that it doesn't preclude interfacing with
existing systems.

So why the objections? Some possible reasons:

1. FUD.

2. Misoneism. http://en.wiktionary.org/wiki/misoneism

3. Inertia. (The current systems were alright for my father....)

4. Ignorance. Not knowing the reason for something new (and not
asking).

5. Accustomisation to seeing cranky ideas.

I think the last is most likely. This is Usenet, after all.

James
 
