Reimplementing Linux Kernel in Python

Grant Edwards · Oct 14, 2004

As for C being a natural choice as far as the kernel part is
concerned... it may be because it compiles to efficient code,
or because it facilitates integration with assembler, or
because it can efficiently use registers...

All of those.

Who knows? But the main reason is probably simple: it's
because everyone is just used to it.

Exactly. Any number of "low level", statically typed,
compiled languages would work equally well. Modula-2 (not
sure about M-3, the GC stuff may be a bit iffy in a kernel),
Ada, BLISS, PL/I, Pascal, etc.

Mike Meyer · Oct 14, 2004

I had once given serious thought to what it would take to write an OS only
in a high-level language. (Standard ML was my language of choice at the
time.) The short answer is, no, it can't be done with Python as it currently
stands.

But it can be done. It *has* been done.

First, you would need a Python compiler that can compile Python programs to
machine code. Otherwise, what would you write your interpreter in? Python?
What would you write the interpreter's interpreter in? It can't be Python
all the way down. (Turtles are a different story, though.)

If you've got a Python compiler that generates machine code, you most
certainly *can* write your Python interpreter in Python. That's the
way most LISP systems are built.

Second, in order to communicate with hardware, you would need modifications
to the Python language so that you can read and write bytes to specific
memory locations. You would also need to be able to signal interrupts. Plus,
you would also need to be able to get the exact memory location of functions,
in order to register interrupt handlers.

Actually, you don't need those modifications to Python. What you need
is a module written in something other than Python that lets you do
all those things. All unix systems I'm aware of require code in
something other than C to let them do things that C can't do. Nothing
wrong about this - it's just the nature of the beast.
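
As a rough illustration of that point, here is a minimal sketch using ctypes, which is itself a module written in C. It reads and writes bytes at a numeric address and obtains the address of a callable. The scratch buffer, the `peek`/`poke` helpers and the `on_interrupt` handler are hypothetical names for this example only; an ordinary process can touch only its own memory, so nothing here reaches real hardware.

```python
import ctypes

# A 16-byte scratch buffer stands in for a block of device memory.
# (Assumption for illustration: real device addresses are not
# reachable from an ordinary user process.)
scratch = ctypes.create_string_buffer(16)
base = ctypes.addressof(scratch)    # numeric address of the buffer


def poke(address, value):
    """Write one byte at an absolute address inside this process."""
    ctypes.c_ubyte.from_address(address).value = value


def peek(address):
    """Read one byte from an absolute address inside this process."""
    return ctypes.c_ubyte.from_address(address).value


poke(base + 3, 0x7F)

# The address of a function, as an interrupt table would need it.
HANDLER = ctypes.CFUNCTYPE(None)    # C signature: void handler(void)


def on_interrupt():
    """Hypothetical handler; does nothing."""


c_handler = HANDLER(on_interrupt)   # keep a reference so it stays alive
handler_address = ctypes.cast(c_handler, ctypes.c_void_p).value
```

After this runs, `peek(base + 3)` returns 0x7F and `handler_address` holds an integer that C code could call through. Actually installing it as an interrupt handler would still need privileged code outside Python.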

<mike

Michael Hobbs · Oct 15, 2004

Mike Meyer said:
I had once given serious thought to what it would take to write an OS
only in a high-level language. (Standard ML was my language of choice
at the time.) The short answer is, no, it can't be done with Python as
it currently stands.

But it can be done. It *has* been done.
[snip]
Actually, you don't need those modifications to Python. What you need
is a module written in something other than Python that lets you do
all those things.

Sorry, I should have emphasized the word *only*. I'm talking about writing
an OS *only* with Python. No C, no assembly, just Python. Writing an OS
without assembly would be impossible unless the compiler can somehow map
required processor opcodes to a function or statement in the higher-level
language. Python does not currently have a proper compiler, much less
functions or statements that would, say, put the processor into protected
mode.

-- Michael Hobbs

Cliff Wells · Oct 15, 2004

Yup. I admit that was sub-optimal phrasing. I should have
said something more like "The first version wasn't but the rest
were."

And I thought you were just showing your age

Grant Edwards · Oct 15, 2004

And I thought you were just showing your age

Nah, I'm not quite that old -- my Unix experience only goes back
20 years to the days of v7 on a PDP11.

Bengt Richter · Oct 15, 2004

Mike Meyer said:
I had once given serious thought to what it would take to write an OS
only in a high-level language. (Standard ML was my language of choice
at the time.) The short answer is, no, it can't be done with Python as
it currently stands.

But it can be done. It *has* been done.
[snip]
Actually, you don't need those modifications to Python. What you need
is a module written in something other than Python that lets you do
all those things.

Sorry, I should have emphasized the word *only*. I'm talking about writing
an OS *only* with Python. No C, no assembly, just Python. Writing an OS
without assembly would be impossible unless the compiler can somehow map
required processor opcodes to a function or statement in the higher-level
language. Python does not currently have a proper compiler, much less
functions or statements that would, say, put the processor into protected
mode.

OTOH, it's a bootstrapping process. You can start by putting together
a python string that contains an image of the boot block on a bootable floppy,
and with suitable permissions you can write it to an actual floppy.

Next you can create a python string that is the image of a boot loader, that
the boot block will know where to find and read into memory using BIOS, if you
put it in a certain place on the floppy. And then you can get the loader to load
your OS image, which is a big bunch of bytes. You might want to structure the floppy
info for access as a primitive (non-patent-infringing ;-/) file system.
Maybe you will want to do a bootable CD instead ;-)
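
The boot block step can be sketched in a few lines, assuming a classic PC BIOS, which expects a 512-byte first sector ending in the bytes 0x55 0xAA. The two code bytes are just an endless loop (`jmp $`) standing in for a real loader. The function names and the 2880-sector (1.44 MB) image size are choices made for this example.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"    # bytes 510-511 of a BIOS boot sector

# x86 real-mode code: "jmp $" (EB FE), an endless loop standing in
# for a real boot loader.
code = b"\xeb\xfe"


def make_boot_sector(code):
    """Pad code to 510 bytes and append the boot signature."""
    room = SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(code) > room:
        raise ValueError("boot code does not fit in one sector")
    return code.ljust(room, b"\x00") + BOOT_SIGNATURE


def make_floppy_image(boot_sector, total_sectors=2880):
    """Return a blank image with the boot sector in sector 0."""
    image = bytearray(SECTOR_SIZE * total_sectors)
    image[:SECTOR_SIZE] = boot_sector
    return bytes(image)


def write_image(image, path):
    """Write the image to a file (or, with permissions, a device)."""
    with open(path, "wb") as f:
        f.write(image)


sector = make_boot_sector(code)
image = make_floppy_image(sector)
```

Calling `write_image(image, "floppy.img")` gives a file that an emulator can boot from. Writing to an actual floppy device is the same call with the device path, given suitable permissions.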

Anyway, CPU instructions are just bytes. As a start, it might be interesting to try to build
a dll extension in an mmap array and import it (be prepared for lots of crashes ;-).
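
To make "instructions are just bytes" concrete, here is a small sketch that assembles a two-instruction x86 function by hand. The helper name is made up for this example. The encoding used is `B8 imm32` for `mov eax, imm32` followed by `C3` for `ret`. Actually running the bytes would need executable memory (for instance an mmap with execute permission) plus a way to call into it, which is platform-specific and left out here.

```python
import struct


def assemble_return_constant(value):
    """Machine code for an x86 function that returns a constant.

    B8 imm32   mov eax, imm32
    C3         ret
    """
    return b"\xb8" + struct.pack("<I", value & 0xFFFFFFFF) + b"\xc3"


code = assemble_return_constant(42)
print(code.hex())    # prints b82a000000c3
```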

Obviously mmap etc depend on the OS, but if you had prepared a suitable bootable OS
with a primitive file system and mmap functionality, and a userland python VM, ... well,
it's involved, but you can get there. I'd bet pypy will eventually be at the center of
something like that, with maybe a monolithic boot image in flash, and BIOS to boot it.
At some point, on a suitable platform, it will be able to self-host its own development, I'd bet.

Regards,
Bengt Richter

Caleb Hattingh · Oct 15, 2004

That's a very sarcastic reply.

The question, really, is what the effect would be of writing an OS in a
HLL (particularly on speed of development, bug-fixes, and feature
addition).

I think that is worth asking.

Granted, as a minimum Python would need to compile to native machine code
(or interface with a really detailed native virtual machine) but I expect
(hope!) that is going to get done anyway.

Alex Martelli · Oct 15, 2004

Phil Frost said:
I do agree though that there is little point in rewriting Linux in
Python. The advantage in Python is its expressiveness, but by rewriting
an existing system you gain nothing.

By rewriting any existing system in a higher-level language, you may
gain the benefits of higher-level languages -- more concise and readable
expression, enabling maintenance, modification and enrichment.

This is certainly the advantage we're after with the pypy project, which
is based on rewriting the Python interpreter in Python. Google for
pypy, you'll find lots of materials about it on the net. If we can
shrink the interpreter and built-in modules from about 350K lines of
code (based on a rough find/xargs cat/wc on the 2.4 release) down to
about 30k (and we're on target for that -- pypy isn't anywhere like
ready, but already does a lot), the benefits in maintenance, further
future modification, enabling experimentations, etc, should follow.
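
For anyone wanting to repeat that kind of rough count without find/xargs/wc, a Python sketch along these lines would do. The function name and the default suffix list are assumptions for illustration, not how the 350K figure was actually produced.

```python
import os


def count_lines(root, suffixes=(".c", ".h", ".py")):
    """Total line count of files under root with the given suffixes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                # Binary mode avoids decoding errors in odd files.
                with open(path, "rb") as f:
                    total += sum(1 for _ in f)
    return total
```

Like the shell pipeline, this counts blank lines and comments too, so it is only a crude size measure.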

Re Linux, we just got a fascinating talk at CodeJam Italia, last week,
about the new virtual memory twists that are likely going to be in
kernel 2.6.9. Part of it was this Indian mathematician appearing out of
nowhere with a wonderful new algorithm designed exactly to solve some
subtle problems in the new ("pagetable-less", pardon the
oversimplification!-) architecture for mmap on huge-memory machines.
This is wonderful, BUT, if the sources could shrink by a factor of 10 or
more, making them hugely more accessible, think of how many more such
useful, nay, precious, contributions that might elicit -- and how much
easier it would be for contributors to experiment and play around with a
huge variety of ideas...

The obvious problem might be performance. For that, all of us in the
pypy project (except the few others who can actually follow what's going
on;-) rely on wizard Armin Rigo, of psyco fame -- the idea is that the
pypy framework will "easily" (ha!-) generate machine code (on the fly or
in advance) for the (runtime-restricted) subset of Python which is used
in "interpreter level"... leading in the end to a _faster_ virtual
machine, wrt current pure-C Python, one competitive with CPython+psyco
or even beating it (thanks to advances in "symbolic execution" and type
annotation which are already enabled by using a higher-level language,
rather than C, for the interpreter/VM/execution framework).

pypy still needs to be completed, and proven in the field. But I'm
offering it as an example of the potential benefits of rewriting
existing software in higher-level languages...

Alex

Mike Meyer · Oct 15, 2004

Grant Edwards said:
Exactly. Any number of "low level", statically typed,
compiled languages would work equally well. Modula-2 (not
sure about M-3, the GC stuff may be a bit iffy in a kernel),
Ada, BLISS, PL/I, Pascal, etc.

Haven't operating systems been written in most of those languages? For
a sufficiently loose definition of OS, of course.

<mike

Grant Edwards · Oct 16, 2004

Haven't operating systems been written in most of those languages?

Sure. They weren't Unix, but a kernel's a kernel.

For a sufficiently loose definition of OS, of course.

VMS was mostly in BLISS, I believe. I think MacOS (and maybe
parts of MS-Windows way back when) were written in Pascal. I'm
sure IBM must have written at least one OS in PL/I (whether it
escaped into the wild or not, I don't know). I _think_ Intel
wrote iRMX in PL/M (a PL/I derivative). I can't think of any
OSes written in Modula-[23] off the top of my head.

Paul Rubin · Oct 16, 2004

Mike Meyer said:
Haven't operating systems been written in most of those languages? For
a sufficiently loose definition of OS, of course.

I'm amazed no one has mentioned the Lisp machine.

Mike Meyer · Oct 16, 2004

Grant Edwards said:
Haven't operating systems been written in most of those languages?

VMS was mostly in BLISS, I believe. I think MacOS (and maybe
parts of MS-Windows way back when) were written in Pascal. I'm
sure IBM must have written at least one OS in PL/I (whether it
escaped into the wild or not, I don't know). I _think_ Intel
wrote iRMX in PL/M (a PL/I derivative). I can't think of any
OSes written in Modula-[23] off the top of my head.

DECWRL was heavy into both Modula and new machines/OS's. I'd be
surprised if they didn't write at least one OS in Modula.

<mike

Rod Haper · Oct 16, 2004

Mike said:
Grant Edwards said:

Exactly. Any number of "low level", statically typed,
compiled languages would work equally well. Modula-2 (not
sure about M-3, the GC stuff may be a bit iffy in a kernel),
Ada, BLISS, PL/I, Pascal, etc.

Haven't operating systems been written in most of those languages?

VMS was mostly in BLISS, I believe. I think MacOS (and maybe
parts of MS-Windows way back when) were written in Pascal. I'm
sure IBM must have written at least one OS in PL/I (whether it
escaped into the wild or not, I don't know). I _think_ Intel
wrote iRMX in PL/M (a PL/I derivative). I can't think of any
OSes written in Modula-[23] off the top of my head.

DECWRL was heavy into both Modula and new machines/OS's. I'd be
surprised if they didn't write at least one OS in Modula.

<mike

Hermes was a real-time OS written in Modula-2. The Lilith workstation
used Modula-2 as the system language but the low-level OS was in microcode.

SPIN was a UNIX-like research OS written in Modula-3. There are
probably others.

The Oberon OS is written in Oberon, the Modula successor.

Operating systems written in the Modula family (including Oberon) tend
to merge the OS with the application level, probably due to the influence of
Wirth's ideas. They are more like operating environments.

Dennis Lee Bieber · Oct 16, 2004

Haven't operating systems been written in most of those languages? For
a sufficiently loose definition of OS, of course.

Ada was spec'd for use on embedded systems -- with no OS; so it
is full of constructs for specifying the size of a data field, the
address of the data field, etc. (allowing access to memory mapped I/O
hardware, for example). Even the multi-tasking was supposed to be part
of the language run-time library. Most current compilers (or, at least,
GNAT) short-cut this by using OS level tasking features and I/O (via a
wrapper of the C runtime library).
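
The closest Python analogue I can think of to pinning down the size and placement of each data field is a ctypes structure overlaid on a buffer. This is only a sketch: the register block `UartRegisters` and its fields are hypothetical, a bytearray stands in for the memory-mapped region, and none of it is Ada.

```python
import ctypes


class UartRegisters(ctypes.LittleEndianStructure):
    """Hypothetical device register block with a fixed layout."""
    _fields_ = [
        ("data", ctypes.c_uint8),       # offset 0, 1 byte
        ("status", ctypes.c_uint8),     # offset 1, 1 byte
        ("divisor", ctypes.c_uint16),   # offset 2, 2 bytes
    ]


# A bytearray stands in for the memory-mapped I/O region.
region = bytearray(4)
regs = UartRegisters.from_buffer(region)   # overlay, no copy

regs.status = 0x20
regs.divisor = 0x0180
```

Assignments through `regs` land directly in `region`, so after the two stores the four bytes are 00 20 80 01. With a real device one would overlay the structure on mapped device memory instead of a bytearray.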

BLISS was DEC's systems programming language. As such, it too
offered direct hardware capability -- at least on the same level as C
(shove a hardware address into a long int, then use it as a
pointer) (probably with link options for absolute addressing rather than
virtual).

I've not looked at PL/I; as I recall from the short coverage in
college, PL/I had a reputation of taking features from Algol, FORTRAN,
COBOL, and anything else available... No doubt it too had a means of
direct hardware access.

Not sure if Pascal had the ability; not as originally defined
(the teaching language, not the extended mutants that made it into
industry).

Bengt Richter · Oct 17, 2004

[... maybe misleading lack of mention of benefits of compatibility with existing stuff ;-) ...]

Anyway, CPU instructions are just bytes. As a start, it might be interesting to try to build
a dll extension in an mmap array and import it (be prepared for lots of crashes ;-).

Obviously mmap etc depend on the OS, but if you had prepared a suitable bootable OS
with a primitive file system and mmap functionality, and a userland python VM, ... well,
it's involved, but you can get there. I'd bet pypy will eventually be at the center of
something like that, with maybe a monolithic boot image in flash, and BIOS to boot it.
At some point, on a suitable platform, it will be able to self-host its own development, I'd bet.

An OS in the larger sense is of course more than a kernel and some primitives, and even if it
is feasible to re-implement all kinds of utility functions in a new language, it is not very
practical[1] to replace a lot of stuff except as a slow migration, happening in cases where someone
would rather rebuild than remodel. So then you wind up asking how to make your bootstrapped OS
able to run legacy binaries, or relink legacy object files with your newly-implemented primitive
libraries, and then whose linker you want to have do that, and how to execute *that* etc. So you
will probably either adopt a standard existing ABI or build some adaptation layer for your kernel
that provides it. Anyway, I just wanted to mention that I think there is enormous mass and inertia
in the way legacy tools and applications are represented in various format files, and it takes
a long time for something new to take hold. And a kludge that works to accomplish a money-making-critical
task today pretty much beats any vision for tomorrow, until someone has a new today thing ready
to compete in the tomorrow. Even then, the tendency is for the new to contain a 1401 emulator or such ;-/

[1] Of course, practicality may beat purity, but fun beats practicality ;-)

Regards,
Bengt Richter
