Safe subset of C?

Robert Vazan

I am looking for other people's attempts to create a safe subset of C and
enforce it with scripts. Does anybody know about anything like this?

By "safe", I mean the following:
* Strongly typed memory. No way to reinterpret it as a bunch of bytes
* Recovery from invalid and NULL pointers other than a crash
* Possibility to isolate a piece of code by not giving it key pointers

A library used to support such a safe subset must not introduce its own flaws.
For example, it is not a good idea to use int proxies for pointers like
the Unix API does, because this allows pointer guessing and consequently
prevents isolation.
 
Morris Dovey

Robert said:
I am looking for other people's attempts to create safe subset of C and
enforce it with scripts. Does anybody know about anything like this?

By "safe", I mean the following:
* Strongly typed memory. No way to reinterpret it as bunch of bytes
* Recovery from invalid and NULL pointers other than crash
* Possibility to isolate piece of code by not giving it key pointers

Library used to support such safe subset must not introduce its own flaws.
For example, it is not a good idea to use int proxies for pointers like
Unix API does, because this allows pointer guessing and consequently
prevents isolation.

Robert...

Search in Google groups (comp.lang.c). There have already been a
number of threads discussing this topic.
 
Mark Haigh

Robert said:
I am looking for other people's attempts to create safe subset of C and
enforce it with scripts. Does anybody know about anything like this?

By "safe", I mean the following:
* Strongly typed memory. No way to reinterpret it as bunch of bytes
* Recovery from invalid and NULL pointers other than crash
* Possibility to isolate piece of code by not giving it key pointers

Library used to support such safe subset must not introduce its own flaws.
For example, it is not a good idea to use int proxies for pointers like
Unix API does, because this allows pointer guessing and consequently
prevents isolation.

Look at the MISRA C guidelines, at www.misra.org.uk, which are enforceable
with commercial lint-like tools. You must order a hardcopy. I did and
found it to be an interesting read.

However, if you're really interested in high-integrity coding, perhaps
something like SPARK (Ada subset) may interest you as well.

If you insist on something C-based, MISRA-C with something like VxWorks
for Safety Critical Systems (www.windriver.com) may be a candidate,
depending on what you're looking for.


Mark F. Haigh
(e-mail address removed)
 
Kelsey Bjarnason

I am looking for other people's attempts to create safe subset of C and
enforce it with scripts. Does anybody know about anything like this?

By "safe", I mean the following:
* Strongly typed memory. No way to reinterpret it as bunch of bytes

It occurs to me that this requirement alone pretty much removes your
quest from anything remotely related to C.
 
Simon Biber

Kelsey Bjarnason said:
It occurs to me that this requirement alone pretty much removes
your quest from anything remotely related to C.

It depends how you define "remotely related to C". You would essentially
have to disallow pointer arithmetic and therefore change the way that
arrays work. It starts to look quite like Java, and indeed Java has many
features for limited 'sandbox' operation inbuilt, for running unauthorised
code on client machines. I'd say Java is still related to C.
 
James Hu

It occurs to me that this requirement alone pretty much removes your
quest from anything remotely related to C.

It is a "safe subset". You could define a subset that:

* does not allow the use of unions;
* does not allow declarations of void pointers;
* does not allow defining functions that return void pointers; and
* does not allow casts.

You could write a lint like tool to help enforce the subset. Also, any
diagnostic emitted by the compiler would have to be addressed.

-- James
 
Robert Vazan

* does not allow the use of unions;
* does not allow declarations of void pointers;
* does not allow defining functions that return void pointers; and
* does not allow casts.

Add arrays, ellipsis arguments, and memory deallocation and I can start
considering it safe. Some replacement must be provided for arrays and
memory management. Your rules also prohibit interfaces.
 
James Hu

Add arrays,

Why? I was questioning myself about disallowing unions. Since acquiring
the value of a union member other than the last one stored into invokes
unspecified behavior, a "good enough" lint should be able to flag this.

Similarly for arrays, bounds checking should be enforced.

I guess the safe subset should explicitly state:

* does not allow unspecified or undefined behaviors.

ellipsis arguments,

Yes, I suppose defining your own variable-argument functions should be
disallowed. Using printf and friends should be allowed, though.

and memory deallocation and I can start
considering it safe.

Hmm? If you disallow memory deallocation, you have to disallow memory
allocation as well.

If unspecified and undefined behaviors are not allowed, memory
deallocation should be safe.

I guess one could require an interface for each type to be allocated and
deallocated:

int * malloc_int_array(int number_of_ints);
void free_int_array(int *int_array);

And have the free wrapper do whatever it needed to do to make sure it
was freeing something its corresponding wrapper allocated. But a good
memory error debugging tool can already help enforce that.
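
A minimal sketch of what such a pair of wrappers might look like, assuming a hidden header is placed in front of each array. The header layout, INT_ARRAY_TAG, and the int return value of free_int_array (the prototype above returns void) are assumptions made for illustration, not part of the proposal. The wrappers themselves use a union and casts, so they would have to live outside the restricted subset.

```c
#include <stdlib.h>

/* Hypothetical marker value stored in front of every live array. */
#define INT_ARRAY_TAG 0x494E5453UL

/* Hypothetical hidden header; the long double member only pads the
   header so that the ints placed after it stay suitably aligned. */
union int_array_header {
    unsigned long tag;
    long double align;
};

int *malloc_int_array(int number_of_ints)
{
    union int_array_header *h;

    if (number_of_ints <= 0)
        return NULL;
    /* Refuse sizes whose byte count would not fit in size_t. */
    if ((size_t)number_of_ints > ((size_t)-1 - sizeof *h) / sizeof(int))
        return NULL;
    h = malloc(sizeof *h + (size_t)number_of_ints * sizeof(int));
    if (h == NULL)
        return NULL;
    h->tag = INT_ARRAY_TAG;
    return (int *)(h + 1);
}

/* Returns 1 if the block was released, 0 if the pointer was rejected. */
int free_int_array(int *int_array)
{
    union int_array_header *h;

    if (int_array == NULL)
        return 0;
    h = (union int_array_header *)int_array - 1;
    if (h->tag != INT_ARRAY_TAG)
        return 0;
    h->tag = 0;   /* clear the marker before handing the block back */
    free(h);
    return 1;
}
```

The tag check is best-effort: a pointer that never came from malloc_int_array still leads to undefined behavior when its supposed header is read, which is why a memory debugging tool remains useful.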
Some replacement must be provided for arrays and
memory management.

I think I took care of that.

Your rules also prohibit interfaces.

How so?

-- James
 
Gordon Burditt

It is a "safe subset". You could define a subset that:
* does not allow the use of unions;
* does not allow declarations of void pointers;
* does not allow defining functions that return void pointers; and
* does not allow casts.

You could write a lint like tool to help enforce the subset. Also, any
diagnostic emitted by the compiler would have to be addressed.

A really "safe" subset of C needs to disallow:

- Pointers or pointer-valued expressions, including (library or
otherwise) functions that accept or return them.
- Variables.
- Side effects, particularly including assignment, op= operators,
I/O, and memory allocation/deallocation.
- Casts.

I think it is possible to have the compiler compile this to "safe"
assembly language with one of three opcodes: halt EXIT_SUCCESS,
halt EXIT_FAILURE, or branch-to-self.

Gordon L. Burditt
 
Robert Vazan


Bounds of simple C arrays can be looked up, but it is computationally
costly. It is better to store the item count next to the array, which
implies a custom array type, so no raw C arrays.
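
A rough sketch of such a counted array, assuming a struct that carries the item count alongside the storage. The type and function names (struct int_array, int_array_get, and so on) are hypothetical, and the choice to report failure through a return value instead of aborting is just one option.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counted array: the item count travels with the storage. */
struct int_array {
    size_t count;
    int *items;
};

/* Returns 1 on success, 0 if allocation failed (arr is left empty). */
int int_array_init(struct int_array *arr, size_t count)
{
    arr->count = 0;
    arr->items = NULL;
    if (count == 0)
        return 1;
    arr->items = calloc(count, sizeof *arr->items);
    if (arr->items == NULL)
        return 0;
    arr->count = count;
    return 1;
}

/* Checked read: stores the item in *out and returns 1,
   or returns 0 without touching *out if index is out of range. */
int int_array_get(const struct int_array *arr, size_t index, int *out)
{
    if (index >= arr->count)
        return 0;
    *out = arr->items[index];
    return 1;
}

/* Checked write: returns 1 on success, 0 if index is out of range. */
int int_array_set(struct int_array *arr, size_t index, int value)
{
    if (index >= arr->count)
        return 0;
    arr->items[index] = value;
    return 1;
}

/* Releases the storage and leaves an empty array behind, so later
   checked accesses fail instead of touching freed memory. */
void int_array_destroy(struct int_array *arr)
{
    free(arr->items);
    arr->items = NULL;
    arr->count = 0;
}
```

The bounds check costs one comparison per access because the count is stored, not looked up. Copies of the struct made before int_array_destroy would still hold a stale pointer, so this does not settle the deallocation question on its own.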
I was questioning myself about disallowing unions. Since acquiring
the value of a union member other than the last one stored into invokes
unspecified behavior, a "good enough" lint should be able to flag this.

That's heuristics. It catches such behavior sometimes, but not always.
Heuristic tools increase in complexity without bounds and they never quite
make it.

If unspecified and undefined behaviors are not allowed, memory
deallocation should be safe.

Deallocation invalidates all variables that pointed into the freed area. I
need a working verifier, not just 1000 pages of rules. Undefined behaviors
that appear during memory deallocation cannot be caught without aiding
the verifier with extra syntax.

I guess one could require an interface for each type to be allocated and
deallocated:

int * malloc_int_array(int number_of_ints);
void free_int_array(int *int_array);

And have the free wrapper do whatever it needed to do to make sure it
was freeing something its corresponding wrapper allocated.

Standard malloc and free can do this already.

I should have said virtual functions. Virtual functions need to downcast
the pointer passed to them. C++ will do it invisibly and safely, but C
requires a cast from void pointer to structure pointer.
 
Robert Vazan

A really "safe" subset of C needs to disallow:

Here I assume that you claim that this is the best (least restrictive) safe
subset that can be made.

- Pointers or pointer-valued expressions, including (library or
otherwise) functions that accept or return them.

Too restrictive. You cannot show that it is necessary.

- Variables.

Why?

- Side effects, particularly including assignment, op= operators,

Why?

I/O,

Standard I/O maybe, but why all I/O?

and memory allocation/deallocation.

Unnecessarily restrictive, once again.

I think it is possible to have the compiler compile this to "safe"
assembly language with one of three opcodes: halt EXIT_SUCCESS,
halt EXIT_FAILURE, or branch-to-self.

Sure, but I am uncertain whether your subset is really the only option.
 
Simon Biber

Robert Vazan said:
Sure, but I am uncertain whether your subset is really the only option.

I am reasonably certain that Gordon was joking!

However, it does bear some wisdom -- a completely 'safe subset' is
a pipe dream.

A static compile-time lint checker is quite limited; you can do a lot
more with run-time checking for array bounds, format specifiers,
generally regulating access to memory.
 
James Hu

Bounds of simple C arrays can be looked up, but it is computationally
costly. It is better to store item count next to the array, which
implies custom array type, so no raw C arrays.

You want a safe C subset with built-in runtime protection? Just use
a safer language.

In C, I would say your best option is to use tests to achieve code
coverage and boundary conditions on code that is instrumented
specifically to catch such errors, and this instrumentation should be
compile time removable once verification is complete.

Some of what you want to do can be achieved through static analysis,
but requires extra hints provided in the form of stylized comments
that the preprocessor can understand.

That's heuristics. It catches such behavior sometimes, but not always.
Heuristic tools increase in complexity without bounds and they never
quite make it.

Of course they are complex. But writing provably correct code can also
increase in complexity without bounds (the complexity of writing the
code increases with the complexity of the software specification), and
some would argue they never quite make it either.

Deallocation invalidates all variables that pointed into the freed area. I
need a working verifier, not just 1000 pages of rules. Undefined
behaviors that appear during memory deallocation cannot be caught
without aiding the verifier with extra syntax.

A runtime diagnostic tool, such as Purify, can verify the correctness
of your program with proper test coverage.

Standard malloc and free can do this already.

My suggestion prevents allocating a structure and assigning it to some
other pointer type.

I should have said virtual functions. Virtual functions need to
downcast pointer passed to them. C++ will do it invisibly and safely,
but C requires cast from void pointer to structure pointer.

Downcasting can be performed safely with the proper instrumentation.
The object that is the context of the interface should be opaque,
and the function that creates such objects can set a special
field with a signature value that the other routines can check
against before attempting the downcast.
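
A small sketch of the signature idea, under the assumption that every object in the scheme starts with a common header struct so that the check reads through a well-defined pointer conversion. The names (struct object_header, COUNTER_SIGNATURE, counter_create, and the rest) are invented for illustration.

```c
#include <stdlib.h>

/* Hypothetical marker value identifying a struct counter. */
#define COUNTER_SIGNATURE 0x434E5452UL

/* Assumed convention: every object handed out through a void *
   begins with this header, so the signature can always be read. */
struct object_header {
    unsigned long signature;
};

struct counter {
    struct object_header header;
    int value;
};

/* Creates a counter and stamps it; returns NULL if allocation fails. */
void *counter_create(void)
{
    struct counter *c = malloc(sizeof *c);

    if (c != NULL) {
        c->header.signature = COUNTER_SIGNATURE;
        c->value = 0;
    }
    return c;
}

/* Checked downcast: returns NULL unless the handle carries the
   counter signature. void * converts without a cast. */
static struct counter *counter_from_handle(void *handle)
{
    const struct object_header *h = handle;

    if (h == NULL || h->signature != COUNTER_SIGNATURE)
        return NULL;
    return handle;
}

/* Returns the new value, or -1 if the handle is not a counter. */
int counter_increment(void *handle)
{
    struct counter *c = counter_from_handle(handle);

    if (c == NULL)
        return -1;
    c->value += 1;
    return c->value;
}

/* Clears the signature before freeing so the block no longer
   looks like a counter; rejects handles that are not counters. */
void counter_destroy(void *handle)
{
    struct counter *c = counter_from_handle(handle);

    if (c == NULL)
        return;
    c->header.signature = 0;
    free(c);
}
```

The check only helps for handles that point at some object following the header convention; an arbitrary or already freed pointer is still undefined behavior.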

If my rules are relaxed to remove the union restriction (but still
prohibit unspecified and undefined behavior), as I had suggested
earlier, then the downcasting can be safely achieved via accessing union
members at the cost of explicitly enumerating the types that are safe to
downcast to.

-- James
 
Robert Vazan

I am reasonably certain that Gordon was joking!

I understood it too. Jokes are often used to make claims that nobody can
argue with (it was a joke, so what), but that still make it into the minds of
people. I wanted to show that I don't share his pessimistic view.

However, it does bear some wisdom -- a completely 'safe subset' is
a pipe dream.

What, Java sandbox doesn't work? I must disable it in my browser...
Processes don't work? Poor ISPs granting shell access to customers. I know
that both Java and Unix have security holes, but the concept is good.

A static compile-time lint checker is quite limited; you can do a lot
more with run-time checking for array bounds, format specifiers,
generally regulating access to memory.

Supporting library can do run-time checking instead of language. Verifier
can then enforce use of the library. The art is to design it so that the
result still looks like C.
 
Robert Vazan

You want a safe C subset with built-in runtime protection? Just use
a safer language.

C has certain advantages like tool support, simplicity, and a large share of
smart programmers. The only debugger for Java is in Microsoft's J++, AFAIK.

Some of what you want to do can be achieved through static analysis,
but requires extra hints provided in the form of stylized comments
that the preprocessor can understand.

Stylized comments are acceptable.

But writing provably correct code can also
increase in complexity without bounds (the complexity of writing the
code increases with the complexity of the software specification), and
some would argue they never quite make it either.

Proof for certain aspects can be easy to inline into code and easy to
verify. Complexity grows only if you try to prove everything. Allowing
a small library to protect itself is just about enough for me.
 
Sheldon Simms

Supporting library can do run-time checking instead of language. Verifier
can then enforce use of the library. The art is to design it so that the
result still looks like C.

Why? If you don't like how C works, why not just use a different language?
 
Richard Heathfield

Robert Vazan wrote:

[...] C requires cast from void pointer to structure pointer.

No, it doesn't.

#include <time.h>

void foo(void *p)
{
    struct tm *ptm = p; /* no cast required */
}
 
James Hu

Stylized comments are acceptable.

http://www.google.com/search?q=splint&btnI=I'm+Feeling+Lucky
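
For a flavor of what stylized comments look like, here is a short sketch assuming Splint's /*@null@*/ and /*@only@*/ annotations. The function names are hypothetical, and which warnings are actually reported depends on the Splint version and flags, so treat this as an illustration and check the Splint manual.

```c
#include <stdlib.h>
#include <string.h>

/* To a C compiler the annotations are ordinary comments. To the
   checker, as assumed here, null marks a result that may be NULL
   and only marks storage the caller becomes responsible for. */
/*@null@*/ /*@only@*/ char *copy_string(const char *text)
{
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);

    if (copy == NULL)
        return NULL;
    memcpy(copy, text, n);
    return copy;
}

/* A caller written the way the annotations ask for: the result is
   tested against NULL before use and released exactly once. */
size_t copied_length(const char *text)
{
    char *copy = copy_string(text);
    size_t n;

    if (copy == NULL)
        return 0;
    n = strlen(copy);
    free(copy);
    return n;
}
```

The code compiles unchanged with any C compiler, which is the attraction of putting the hints in comments.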


Proof for certain aspects can be easy to inline into code and
easy to verify.

That is a rather simplistic view, and it is a naive application that
leaves such verification code enabled all the time (e.g., verifying
qsort really sorted the array after each invocation).

Complexity grows up only if you try to prove everything.
Allowing small library to protect itself is just about
enough for me.

Complexity grows whenever the system you are verifying becomes more
complex. Suppose you are just verifying a small library. Whenever
you add a new interface, you have increased the complexity and the
proof burden. This is true both of interfaces you expose to clients
of the library and of interfaces to other sub-systems that
the small library is dependent upon.

Program correctness is getting to be off-topic for this newsgroup.
If you want to pursue the issue further, I would suggest following
up in comp.software-eng.

Anyway, most C programmers will use assert() (or implement their own
variation of it) to verify assumptions.

-- James
 
Sheldon Simms

C has certain advantages like tool support, simplicity, and large share of
smart programmers.

I guess you figure that all those "smart programmers" are incapable of
using any other language. I can't speak for anyone else, but I wouldn't be
interested in working in crippled C. However, I have no problem learning a
new language if that's what the project requires.

The only debugger for Java is in Microsoft's J++, AFAIK.

There are many debuggers for Java, usually integrated in one of the very
many IDEs for Java. There is also a command line debugger that comes with
the standard Java distribution.
 
Simon Biber

Robert Vazan said:
I understood it too. Jokes are often used to make claims that nobody can
argue with (it was a joke, so what), but that still make it into the minds of
people. I wanted to show that I don't share his pessimistic view.


What, Java sandbox doesn't work? I must disable it in my browser...

It has the potential for misuse, such as spamming lots of windows or
unkillable dialog boxes... see even the JavaScript (yes I know it's
not Java, but it's still an example of a sandboxed language):
while(1) alert("Please Click OK");
which on many (older) browsers required a forced kill of the program.

Processes don't work? Poor ISPs granting shell access to customers. I know
that both Java and Unix have security holes, but the concept is good.

Fewer and fewer ISPs do grant shell access in my experience. The costs
associated with system admin and general policing of customers are high.

Supporting library can do run-time checking instead of language. Verifier
can then enforce use of the library. The art is to design it so that the
result still looks like C.

So you need to regulate array access; how? Your supporting library must
hook into every single array access:

int int_item( const int *array, size_t index);
long long_item( const long *array, size_t index);
short short_item( const short *array, size_t index);
double double_item(const double *array, size_t index);
float float_item( const float *array, size_t index);
char char_item( const char *array, size_t index);
etc.

Then you must redefine every single library function so it accesses arrays in
terms of these accessor functions?!
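
To give a sense of the boilerplate involved, here is a sketch that generates such accessors with a macro. It assumes a different signature from the prototypes above: a count parameter is added, since the check needs a bound to compare against, and the item comes back through an out pointer so the return value can report failure. DEFINE_ITEM_ACCESSOR and these signatures are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical generator: defines a checked accessor for one item type.
   The accessor stores array[index] in *out and returns 1, or returns 0
   without touching *out when the request is invalid. */
#define DEFINE_ITEM_ACCESSOR(type, name)                                \
    int name(const type *array, size_t count, size_t index, type *out)  \
    {                                                                   \
        if (array == NULL || out == NULL || index >= count)             \
            return 0;                                                   \
        *out = array[index];                                            \
        return 1;                                                       \
    }

DEFINE_ITEM_ACCESSOR(int, int_item)
DEFINE_ITEM_ACCESSOR(long, long_item)
DEFINE_ITEM_ACCESSOR(short, short_item)
DEFINE_ITEM_ACCESSOR(double, double_item)
DEFINE_ITEM_ACCESSOR(float, float_item)
DEFINE_ITEM_ACCESSOR(char, char_item)
```

The macro shortens the definitions, but the count still has to be supplied correctly by every caller, and the standard library functions that take raw arrays remain untouched.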
 
