Safe subset of C?

Discussion in 'C Programming' started by Robert Vazan, Nov 21, 2003.

  1. Robert Vazan

    Robert Vazan Guest

    I am looking for other people's attempts to create a safe subset of C and
    enforce it with scripts. Does anybody know about anything like this?

    By "safe", I mean the following:
    * Strongly typed memory. No way to reinterpret it as a bunch of bytes
    * Recovery from invalid and NULL pointers by some means other than a crash
    * Possibility to isolate a piece of code by not giving it key pointers

    Any library used to support such a safe subset must not introduce its own
    flaws. For example, it is not a good idea to use int proxies for pointers
    like the Unix API does, because this allows pointer guessing and
    consequently prevents isolation.
     
    Robert Vazan, Nov 21, 2003
    #1

  2. Robert Vazan

    Morris Dovey Guest

    Robert Vazan wrote:

    > I am looking for other people's attempts to create a safe subset of C and
    > enforce it with scripts. Does anybody know about anything like this?
    >
    > By "safe", I mean the following:
    > * Strongly typed memory. No way to reinterpret it as a bunch of bytes
    > * Recovery from invalid and NULL pointers by some means other than a crash
    > * Possibility to isolate a piece of code by not giving it key pointers
    >
    > Any library used to support such a safe subset must not introduce its own
    > flaws. For example, it is not a good idea to use int proxies for pointers
    > like the Unix API does, because this allows pointer guessing and
    > consequently prevents isolation.


    Robert...

    Search in Google groups (comp.lang.c). There have already been a
    number of threads discussing this topic.

    --
    Morris Dovey
    West Des Moines, Iowa USA
    C links at http://www.iedu.com/c
    Read my lips: The apple doesn't fall far from the tree.
     
    Morris Dovey, Nov 21, 2003
    #2

  3. Robert Vazan

    Mark Haigh Guest

    Robert Vazan wrote:
    > I am looking for other people's attempts to create a safe subset of C and
    > enforce it with scripts. Does anybody know about anything like this?
    >
    > By "safe", I mean the following:
    > * Strongly typed memory. No way to reinterpret it as a bunch of bytes
    > * Recovery from invalid and NULL pointers by some means other than a crash
    > * Possibility to isolate a piece of code by not giving it key pointers
    >
    > Any library used to support such a safe subset must not introduce its own
    > flaws. For example, it is not a good idea to use int proxies for pointers
    > like the Unix API does, because this allows pointer guessing and
    > consequently prevents isolation.
    >


    Look at the MISRA C guidelines, at www.misra.org.uk, which are enforceable
    with commercial lint-like tools. You must order a hardcopy. I did and
    found it to be an interesting read.

    However, if you're really interested in high-integrity coding, perhaps
    something like SPARK (Ada subset) may interest you as well.

    If you insist on something C-based, MISRA-C with something like VxWorks
    for Safety Critical Systems (www.windriver.com) may be a candidate,
    depending on what you're looking for.


    Mark F. Haigh
     
    Mark Haigh, Nov 21, 2003
    #3
  4. On Fri, 21 Nov 2003 13:47:49 +0100, Robert Vazan wrote:

    > I am looking for other people's attempts to create a safe subset of C and
    > enforce it with scripts. Does anybody know about anything like this?
    >
    > By "safe", I mean the following:
    > * Strongly typed memory. No way to reinterpret it as a bunch of bytes


    It occurs to me that this requirement alone pretty much removes your
    quest from anything remotely related to C.
     
    Kelsey Bjarnason, Nov 22, 2003
    #4
  5. Robert Vazan

    Simon Biber Guest

    "Kelsey Bjarnason" <> wrote:
    > On Fri, 21 Nov 2003 13:47:49 +0100, Robert Vazan wrote:
    > > By "safe", I mean the following:
    > > * Strongly typed memory. No way to reinterpret it as a bunch of
    > > bytes

    >
    > It occurs to me that this requirement alone pretty much removes
    > your quest from anything remotely related to C.


    It depends how you define "remotely related to C". You would essentially
    have to disallow pointer arithmetic and therefore change the way that
    arrays work. It starts to look quite like Java, and indeed Java has many
    features for limited 'sandbox' operation inbuilt, for running unauthorised
    code on client machines. I'd say Java is still related to C.

    --
    Simon.
     
    Simon Biber, Nov 22, 2003
    #5
  6. Robert Vazan

    James Hu Guest

    On 2003-11-22, Kelsey Bjarnason <> wrote:
    > On Fri, 21 Nov 2003 13:47:49 +0100, Robert Vazan wrote:
    >
    >> I am looking for other people's attempts to create a safe subset of C and
    >> enforce it with scripts. Does anybody know about anything like this?
    >>
    >> By "safe", I mean the following:
    >> * Strongly typed memory. No way to reinterpret it as a bunch of bytes

    >
    > It occurs to me that this requirement alone pretty much removes your
    > quest from anything remotely related to C.


    It is a "safe subset". You could define a subset that:

    * does not allow the use of unions;
    * does not allow declarations of void pointers;
    * does not allow defining functions that return void pointers; and
    * does not allow casts.

    You could write a lint-like tool to help enforce the subset. Also, any
    diagnostic emitted by the compiler would have to be addressed.

    -- James
     
    James Hu, Nov 22, 2003
    #6
  7. Robert Vazan

    Robert Vazan Guest

    On Sat, 22 Nov 2003 00:08:59 -0600, James Hu wrote:

    > * does not allow the use of unions;
    > * does not allow declarations of void pointers;
    > * does not allow defining functions that return void pointers; and
    > * does not allow casts.


    Add arrays, ellipsis arguments, and memory deallocation and I can start
    considering it safe. Some replacement must be provided for arrays and
    memory management. Your rules also prohibit interfaces.
     
    Robert Vazan, Nov 22, 2003
    #7
  8. Robert Vazan

    James Hu Guest

    On 2003-11-22, Robert Vazan <> wrote:
    > On Sat, 22 Nov 2003 00:08:59 -0600, James Hu wrote:
    >
    >> * does not allow the use of unions;
    >> * does not allow declarations of void pointers;
    >> * does not allow defining functions that return void pointers; and
    >> * does not allow casts.

    >
    > Add arrays,


    Why? I was questioning myself about disallowing unions. Since acquiring
    the value of a union member other than the last one stored into invokes
    unspecified behavior, a "good enough" lint should be able to flag this.

    Similarly for arrays, bounds checking should be enforced.

    I guess the safe subset should explicitly state:

    * does not allow unspecified or undefined behaviors.

    > ellipsis arguments,


    Yes, I suppose defining your own variable-argument functions should be
    disallowed. Using printf and friends should be allowed, though.

    > and memory deallocation and I can start
    > considering it safe.


    Hmm? If you disallow memory deallocation, you have to disallow memory
    allocation as well.

    If unspecified and undefined behaviors are not allowed, memory
    deallocation should be safe.

    I guess one could require an interface for each type to be allocated and
    deallocated:

    int * malloc_int_array(int number_of_ints);
    void free_int_array(int *int_array);

    And have the free wrapper do whatever it needs to do to make sure it
    is freeing something its corresponding wrapper allocated. But a good
    memory error debugging tool can already help enforce that.
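
    As a rough illustration of such a wrapper pair, here is a minimal sketch
    in which the wrappers keep a list of live allocations. The list, the node
    type, and the int return value of free_int_array (the prototype above
    returns void) are assumptions made for this example only, not something
    taken from an existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One node per live allocation made through malloc_int_array. */
struct int_array_node {
    int *data;
    struct int_array_node *next;
};

static struct int_array_node *live_int_arrays = NULL;

/* Allocates number_of_ints ints and records the block in the list. */
int *malloc_int_array(int number_of_ints)
{
    struct int_array_node *node;
    size_t n;

    if (number_of_ints <= 0)
        return NULL;
    n = (size_t)number_of_ints;
    if (n > SIZE_MAX / sizeof(int))
        return NULL;                    /* byte count would overflow */
    node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->data = malloc(n * sizeof *node->data);
    if (node->data == NULL) {
        free(node);
        return NULL;
    }
    node->next = live_int_arrays;       /* remember the allocation */
    live_int_arrays = node;
    return node->data;
}

/* Returns 1 if int_array was found in the list and freed, 0 otherwise. */
int free_int_array(int *int_array)
{
    struct int_array_node **link = &live_int_arrays;

    while (*link != NULL) {
        struct int_array_node *node = *link;
        assert(node->data != NULL);     /* the list never stores NULL */
        if (node->data == int_array) {
            *link = node->next;         /* unlink, then release both blocks */
            free(node->data);
            free(node);
            return 1;
        }
        link = &node->next;
    }
    return 0;                           /* not ours: refuse to free */
}
```

    The linear search makes every free cost time proportional to the number
    of live arrays, which is acceptable for a sketch but probably not for
    production use.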

    > Some replacement must be provided for arrays and
    > memory management.


    I think I took care of that.

    > Your rules also prohibit interfaces.


    How so?

    -- James
     
    James Hu, Nov 22, 2003
    #8
  9. >It is a "safe subset". You could define a subset that:
    >
    > * does not allow the use of unions;
    > * does not allow declarations of void pointers;
    > * does not allow defining functions that return void pointers; and
    > * does not allow casts.
    >
    >You could write a lint-like tool to help enforce the subset. Also, any
    >diagnostic emitted by the compiler would have to be addressed.


    A really "safe" subset of C needs to disallow:

    - Pointers or pointer-valued expressions, including (library or
    otherwise) functions that accept or return them.
    - Variables.
    - Side effects, particularly including assignment, op= operators,
    I/O, and memory allocation/deallocation.
    - Casts.

    I think it is possible to have the compiler compile this to "safe"
    assembly language with one of three opcodes: halt EXIT_SUCCESS,
    halt EXIT_FAILURE, or branch-to-self.

    Gordon L. Burditt
     
    Gordon Burditt, Nov 23, 2003
    #9
  10. Robert Vazan

    Robert Vazan Guest

    On Sat, 22 Nov 2003 12:45:39 -0600, James Hu wrote:

    > On 2003-11-22, Robert Vazan <> wrote:
    >> Add arrays,

    >
    > Why?


    Bounds of simple C arrays can be looked up, but it is computationally
    costly. It is better to store the item count next to the array, which
    implies a custom array type, so no raw C arrays.
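
    For illustration, a counted array of this kind might look like the
    minimal sketch below. The type name int_array and the four functions are
    hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counted array: the item count travels with the data. */
typedef struct {
    size_t count;
    int *items;
} int_array;

/* Creates a zero-filled array; count stays 0 if allocation fails. */
int_array int_array_new(size_t count)
{
    int_array a;
    a.count = 0;
    a.items = NULL;
    if (count > 0) {
        a.items = calloc(count, sizeof *a.items);
        if (a.items != NULL)
            a.count = count;
    }
    return a;
}

/* Checked read: returns 1 and stores the item, or 0 on a bad index. */
int int_array_get(const int_array *a, size_t index, int *out)
{
    if (a == NULL || out == NULL || index >= a->count)
        return 0;
    assert(a->items != NULL);   /* count > 0 implies storage exists */
    *out = a->items[index];
    return 1;
}

/* Checked write: returns 1 on success, 0 on a bad index. */
int int_array_set(int_array *a, size_t index, int value)
{
    if (a == NULL || index >= a->count)
        return 0;
    a->items[index] = value;
    return 1;
}

/* Releases the storage and leaves an empty array behind. */
void int_array_free(int_array *a)
{
    if (a != NULL) {
        free(a->items);
        a->items = NULL;
        a->count = 0;
    }
}
```

    Each check is a single comparison against the stored count, so no lookup
    of the array bounds is needed.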

    > I was questioning myself about disallowing unions. Since acquiring
    > the value of a union member other than the last one stored into invokes
    > unspecified behavior, a "good enough" lint should be able to flag this.


    That's heuristics. It catches such behavior sometimes, but not always.
    Heuristic tools increase in complexity without bounds and they never quite
    make it.

    > If unspecified and undefined behaviors are not allowed, memory
    > deallocation should be safe.


    Deallocation invalidates all variables that pointed into the freed area. I
    need a working verifier, not just 1000 pages of rules. Undefined behaviors
    that appear during memory deallocation cannot be caught without aiding
    the verifier with extra syntax.

    > I guess one could require an interface for each type to be allocated and
    > deallocated:
    >
    > int * malloc_int_array(int number_of_ints);
    > void free_int_array(int *int_array);
    >
    > And have the free wrapper do whatever it needs to do to make sure it
    > is freeing something its corresponding wrapper allocated.


    Standard malloc and free can do this already.

    >> Your rules also prohibit interfaces.

    >
    > How so?


    I should have said virtual functions. Virtual functions need to downcast
    the pointer passed to them. C++ will do it invisibly and safely, but C
    requires a cast from void pointer to structure pointer.
     
    Robert Vazan, Nov 23, 2003
    #10
  11. Robert Vazan

    Robert Vazan Guest

    On Sun, 23 Nov 2003 01:21:18 +0000, Gordon Burditt wrote:

    > A really "safe" subset of C needs to disallow:


    Here I assume that you claim that this is the best (least restrictive) safe
    subset that can be made.

    > - Pointers or pointer-valued expressions, including (library or
    > otherwise) functions that accept or return them.


    Too restrictive. You cannot show that it is necessary.

    > - Variables.


    Why?

    > - Side effects, particularly including assignment, op= operators,


    Why?

    > I/O,


    Standard I/O maybe, but why all I/O?

    > and memory allocation/deallocation.


    Unnecessarily restrictive, once again.

    > I think it is possible to have the compiler compile this to "safe"
    > assembly language with one of three opcodes: halt EXIT_SUCCESS,
    > halt EXIT_FAILURE, or branch-to-self.


    Sure, but I am uncertain whether your subset is really the only option.
     
    Robert Vazan, Nov 23, 2003
    #11
  12. Robert Vazan

    Simon Biber Guest

    "Robert Vazan" <> wrote:
    > On Sun, 23 Nov 2003 01:21:18 +0000, Gordon Burditt wrote:
    > > I think it is possible to have the compiler compile this to "safe"
    > > assembly language with one of three opcodes: halt EXIT_SUCCESS,
    > > halt EXIT_FAILURE, or branch-to-self.

    >
    > Sure, but I am uncertain whether your subset is really the only option.


    I am reasonably certain that Gordon was joking!

    However, it does bear some wisdom -- a completely 'safe subset' is
    a pipe dream.

    A static compile-time lint checker is quite limited; you can do a lot
    more with run-time checking for array bounds, format specifiers,
    generally regulating access to memory.

    --
    Simon.
     
    Simon Biber, Nov 23, 2003
    #12
  13. Robert Vazan

    James Hu Guest

    On 2003-11-23, Robert Vazan <> wrote:
    > On Sat, 22 Nov 2003 12:45:39 -0600, James Hu wrote:
    >
    >> On 2003-11-22, Robert Vazan <> wrote:
    >>> Add arrays,

    >>
    >> Why?

    >
    > Bounds of simple C arrays can be looked up, but it is computationally
    > costly. It is better to store the item count next to the array, which
    > implies a custom array type, so no raw C arrays.


    You want a safe C subset with built-in runtime protection? Just use
    a safer language.

    In C, I would say your best option is to use tests to achieve code
    coverage and boundary conditions on code that is instrumented
    specifically to catch such errors, and this instrumentation should be
    compile time removable once verification is complete.

    Some of what you want to do can be achieved through static analysis,
    but it requires extra hints provided in the form of stylized comments
    that the preprocessor can understand.

    >> I was questioning myself about disallowing unions. Since acquiring
    >> the value of a union member other than the last one stored into
    >> invokes unspecified behavior, a "good enough" lint should be able to
    >> flag this.

    >
    > That's heuristics. It catches such behavior sometimes, but not always.
    > Heuristic tools increase in complexity without bounds and they never
    > quite make it.


    Of course they are complex. But writing provably correct code can also
    increase in complexity without bounds (the complexity of writing the
    code increases with the complexity of the software specification), and
    some would argue they never quite make it either.

    >> If unspecified and undefined behaviors are not allowed, memory
    >> deallocation should be safe.

    >
    > Deallocation invalidates all variables that pointed into the freed area. I
    > need a working verifier, not just 1000 pages of rules. Undefined
    > behaviors that appear during memory deallocation cannot be caught
    > without aiding the verifier with extra syntax.


    A runtime diagnostic tool, such as purify, can verify the correctness
    of your program with proper test coverage.

    >> I guess one could require an interface for each type to be allocated and
    >> deallocated:
    >>
    >> int * malloc_int_array(int number_of_ints);
    >> void free_int_array(int *int_array);
    >>
    >> And have the free wrapper do whatever it needs to do to make sure it
    >> is freeing something its corresponding wrapper allocated.

    >
    > Standard malloc and free can do this already.


    My suggestion prevents allocating a structure and assigning it to some
    other pointer type.

    >>> Your rules also prohibit interfaces.

    >>
    >> How so?

    >
    > I should have said virtual functions. Virtual functions need to
    > downcast the pointer passed to them. C++ will do it invisibly and safely,
    > but C requires a cast from void pointer to structure pointer.


    Downcasting can be performed safely with the proper instrumentation.
    The object that is the context of the interface should be opaque,
    and the function that creates such objects can set a special
    field with a signature value that the other routines can check
    against before attempting the downcast.
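
    A minimal sketch of that signature check follows. The struct names, the
    signature constants, and the rule that every object starts with an
    unsigned long signature are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical signature values, one per concrete type. */
#define CIRCLE_SIGNATURE 0x43495243UL
#define SQUARE_SIGNATURE 0x53515552UL

/* Assumption: every object handed to the interface starts with an
   unsigned long signature. In a real library these definitions would
   stay in the .c file so that clients only see an opaque void *. */
struct circle { unsigned long signature; double radius; };
struct square { unsigned long signature; double side; };

/* The only way to create a circle; rejects a negative radius. */
void *circle_create(double radius)
{
    struct circle *c;
    if (radius < 0.0)
        return NULL;
    c = malloc(sizeof *c);
    if (c != NULL) {
        c->signature = CIRCLE_SIGNATURE;
        c->radius = radius;
    }
    return c;
}

/* The only way to create a square. */
void *square_create(double side)
{
    struct square *s = malloc(sizeof *s);
    if (s != NULL) {
        s->signature = SQUARE_SIGNATURE;
        s->side = side;
    }
    return s;
}

/* Checked downcast: yields NULL unless the signature matches. */
static struct circle *as_circle(void *object)
{
    const unsigned long *signature = object;  /* first member of any object */
    if (signature == NULL || *signature != CIRCLE_SIGNATURE)
        return NULL;
    return object;               /* void * converts without a cast */
}

/* Interface function: returns 1 and stores the area, or 0 if the
   object is not a circle. */
int circle_area(void *object, double *area)
{
    struct circle *c = as_circle(object);
    if (c == NULL || area == NULL)
        return 0;
    assert(c->radius >= 0.0);    /* guaranteed by circle_create */
    *area = 3.14159265358979 * c->radius * c->radius;
    return 1;
}
```

    The check only helps for pointers that refer to live objects following
    the signature convention; a stray or dangling pointer can still defeat
    it.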

    If my rules are relaxed to remove the union restriction (but still
    prohibit unspecified and undefined behavior), as I had suggested
    earlier, then the downcasting can be safely achieved via accessing union
    members at the cost of explicitly enumerating the types that are safe to
    downcast to.
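
    The union variant could be sketched roughly as follows. The names
    value_kind, value, and the helper functions are hypothetical; the point
    is only that a tag records which member was stored last, so a member is
    never read unless it was the one written.

```c
#include <assert.h>
#include <stddef.h>

/* The explicit enumeration of types that are safe to downcast to. */
enum value_kind { VALUE_INT, VALUE_DOUBLE };

/* Hypothetical tagged union: kind records the member stored last. */
struct value {
    enum value_kind kind;
    union {
        int as_int;
        double as_double;
    } u;
};

struct value value_from_int(int i)
{
    struct value v;
    v.kind = VALUE_INT;
    v.u.as_int = i;
    return v;
}

struct value value_from_double(double d)
{
    struct value v;
    v.kind = VALUE_DOUBLE;
    v.u.as_double = d;
    return v;
}

/* Checked access: returns 1 and stores the int, or 0 if v holds
   something else. */
int value_get_int(const struct value *v, int *out)
{
    if (v == NULL || out == NULL)
        return 0;
    assert(v->kind == VALUE_INT || v->kind == VALUE_DOUBLE);
    if (v->kind != VALUE_INT)
        return 0;
    *out = v->u.as_int;
    return 1;
}

/* Checked access: returns 1 and stores the double, or 0 if v holds
   something else. */
int value_get_double(const struct value *v, double *out)
{
    if (v == NULL || out == NULL)
        return 0;
    assert(v->kind == VALUE_INT || v->kind == VALUE_DOUBLE);
    if (v->kind != VALUE_DOUBLE)
        return 0;
    *out = v->u.as_double;
    return 1;
}
```

    Supporting a new type means extending both the enum and the union, which
    is the enumeration cost mentioned above.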

    -- James
     
    James Hu, Nov 23, 2003
    #13
  14. Robert Vazan

    Robert Vazan Guest

    On Mon, 24 Nov 2003 01:37:17 +1100, Simon Biber wrote:

    > I am reasonably certain that Gordon was joking!


    I understood it too. Jokes are often used to make claims that nobody can
    argue with (it was a joke, so what), but that still make it into people's
    minds. I wanted to show that I don't share his pessimistic view.

    > However, it does bear some wisdom -- a completely 'safe subset' is
    > a pipe dream.


    What, the Java sandbox doesn't work? I must disable it in my browser...
    Processes don't work? Poor ISPs granting shell access to customers. I know
    that both Java and Unix have security holes, but the concept is good.

    > A static compile-time lint checker is quite limited; you can do a lot
    > more with run-time checking for array bounds, format specifiers,
    > generally regulating access to memory.


    A supporting library can do run-time checking instead of the language. A
    verifier can then enforce use of the library. The art is to design it so
    that the result still looks like C.
     
    Robert Vazan, Nov 23, 2003
    #14
  15. Robert Vazan

    Robert Vazan Guest

    On Sun, 23 Nov 2003 10:41:40 -0600, James Hu wrote:

    > You want a safe C subset with built-in runtime protection? Just use
    > a safer language.


    C has certain advantages like tool support, simplicity, and a large share of
    smart programmers. The only debugger for Java is in Microsoft's J++, AFAIK.

    > Some of what you want to do can be achieved through static analysis,
    > but it requires extra hints provided in the form of stylized comments
    > that the preprocessor can understand.


    Stylized comments are acceptable.

    > But writing provably correct code can also
    > increase in complexity without bounds (the complexity of writing the
    > code increases with the complexity of the software specification), and
    > some would argue they never quite make it either.


    Proof of certain aspects can be easy to inline into code and easy to
    verify. Complexity grows only if you try to prove everything. Allowing
    a small library to protect itself is just about enough for me.
     
    Robert Vazan, Nov 23, 2003
    #15
  16. On Sun, 23 Nov 2003 18:40:21 +0100, Robert Vazan wrote:

    > On Mon, 24 Nov 2003 01:37:17 +1100, Simon Biber wrote:
    >
    >> A static compile-time lint checker is quite limited; you can do a lot
    >> more with run-time checking for array bounds, format specifiers,
    >> generally regulating access to memory.

    >
    > A supporting library can do run-time checking instead of the language. A
    > verifier can then enforce use of the library. The art is to design it so
    > that the result still looks like C.


    Why? If you don't like how C works, why not just use a different language?
     
    Sheldon Simms, Nov 23, 2003
    #16
  17. Robert Vazan wrote:

    <snip>

    > [...] C requires a cast from void pointer to structure pointer.


    No, it doesn't.

    #include <time.h>
    void foo(void *p)
    {
        struct tm *ptm = p; /* no cast required */
    }

    --
    Richard Heathfield :
    "Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
    C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    K&R answers, C books, etc: http://users.powernet.co.uk/eton
     
    Richard Heathfield, Nov 23, 2003
    #17
  18. Robert Vazan

    James Hu Guest

    On 2003-11-23, Robert Vazan <> wrote:
    > On Sun, 23 Nov 2003 10:41:40 -0600, James Hu wrote:
    >
    >> Some of what you want to do can be achieved through static analysis,
    >> but it requires extra hints provided in the form of stylized comments
    >> that the preprocessor can understand.

    >
    > Stylized comments are acceptable.


    http://www.google.com/search?q=splint&btnI=I'm+Feeling+Lucky

    >> But writing provably correct code can also increase in complexity
    >> without bounds (the complexity of writing the code increases with the
    >> complexity of the software specification), and some would argue they
    >> never quite make it either.

    >
    > Proof of certain aspects can be easy to inline into code and
    > easy to verify.


    That is a rather simplistic view, and it is a naive application that
    leaves such verification code enabled all the time (e.g., verifying
    qsort really sorted the array after each invocation).

    > Complexity grows only if you try to prove everything.
    > Allowing a small library to protect itself is just about
    > enough for me.


    Complexity grows whenever the system you are verifying becomes more
    complex. Suppose you are just verifying a small library. Whenever
    you add a new interface, you have increased the complexity and the
    proof burden. This is true not only of interfaces you expose to clients
    of the library, but also of interfaces to other sub-systems that
    the small library is dependent upon.

    Program correctness is getting to be off-topic for this newsgroup.
    If you want to pursue the issue further, I would suggest following
    up in comp.software-eng.

    Anyway, most C programmers will use assert() (or implement their own
    variation of it) to verify assumptions.
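
    As a small illustration, the function below uses the standard assert()
    macro from <assert.h> to state what it assumes about its caller. The
    function max_int is a made-up example.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest element; callers must pass a non-empty array. */
int max_int(const int *values, size_t count)
{
    size_t i;
    int best;

    assert(values != NULL);     /* assumption about the caller */
    assert(count > 0);          /* an empty array has no maximum */

    best = values[0];
    for (i = 1; i < count; i++) {
        if (values[i] > best)
            best = values[i];
    }
    assert(best >= values[0]);  /* sanity check on the result */
    return best;
}
```

    Defining NDEBUG before <assert.h> is included (for example with -DNDEBUG
    on many compilers) turns every assert() into a no-op, so the checks can
    be compiled out once verification is complete.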

    -- James
     
    James Hu, Nov 23, 2003
    #18
  19. On Sun, 23 Nov 2003 18:55:58 +0100, Robert Vazan wrote:

    > On Sun, 23 Nov 2003 10:41:40 -0600, James Hu wrote:
    >
    >> You want a safe C subset with built-in runtime protection? Just use
    >> a safer language.

    >
    > C has certain advantages like tool support, simplicity, and a large share
    > of smart programmers.


    I guess you figure that all those "smart programmers" are incapable of
    using any other language. I can't speak for anyone else, but I wouldn't be
    interested in working in crippled C. However, I have no problem learning a
    new language if that's what the project requires.

    > The only debugger for Java is in Microsoft's J++, AFAIK.


    There are many debuggers for Java, usually integrated in one of the very
    many IDEs for Java. There is also a command line debugger that comes with
    the standard java distribution.
     
    Sheldon Simms, Nov 24, 2003
    #19
  20. Robert Vazan

    Simon Biber Guest

    "Robert Vazan" <> wrote in message
    news:p...
    > On Mon, 24 Nov 2003 01:37:17 +1100, Simon Biber wrote:
    >
    > > I am reasonably certain that Gordon was joking!

    >
    > I understood it too. Jokes are often used to make claims that nobody can
    > argue with (it was a joke, so what), but that still make it into people's
    > minds. I wanted to show that I don't share his pessimistic view.
    >
    > > However, it does bear some wisdom -- a completely 'safe subset' is
    > > a pipe dream.

    >
    > What, the Java sandbox doesn't work? I must disable it in my browser...


    It has the potential for misuse, such as spamming lots of windows or
    unkillable dialog boxes... consider even this JavaScript (yes I know it's
    not Java, but it's still an example of a sandboxed language):
        while(1) alert("Please Click OK");
    which on many (older) browsers required a forced kill of the program.

    > Processes don't work? Poor ISPs granting shell access to customers. I know
    > that both Java and Unix have security holes, but the concept is good.


    Fewer and fewer ISPs do grant shell access in my experience. The costs
    associated with system admin and general policing of customers are high.

    > > A static compile-time lint checker is quite limited; you can do a lot
    > > more with run-time checking for array bounds, format specifiers,
    > > generally regulating access to memory.

    >
    > A supporting library can do run-time checking instead of the language. A
    > verifier can then enforce use of the library. The art is to design it so
    > that the result still looks like C.


    So you need to regulate array access; how? Your supporting library must
    hook into every single array access:

    int    int_item(const int *array, size_t index);
    long   long_item(const long *array, size_t index);
    short  short_item(const short *array, size_t index);
    double double_item(const double *array, size_t index);
    float  float_item(const float *array, size_t index);
    char   char_item(const char *array, size_t index);
    etc.

    Then you must redefine every single library function so it accesses arrays in
    terms of these accessor functions?!
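
    To make the cost concrete, here is a rough sketch of what one such
    accessor might have to do. Since int_item receives only a pointer, the
    library needs some record of each array's length; the table, the
    track_int_array and int_index_ok helpers, and the choice to abort on a
    bad index are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED_ARRAYS 16   /* arbitrary limit for the sketch */

struct tracked_array {
    const int *base;
    size_t count;
};

static struct tracked_array tracked[MAX_TRACKED_ARRAYS];
static size_t tracked_count = 0;

/* Tells the library how long an array is; returns 0 if the table is full. */
int track_int_array(const int *array, size_t count)
{
    assert(tracked_count <= MAX_TRACKED_ARRAYS);
    if (array == NULL || tracked_count == MAX_TRACKED_ARRAYS)
        return 0;
    tracked[tracked_count].base = array;
    tracked[tracked_count].count = count;
    tracked_count++;
    return 1;
}

/* Returns 1 if array is tracked and index lies inside it, else 0. */
int int_index_ok(const int *array, size_t index)
{
    size_t i;
    for (i = 0; i < tracked_count; i++) {
        if (tracked[i].base == array)
            return index < tracked[i].count;
    }
    return 0;   /* unknown array: treat every access as invalid */
}

/* Checked element read; stops the program on an invalid access. */
int int_item(const int *array, size_t index)
{
    if (!int_index_ok(array, index)) {
        fprintf(stderr, "int_item: index %lu rejected\n",
                (unsigned long)index);
        abort();
    }
    return array[index];
}
```

    Every element read now pays for a table search, and the same machinery
    would have to be repeated for each element type.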

    --
    Simon.
     
    Simon Biber, Nov 24, 2003
    #20