CapC: making C programs safe
Mon, 31 Dec 2001 14:48:04 GMT
Hello Guile developers,
Back in September I e-mailed this list about a scheme I had been
working on for translating C programs into a memory-safe language.
Since then I've been implementing this, writing a C compiler which I'm
calling `CapC' (as it implements C using capabilities). It's now at a
point where it will run a demonstration program. I've put a copy of
The relevance of this to Guile is that it lets C programs be subjected
to precise garbage collection. So a C program using Guile could be
recompiled with this scheme to give a potential performance
improvement, as well as improving reliability. Also, Guile itself
could be recompiled with CapC, after stripping out its garbage
collector, to use the garbage collector provided by whatever backend
CapC uses. This could be applied to other interpreters, like Emacs
Here's the longer introduction from the Web site:
CapC is a C compiler which aims to convert any C program into one that
is memory-safe, without human intervention. The behaviour of a C
program changes only where the program is buggy or malicious, in
which case it throws an error at run time.
How is this done? It is not enough to check that every memory access
is for a location in an allocated block (as various debuggers do),
since it is easy to overflow an array bound and get a pointer to the
wrong memory block.
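To make the point concrete, here is a minimal sketch in C, using
simulated addresses rather than real pointers. The names `Range',
`in_block' and `in_any_block' are invented for this illustration and
are not taken from CapC or from any particular debugger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A simulated heap block: `len` bytes starting at address `base`. */
typedef struct {
    uintptr_t base;
    size_t    len;
} Range;

/* Is addr inside this one block? */
int in_block(const Range *b, uintptr_t addr)
{
    return addr >= b->base && addr - b->base < b->len;
}

/* Naive check: is addr inside *any* allocated block?  This is the
   kind of test that is not sufficient on its own. */
int in_any_block(const Range *blocks, size_t n, uintptr_t addr)
{
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (in_block(&blocks[i], addr))
            return 1;
    return 0;
}
```

With two adjacent 16-byte blocks at 0x1000 and 0x1010, the address
0x1000 + 20 passes `in_any_block' even though it was derived from the
first block and has run off its end into the second.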
CapC assumes that a block of memory should only be accessed through
pointer values that are dependent on the block's address (a number
which is arbitrarily chosen by `malloc'). This can be approximated by
storing, with each word-sized value, the set of blocks whose address
the value is dependent on. A numeric constant would evaluate to a
value associated with an empty set, while a call to `malloc' would
return a value associated with a set containing one block. These sets
are merged on arithmetic operations, and checked on pointer accesses.
Effectively, an abstract `word' data type is provided, through which
blocks containing further words can be indirectly accessed.
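The scheme described above might be sketched in C roughly as follows.
This is only an illustration under stated assumptions, not CapC's
actual Word type: the names (`Word', `word_const', `word_malloc',
`word_add', `word_load', `word_store'), the 32-block limit, the
bitmask representation of the set and the simulated block addresses
are all invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BLOCKS   32       /* assumption: a set fits in one 32-bit mask */
#define BLOCK_STRIDE 0x1000   /* assumption: simulated address spacing */

/* A word-sized value plus the set of blocks it depends on. */
typedef struct {
    intptr_t value;
    uint32_t deps;            /* bit i set => depends on block i */
} Word;

typedef struct {
    Word  *cells;             /* block contents are themselves Words */
    size_t len;               /* block size, in words */
} Block;

static Block blocks[MAX_BLOCKS];
static int   n_blocks = 0;

/* Simulated "arbitrarily chosen" address of block id. */
static intptr_t block_base(int id)
{
    return (intptr_t)(id + 1) * BLOCK_STRIDE;
}

/* A numeric constant depends on no block. */
Word word_const(intptr_t v)
{
    Word w = { v, 0 };
    return w;
}

/* Allocation returns a value whose set contains exactly the new block. */
Word word_malloc(size_t len)
{
    assert(n_blocks < MAX_BLOCKS && len > 0 && len < BLOCK_STRIDE);
    int id = n_blocks++;
    blocks[id].cells = calloc(len, sizeof(Word));
    assert(blocks[id].cells != NULL);
    blocks[id].len = len;
    Word w = { block_base(id), (uint32_t)1 << id };
    return w;
}

/* Arithmetic merges the dependency sets of its operands. */
Word word_add(Word a, Word b)
{
    Word w = { a.value + b.value, a.deps | b.deps };
    return w;
}

/* Find the cell addr refers to, considering only blocks in addr.deps. */
static Word *word_resolve(Word addr)
{
    for (int id = 0; id < n_blocks; id++) {
        if (!(addr.deps & ((uint32_t)1 << id)))
            continue;
        intptr_t off = addr.value - block_base(id);
        if (off >= 0 && (size_t)off < blocks[id].len)
            return &blocks[id].cells[off];
    }
    return NULL;              /* no permitted block contains the address */
}

/* Checked accesses: return 0 on success, -1 on a safety error. */
int word_load(Word addr, Word *out)
{
    Word *cell = word_resolve(addr);
    if (cell == NULL)
        return -1;
    *out = *cell;
    return 0;
}

int word_store(Word addr, Word v)
{
    Word *cell = word_resolve(addr);
    if (cell == NULL)
        return -1;
    *cell = v;
    return 0;
}
```

In this sketch, a store through `word_add(a, word_const(2))' succeeds
when `a' came from `word_malloc(4)'. An access at offset 4 is rejected,
as is an address that lands inside a second block but depends only on
`a', and so is a constant that merely equals a block's address.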
I am optimistic that the resulting safe program can be made fast
enough that CapC can be used not just as a debugging aid, but as a way
to compile programs normally. The speed-up would come from static
analysis.
What are the implications of this?
* It can be used as a debugging aid, to catch bugs during testing.
* It can be used to improve security, preventing buffer overrun
* It lets precise garbage collection be applied to C programs.
* It lets C programs be made portably persistent: the state of a
program can be saved to disc periodically, then restored after a
system crash or migrated to a different machine -- and without
relying on operating-system-specific features for
* It makes it easier for languages to interoperate:
* A C library and a program in a high-level language can run under
the same run-time system (once the C code is recompiled). The
high-level language's run-time system no longer has to be
designed to work with traditionally-compiled C code.
* CapC could be used to process C header files to provide
high-level programs with direct access to C structs and
* An ageing language implementation, such as Emacs Lisp, could be
rejuvenated by removing its garbage collector and compiling it
with CapC, which can provide a more efficient garbage collector.
* CapC turns C's pointers into references (which memory-safe
languages like Scheme, Java and ML have), also known as
capabilities. The same approach can be used to turn filenames into
capabilities. This can be used to eliminate the Confused
Deputy Problem, which is caused by Unix's setuid feature, and by
principal/ACL-based security systems (such as Unix) in general.
* The CapC compiler can serve as a basis for experimenting with
extensions to C: it is hopefully easier to understand than a large
compiler like gcc because it is small and written in a high-level
CapC is based on a formal semantics of C written by Nikolaos
Current progress: CapC is an interpreter for now, and can run a test
program slowly. Some language features (e.g. gotos, declaration
initializers) have not been implemented yet. It is slow because the
Word type hasn't been optimised yet and run-time variable lookups are
done via a binary tree; I'm going to add a compiler backend.
- address@hidden - http://www.srcf.ucam.org/~mrs35/ -
A bad tool blames its workman.
- CapC: making C programs safe,
Mark Seaborn <=