[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Tinycc-devel] Proposal for handling alloca(). Anyone see a problem with

From: David A. Wheeler
Subject: [Tinycc-devel] Proposal for handling alloca(). Anyone see a problem with it?
Date: Mon, 07 May 2007 12:31:00 -0400 (EDT)

I'm looking into implementing alloca() in tinycc; the patch in grischka's 
proposal doesn't do the job correctly at ALL.  Instead, I'm thinking of using 
an approach similar to the public-domain code by D.A. Gwyn (which uses a linked 
list of malloc'ed areas and their stack positions), and then adding some 
tinycc-specific tweaks to make the result MUCH more effective (so it'll ALWAYS 
correctly deallocate on function return or longjmp).  Below is what I'm 
planning to do... and why.

Please speak up ASAP if you see a big problem, or think of a better idea. And 
if you have a better idea, please let us know if you're willing to code it :-).


The function alloca allocates memory (like malloc): given a size (in 
chars/bytes), it allocates that memory and returns a pointer to it.  The 
difference is that on function return, all memory allocated by alloca() is 
supposed to be automatically released.   A number of programs REQUIRE alloca.

Many C compilers implement alloca() by simply manipulating the stack pointer.  
Indeed, grischka's patch does this (though only for x86-32 Windows).  But in 
tinycc's case, the obvious way to do this will fail horrifically in many cases, 
because alloca() can be embedded inside an expression, like this:
  a( b(4+c()), alloca(x*g()), h()+n());
In this example, a naive alloca implementation would be in the middle of 
computing this expression, with some intermediate values on the stack, and the 
alloca() call would then screw up the stack DURING the expression 
calculation... causing the whole expression to miscalculate.  Trying to handle 
alloca() greatly complicates declaring other values later (including nested 
{...} with declarations in them).  grischk's approach also appears to silently 
fail for other reasons in some cases (e.g., when you declare large variables 
after the alloca), but I'm beating a dead horse.

In other C compilers, handling alloca() in the middle of an expression is a 
mere matter of a few more lines of code that split up handling of the 
allocation through alloca() separately from the handling of expressions, 
declarations, etc.   That's no big deal in a multi-pass compiler.  But tinycc 
doesn't HAVE an extra processing phase where that kind of logic can be easily 
inserted - we've ALREADY generated code by that point, and we won't know till 
later what declarations are ahead.  There are possible tricks, like modifying 
already-generated code to self-patch, but they're hideous, hard to get right, 
and easy to get wrong.  It's very painful to handle {..} inside blocks, or 
later declarations in the same function, in this way.  Something SIMPLE and 
CORRECT is important; absolute maximum performance is NOT required, it just has 
to have 'reasonable' performance.

Another naive approach is to simply map alloca() to malloc().  At least that 
gets programs running, and it IS simple, but it leaks memory like no tomorrow.  
Not nice.  We can do MUCH better.

Instead, I intend to start with public-domain code by D.A. Gwyn.  It uses 
malloc() to obtain memory, and maintains a linked list containing stack depth 
and allocation information.  Every time you use alloca(), it first frees all 
previous alloca() allocations that were from a deeper stack position.  This 
means that alloca() allocations get freed once a less-deep alloca() call 
occurs.  It's not perfect, but for a lot of circumstances this is "more than 
good enough".  Though if all the alloca allocations are at the same level, an 
occasional call to alloca(0) at the "main loop" may be needed to fully free 
them all.

I think I can easily tweak this slightly so that alloca() would REALLY keep its 
semantics.   In essence, modify the code generators so that any function that 
calls alloca() would use a DIFFERENT function epilogue. (You can easily detect 
an alloca() use and set a flag, and then look at this flag when generating the 
epilogue.  This means the single-pass nature of tinycc is no problem here.)  
The alloca-specialized epilogue would walk the linked list and free all the 
allocations at that stack depth or greater before returning.  I would also 
modify longjmp/siglongjmp so that after they reset the stack pointer, they'd 
free everything allocated at a deeper location in the stack.  The advantages of 
this approach are:
1. alloca() would work EXACTLY correctly.  ALL memory allocations would ALWAYS 
be freed, immediately, on function return or stack-unwind due to 
longjmp/siglongjmp.  And you can use alloca() anywhere it's legal to use, 
including deeply embedded inside expressions.  Nice!
2. It doesn't require major brain surgery to tinycc.  The changes are 
relatively local, and much of the functionality can be in the tinycc library 
(easing maintenance).
3. It only impacts functions that USE alloca().  Functions that don't call 
alloca() don't pay the cost of checking if there's anything to free().  If you 
don't use alloca, it only adds to longjmp a quick check of a pointer (if the 
pointer is non-null, you have work to do anyway).

[Oh, why have a single linked list, instead of a per-function linked list?  
Because on a longjmp you may unwind many functions; it's easier to unwind 
starting from a single master list.  When  freeing, you stop at the first item 
that isn't relevant, so it's probably faster too.]

There are minor downsides. It's not QUITE as efficient as a traditional 
alloca(), because you're fiddling with malloc() instead of changing the stack 
pointer.  If threading were introduced, these alloca lists would need to be 
stored per-thread and malloc would have to be thread-safe, which is doable 
though annoying.  This approach could introduce memory fragmentation via 
malloc() in theory, though in practice I think that's a non-issue.  I think 
it's worth it.

A possible future optimization would be noticing special cases where it'd be 
okay to use stack pointer manipulations directly, and do that instead.  That'd 
be nifty.  But it's quite complicated even outside complex expressions, because 
blocks can LATER contain declarators (even standard C requires support for 
this, via embedded {...}).

I think this approach will be "efficient enough", it will work correctly in ALL 
cases, the implementation will be small, and it should be fairly easy to show 
correct.  It's quite likely that no one will bother to implement that future 
optimization, since I expect this approach to work "good enough" in all cases.  
I'd rather support the general case first, and then only if it MATTERS worry 
about optimizing special cases.

Interestingly, once alloca() is implemented, we can implement extensions like 
non-constant-length automatic arrays.  E.G., we could implement:
double f(int n) {
    double x[n];
by having the compiler detect that "n" is not constant and in such cases 
generate code equivalent to:
 double * const x = alloca(sizeof(double)*n);
This would mean that non-constant-length automatic arrays would NOT be stored 
in the stack but on the heap; code that DEPENDS on exact placement for 
non-constant-length arrays is already insane anyway.

--- David A. Wheeler 

reply via email to

[Prev in Thread] Current Thread [Next in Thread]