From: Paulo César Pereira de Andrade
Subject: Re: [Lightning] Adding support for a multi pass intermediate representation
Date: Mon, 8 Nov 2010 17:35:56 -0200

On 8 November 2010 15:38, Paolo Bonzini <address@hidden> wrote:
> 2010/11/6 Paulo César Pereira de Andrade
> <address@hidden>:
>> [snip]
>>
>> The problem obviously is that it forgets about the type when it
>> merges code paths, but this is remediated by having 3 lists, and
>> being able to dynamically construct states on the C stack while
>> generating jit, so that it only merges the code paths when the
>> value type becomes unknown.
>>
>>  Comments?
>
> Looks pretty good!  But about the last paragraph, I think you're
> reinventing dataflow analysis, aren't you?  What do you mean by "3
> lists"?  What would you do about loops?

  I think I am indeed somewhat reinventing dataflow analysis, or
rather, the concept is a special case of it. By 3 lists I mean
maintaining 3 lists of instructions (there can be more): one for
integer types, one for float types, and a generic one that falls back
to calling C vm functions, which choose the code path for a single
operation in switch statements. With 3 lists it is easier, for
example, to have something like a
<int_path> boaddr_i <label_in_unknown_type_path> %r0 %r1
and "update" or create labels in the different lists in parallel.

  The previous statement should actually jump to code coercing
the operation to use an mpz_t, but for now the initial code only
keeps track of integers and floats, and the main idea is to keep
the vm state only in registers for as long as possible.
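
  A minimal sketch of the three-list idea in C follows. Every name in
it (tl_state, emit_checked_add, the opcode enum, and so on) is
hypothetical and chosen only for illustration; it is neither
lightning's API nor the code from the language sources. It only shows
how a branch recorded in the integer list can point at a label created
at the same moment in the generic (unknown type) list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not lightning's API. */
enum path { PATH_INT, PATH_FLOAT, PATH_GENERIC, PATH_COUNT };

enum opcode { OP_LABEL, OP_BRANCH_ON_OVERFLOW_ADD };

struct insn {
    enum opcode op;
    int         dst, src;      /* register numbers, unused for labels */
    enum path   target_path;   /* list that holds the label */
    int         target_label;  /* label id (own id for OP_LABEL) */
};

#define MAX_INSNS 64

struct insn_list {
    struct insn code[MAX_INSNS];
    size_t      count;
};

/* One instruction list per type path, plus a label id counter. */
struct tl_state {
    struct insn_list lists[PATH_COUNT];
    int              next_label;
};

/* Append one instruction to the list of the given path. */
static void emit(struct tl_state *s, enum path p, struct insn in)
{
    struct insn_list *l = &s->lists[p];
    assert(l->count < MAX_INSNS);
    l->code[l->count++] = in;
}

/* Create a label in the list of path p and return its id. */
static int new_label(struct tl_state *s, enum path p)
{
    struct insn in = { OP_LABEL, 0, 0, p, s->next_label };
    emit(s, p, in);
    return s->next_label++;
}

/*
 * Record "dst += src" on the integer path. On overflow the branch
 * leaves the integer list and lands on a label that is created in
 * the generic list in the same step, so both lists stay in parallel.
 */
static int emit_checked_add(struct tl_state *s, int dst, int src)
{
    int label = new_label(s, PATH_GENERIC);
    struct insn br = { OP_BRANCH_ON_OVERFLOW_ADD, dst, src,
                       PATH_GENERIC, label };
    emit(s, PATH_INT, br);
    return label;
}
```

  Calling emit_checked_add(&s, 0, 1) on a zeroed tl_state leaves one
branch in the integer list and one label in the generic list, both
carrying the same label id; the float list is untouched.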

  I know this is somewhat bogus, but, for example, this small
sample of the language code:
-%<-
void test() {
    auto a, b;
    for (a = b = 0; a < 10000000; ++a)
        b += a;
    print("%d\n", b);
}
test();
-%<-

runs in around 0.7 seconds on my i686 computer when it runs
entirely in jit. A statically typed version, declaring "a" as int32_t
and "b" as int64_t, runs in 0.25 seconds, but if the entire code is
not in jit, it becomes 4-20 times slower; the more C calls, the slower
it is, and there is a big gap between calling only one C function and
having the entire loop in jit. I know that is not that great, but in
the end it runs at the same speed as "gcc -O0" when defining
the variables as "long long". To become faster, it would probably
need to avoid storing results to memory at basic block boundaries,
so that it could run the entire loop with state in registers.
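
  For reference, a C counterpart of the loop using "long long" could
look like the sketch below. This is an assumed reconstruction for
comparison purposes, not the exact program that was timed; the
function names sum_below and run_test are made up here.

```c
#include <assert.h>
#include <stdio.h>

/* Sum the integers 0 .. n-1, mirroring the loop in test() above. */
static long long sum_below(long long n)
{
    long long a, b;
    assert(n >= 0);
    for (a = b = 0; a < n; ++a)
        b += a;
    return b;
}

/* Hypothetical driver: same bound and same print as the sample. */
static void run_test(void)
{
    printf("%lld\n", sum_below(10000000LL));
}
```

  Building it without optimization (gcc -O0) keeps "a" and "b" in
memory on every iteration, which is roughly the situation of jit code
that stores results at each basic block boundary.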

  About loops (in the language), the initial code is not that great: it
just ensures that the registers and the global implicit type/value are
in sync at basic block boundaries, but most of the time it needs to
sync anyway due to register pressure; currently the usage is:

%r0 - scratch
%r1 - most times a pointer to load/store
%r2 - scratch
%v0 - thread pointer (implicit value, vm stack and base pointer, etc
         accessed from it)
%v1 - integer - low word on 32 bits
%v2 - high word on 32 bits, otherwise scratch
%f0 - double implicit value
%f1 - float implicit value
%f2-%f5 - scratch (currently only %f2 used as float operand)

  This means floats are invalidated whenever a vm function needs to
be called, and integers only if the vm function changes the implicit
value.

  Later on, I should write better code where there are more
registers, for example, on mips or x86_64 I could tie free
registers to variables.

  But the idea of a generic intermediate representation could
actually provide a good infrastructure to allow having loops in
my language that do not need to sync state at basic block
boundaries. Most of it should work with any lightning port,
but since it should wrap all calls (due to keeping state in
lists), it should also be able to use extra instructions, e.g.
use 8 bit displacement jumps if available, or avoid the need
to save a register when a temporary is needed, as it could
know about free/dead registers, etc.
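
  Two of the decisions mentioned above become simple once the
instructions sit in a list before being emitted. The sketch below
uses hypothetical helper names and a plain bitmask for liveness; it
is an assumption about how such checks might look, not part of any
lightning port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a jump displacement fits in a signed 8 bit field. */
static bool fits_disp8(long disp)
{
    return disp >= INT8_MIN && disp <= INT8_MAX;
}

/* Bit n is set while register n holds a value that is still needed. */
typedef uint32_t regmask;

/*
 * Return the lowest numbered dead register among the first nregs,
 * or -1 if all of them are live. Only in the -1 case would the
 * generator have to save a register to obtain a temporary.
 */
static int find_dead_reg(regmask live, int nregs)
{
    assert(nregs >= 0 && nregs <= 32);
    for (int r = 0; r < nregs; ++r)
        if (!(live & (UINT32_C(1) << r)))
            return r;
    return -1;
}
```

  With three scratch registers and a live mask of 0x5 (registers 0
and 2 in use), find_dead_reg returns 1, so no save is needed; with a
mask of 0x7 it returns -1.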

> Paolo

  I will try to post again soon, when I have a more complete
implementation; for now I am working on it in the sources
of my language, to also let the idea mature a bit more.

Thanks,
Paulo


