Re: IR / Argument registers


From: Marc Nieper-Wißkirchen
Subject: Re: IR / Argument registers
Date: Tue, 7 Nov 2023 08:08:18 +0100

On Mon, 6 Nov 2023 at 19:32, Paulo César Pereira de Andrade <paulo.cesar.pereira.de.andrade@gmail.com> wrote:
On Mon, 6 Nov 2023 at 14:26, Marc Nieper-Wißkirchen
<marc.nieper+gnu@gmail.com> wrote:
>
> On Mon, 6 Nov 2023 at 17:58, Paulo César Pereira de Andrade <paulo.cesar.pereira.de.andrade@gmail.com> wrote:
> [...]
>
>>
>> > Thanks! If the generated code is PC-relative, this indeed looks like the best approach.
>>
>>   Only conditional branches are PC-relative.
>>
>> > What do you mean by "non-conditional branches only in special cases"?  This would, of course, break relocating the code. Do you think these cases can be identified and avoided?  Or would a flag be possible that makes GNU lightning avoid non-PC-relative instructions?
>>
>>   Unconditional branches are emitted as relative branches when the
>> distance is known to be in range and there is some always-true condition.
>>
>>   There are no optimized patches (with small/large relative displacements). They
>> would be backend-specific anyway.
>>   The easiest approach is to just use jit_movi() on addresses, so that you can
>> record where to patch with jit_address() (jit_address() must be called after
>> code generation), and to use only jit_jmpr and jit_callr. Other branches are
>> PC-relative; there is no need to make it worse on purpose.
>
>
> I am not sure whether I understood.  Do you mean that the use of jmpi in the factorial example ("jmpi fact_entry") in the manual (https://www.gnu.org/software/lightning/manual/lightning.html#GNU-lightning-examples) would be problematic?

   Just checked all implementations. It is a backward jump. All ports emit
it as a PC-relative unconditional branch, as long as the branch distance
is reasonable. As long as it fits in a signed 16-bit displacement, I believe
all ports emit it as an unconditional branch (need to verify to make sure).
  PC-relative forward branches are implemented only in the most common
ports, and were added recently. The code just checks whether the branch
would fit even if the target were in the last byte of the jit buffer. If it
fits, it generates a PC-relative jump.
These ports should be at least aarch64, arm, mips, loongarch and x86_64.
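
As a rough illustration of the range check described above (not Lightning's actual code), here is a minimal sketch. All function names are hypothetical, and it assumes the displacement is counted in bytes from the branch instruction itself; real backends often count from the next instruction or in instruction units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of GNU lightning: true when a byte
 * displacement can be encoded in a signed two's-complement field of
 * `bits` bits (for example 16). */
bool fits_signed(int64_t disp, unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return disp >= lo && disp <= hi;
}

/* Backward branch: the target is already known, so test the real distance. */
bool backward_branch_fits(uint64_t branch_pc, uint64_t target_pc, unsigned bits)
{
    return fits_signed((int64_t)(target_pc - branch_pc), bits);
}

/* Forward branch: the target is not known yet, so assume the worst case,
 * a target at the last byte of the code buffer. */
bool forward_branch_fits(uint64_t branch_pc, uint64_t buf_start,
                         uint64_t buf_size, unsigned bits)
{
    uint64_t last_byte = buf_start + buf_size - 1;
    return fits_signed((int64_t)(last_byte - branch_pc), bits);
}
```

The forward check is deliberately pessimistic: if even the end of the buffer is reachable, any later label inside the buffer is reachable too, so the short PC-relative form can be chosen before the target is known.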

Thank you for your explanation.  So, how do you handle the situation when an absolute jump may be generated (e.g. when the jump is a forward jump on an uncommon target or when the jump distance is too long for some target)?  Would it be possible to rewrite these jumps into (possibly less efficient) PC-relative ones?  (This was the idea of the potential flag I mentioned above.)
 
>> >> > 2.
>> >> >
>> >> > Paul, also some years ago, sent a patch making JIT_A(n), corresponding to the n-th register argument, available to the user.  Could we eventually get something like it into the upstream version?
>> >>
>> >>   Replying partially for Paul :)
>> >>
>> >>   Paul is using it, but not in any public API. Using the argument
>> >> registers is not officially supported, but it is perfectly valid and is
>> >> expected to work. Just be prepared for things like an assertion if
>> >> running out of registers and needing a temporary that cannot be
>> >> spilled, like when needing a temporary to compute a branch target.
>> >>
>> >>   It is "guaranteed" to work for the expected usage, because of the
>> >> stress test in check/carg.c. Depending on what you need, check/carg.c
>> >> is a very good example of how to manage arguments that might be in
>> >> registers. This is required because arguments in registers might be
>> >> clobbered, as the current code does not save/reload them if a function
>> >> call is required or there is a jump to an unknown target (jmpr or jmpi
>> >> used not as an unconditional branch).
>> >
>> >

>> > That GNU lightning can run out of registers is not directly related to argument registers, is it?  There are some targets that do not have any register arguments, so even if no register argument could be clobbered on some target, the situation wouldn't be worse than on a target without any register arguments, would it?

carg.c showed the need for some changes in some backends due to using
all argument registers, and the need for a temporary in one place or another.
Lightning might generate bad code by spilling/reloading too many registers, or
hit a condition it cannot handle, usually when asking for a temporary register
with jit_class_nospill or'ed into the flags and none is available; also using
jit_class_chk will return JIT_NOREG in such a case.

>> > It would be great if it could be documented how many registers are allowed to be live at each point.  Without it, an aggressive register allocator might accidentally go beyond GNU lightning's limits.

  If possible, it is suggested to keep one gpr and one fpr register free to
guarantee that no bad code generation will happen due to spill/reload of
temporaries. Obviously you should check the generated assembly to
choose what is better; frequently, using something like:
jit_movi(reg, value); jit_pushargr(reg);
would be better than a simple jit_pushargi(value), which would require
saving/restoring a temporary.
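
To make the "keep one register free" suggestion concrete, here is a minimal sketch of client-side bookkeeping. It is not Lightning code: the reg_pool type and the pool_* functions are hypothetical names, and it assumes one such pool per register class (one for gpr, one for fpr) with at most 32 registers each.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a client-side allocator; none of these
 * names exist in GNU lightning. */
typedef struct {
    uint32_t live;     /* bit i set: register i currently holds a live value */
    unsigned count;    /* registers in this class, at most 32 */
    unsigned reserve;  /* registers to leave free for the backend (suggested: 1) */
} reg_pool;

/* Number of registers currently marked live. */
unsigned pool_live_count(const reg_pool *p)
{
    unsigned n = 0;
    for (uint32_t m = p->live; m != 0; m &= m - 1)
        n++;
    return n;
}

/* Returns a free register index, or -1 when handing one out would eat
 * into the reserve; the caller is then expected to spill a value itself. */
int pool_acquire(reg_pool *p)
{
    assert(p->count <= 32 && p->reserve <= p->count);
    if (pool_live_count(p) + p->reserve >= p->count)
        return -1;
    for (unsigned i = 0; i < p->count; i++) {
        if (!(p->live & (UINT32_C(1) << i))) {
            p->live |= UINT32_C(1) << i;
            return (int)i;
        }
    }
    return -1;
}

/* Marks a register as free again. */
void pool_release(reg_pool *p, int reg)
{
    assert(reg >= 0 && (unsigned)reg < p->count);
    p->live &= ~(UINT32_C(1) << reg);
}
```

With count = 3 and reserve = 1, the third pool_acquire() call fails instead of consuming the last register, which leaves it available should the backend need a temporary.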

>> > As for the JIT_A(n) arguments, if everything works as "guaranteed", maybe they can be made an official part (with the note that each call and unknown jump will clobber them).
>>
>>   The default number of argument registers is zero. Using JIT_R3 or larger
>> as well as JIT_V3 or larger requires backend-specific knowledge.
>>   The only case where there are no registers is the arm port when there
>> is no fpu. In that case registers are faked and emulated on the stack.
>>
>>   We could indeed define JIT_A_NUM, JIT_FA_NUM, and provide the JIT_A()
>> and JIT_FA() macros. Using them might also need several calls to jit_live()
>> and an understanding of the conditions under which such a register is
>> considered dead. Well, it is the same as JIT_Rx, so, nothing special.

Making this part of the public, documented API would be nice, especially for ports where half of the registers are used up for argument registers (e.g. x86_64 with System V ABI).
 
>>
>>   There might be special corner cases to be discovered, as usually the
>> first argument is also the return value.  While very unlikely, it is possible.
>
>
> One could add a function ret_register_p, similar to arg_register_p.

  Can just compare with JIT_RET and JIT_FRET :)

>>    Before extending Lightning too much, it needs very good error
>> handling and reporting. Just calling abort() is not very helpful, and really
>> bad for a shared library; having more and more ways for code
>> generation or the runtime to call abort() without a clear explanation is not
>> a good idea. We should also change the assert() calls to some form of
>> error handling. Possibly not define NDEBUG if not in a DEBUG build, and
>> have some error callback that would allow cleaning up any resources.
>
>
> When GNU lightning is driven by a higher-level register allocator and the allocator uses too many registers simultaneously, it is already too late.  Of course, if GNU lightning didn't crash but called some well-defined call-back, it would be nicer but not much more helpful, I think, because the register allocator would not know where to decrease register pressure in its output.

  Just pass jit_class_chk when requesting a register and have the code
handle the case where no register is available. But I strongly suggest
using fixed values based on the backend. Before starting to emit code, it
uses jit_regno_patch or'ed. If you do it, it should be for very little code,
and absolutely without branches. If there are branches, just have predefined
usage.

  See https://github.com/pcpa/owl/blob/master/lib/oemit.c#L345 for how
explicit register values are chosen in a way that does not depend on
the backend.
  You might also implement your own register allocator instead of
asking Lightning to do it. Lightning does not have global live state
information, only partial information when entering code emit, but
your code can handle it, and will know if a register is
callee save using the proper macro. The argument registers are not
exported, but you can also use them based on the backend, as the information
is in public headers, just not documented :). They will work exactly like
a JIT_Rx, sans special cases while constructing a function call, where
writing to one would destroy the argument value.

> What's really needed are, in my opinion, hard rules about how many JIT_Rx, JIT_Vx, ... and (if available) JIT_(F)Ax registers can be kept alive simultaneously.  If the code calling GNU lightning does not respect these rules, it is okay for GNU lightning to crash.

  The only way to cause an abort() is to add a jit_live(REG) call for every
non-callee-save register after a branch that needs a temporary for the
target; usually a very long branch that does not fit in a relative branch.
  Lightning just considers all non-callee-save registers dead at the branch
point if there is a jump it cannot follow to decide whether they are live or dead.
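
The liveness rule described above can be modelled with plain bit masks. This is only a sketch of the rule as stated in this message, not Lightning's implementation; the regmask type and both function names are hypothetical, with bit i standing for register i.

```c
#include <stdint.h>

/* Hypothetical model, not Lightning code. Bit i of a mask stands for
 * register i. */
typedef uint32_t regmask;

/* After a jump whose target cannot be followed, only callee-save
 * registers are still assumed live; everything else is treated as dead. */
regmask live_after_unknown_jump(regmask live, regmask callee_save)
{
    return live & callee_save;
}

/* Registers that the client must save itself (or explicitly mark live)
 * because they would otherwise be considered dead at that point. */
regmask at_risk(regmask live, regmask callee_save)
{
    return live & ~callee_save;
}
```

A client-side allocator could compute at_risk() before each such jump and either spill those registers or re-mark them with jit_live(), keeping in mind that marking every non-callee-save register live is exactly the situation that can leave no temporary for a long branch.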

> I guess the really critical places are those before jumps, which may need a scratch register.  At all other places, GNU lightning can insert spill instructions before and after.

  Exactly, but it is still not a good idea to have too much register pressure,
otherwise bad code generation with too many sequential spills/reloads can
happen.

Thank you.  Are the flags jit_class_chk, jit_class_nospill, jit_regno_patch, ... documented somewhere?  If not, could you briefly describe what exactly they do?

Best,

Marc

