lightning

Re: Mips delay slot optimization


From: Paulo César Pereira de Andrade
Subject: Re: Mips delay slot optimization
Date: Fri, 24 Feb 2023 11:25:26 -0300

On Fri, 24 Feb 2023 at 10:00, Paul Cercueil
<paul@crapouillou.net> wrote:
>
> On Friday, 24 February 2023 at 08:17 -0300, Paulo César Pereira de
> Andrade wrote:
> > On Fri, 24 Feb 2023 at 08:08, Paul Cercueil
> > <paul@crapouillou.net> wrote:
> > >
> > > Hi Paulo,
> >
> >   Hi Paul,
> >
> > > On Thursday, 23 February 2023 at 13:34 -0300, Paulo César Pereira de
> > > Andrade wrote:
> > > >  Hi Paul,
> > > >
> > > >   I wrote new logic to optimize delay slot usage for the mips
> > > > port.
> > > >
> > > >   It was broken for mips6_p() anyway, so now there is basically a
> > > > full decoder to check if an instruction can be added to the delay
> > > > slot, and a few new helpers for it.
> > > >
> > > >   Basic documentation of how it works is in the commit
> > > > changelog.
> > > >
> > > >   I want to believe it is far more complete than your previous
> > > > set of patches, and it also forces adding extra logic to the
> > > > decoder when implementing new lightning functionality.
> > > >
> > > >   To disable it in a hacky way, just change instr() to never
> > > > keep a pending() instruction.
> > > >
> > > >   Please let me know if you see any regressions, or if somehow it
> > > > is missing some delay slot usage optimization that was done in
> > > > your patches.
> >
> >   It was a very large change, so regressions would be easy to
> > introduce.
> >
> >   The new patch is far more aggressive, and relies both on correct
> > usage of jit_get_reg_for_delay_slot(), flush(), pending() and delay(),
> > and on jit_get_reg_for_delay_slot() correctly decoding the instruction
> > and internally calling flush() if, even when a valid register is
> > obtained, moving the pending() instruction to the delay slot would
> > still cause a problem.
> >
> > > There are definitely regressions, my emulator just hangs when
> > > running
> > > 8032a68, while 2.2.1 worked fine.
> >
> >   Please try first with 8e5ba87. I did a new review of the patches,
> > and found a major logic error in jmpi. It was passing "make check" in
> > different test environments, but only by accident; sooner or later
> > the bug would surely have surfaced.
>
> It seems like 8e5ba87 fixed it, thanks.
>
> Do you think this mechanism could be more backend-agnostic?

  For the approach I took for mips, the main issue is having a complete
decoder in jit_get_reg_for_delay_slot().

  There is also the approach in the ia64 port, which keeps up to two
pending instructions and uses a jit_regset_t to make sure a register is
not read and written in the same cycle (there is also a bit mask for
predicates).
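
  As a rough illustration of the register-set idea (this is not the
actual ia64 code), the sketch below tracks the registers read and written
by a small group of pending instructions and starts a new group on any
overlap. The names regset_t, group_t, group_add() and MAX_PENDING are
hypothetical, a plain 64-bit mask stands in for jit_regset_t, and the
conflict rule is deliberately conservative. Predicates are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for jit_regset_t: one bit per register. */
typedef uint64_t regset_t;
#define REG(n) ((regset_t)1 << (n))

/* Assumption: at most two instructions are kept pending. */
#define MAX_PENDING 2

typedef struct {
    regset_t reads;   /* registers read by the pending group    */
    regset_t writes;  /* registers written by the pending group */
    int      count;   /* number of instructions in the group    */
} group_t;

static void group_reset(group_t *g)
{
    g->reads = g->writes = 0;
    g->count = 0;
}

/*
 * Try to add an instruction to the pending group.
 * Returns 1 if it joined the current group, 0 if the group had to be
 * closed first (a real backend would emit the pending instructions
 * at that point).
 */
static int group_add(group_t *g, regset_t reads, regset_t writes)
{
    int joined = 1;

    /* Conservative rule: any read/write or write/write overlap,
     * or a full group, forces a new group. */
    if (g->count == MAX_PENDING ||
        (g->writes & (reads | writes)) ||
        (g->reads & writes)) {
        group_reset(g);
        joined = 0;
    }
    g->reads  |= reads;
    g->writes |= writes;
    g->count++;
    assert(g->count <= MAX_PENDING);
    return joined;
}
```

  With this, an instruction that reads a register written by the pending
one is never grouped with it, which is the property the regset check is
there to guarantee.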

> I'm thinking in particular about my SH4 port, where some branch opcodes
> also have delay slots.

  It is mandatory to create a decoder in jit_get_reg_for_delay_slot().
  There isn't a magic way to get a generic b<OP>r and b<OP>i, because
integer comparison is backend specific: some branches may need to be
implemented with inverted arguments, some may need both a SLT and a
SLTU, etc.
  Overall, you just need to adapt the branching code to use flush(),
pending() and delay() as used in mips, and create the backend-specific
version of jit_get_reg_for_delay_slot().
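
  To make the idea concrete, here is a heavily simplified, standalone
sketch of the pending-instruction pattern. It is not the lightning code:
emitter_t, emit(), written_reg(), addu() and beq() are hypothetical, and
instr() and flush() only borrow the names used in the mips port. The
"decoder" understands just two MIPS encodings (ADDU and ADDIU) and
refuses everything else. It also ignores labels landing between the
pending instruction and the branch, and does not adjust the branch
offset.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t code[16];    /* emitted instruction words        */
    int      len;
    uint32_t pending;     /* last instruction, not yet emitted */
    int      has_pending;
} emitter_t;

static void emit(emitter_t *e, uint32_t w)
{
    assert(e->len < 16);
    e->code[e->len++] = w;
}

/* Emit the pending instruction, if any, in program order. */
static void flush(emitter_t *e)
{
    if (e->has_pending) {
        emit(e, e->pending);
        e->has_pending = 0;
    }
}

/* Keep the newest instruction pending instead of emitting it. */
static void instr(emitter_t *e, uint32_t w)
{
    flush(e);
    e->pending = w;
    e->has_pending = 1;
}

/* Tiny decoder: register written by w, or -1 if not understood. */
static int written_reg(uint32_t w)
{
    uint32_t op = w >> 26;
    if (op == 0x00 && (w & 0x7ff) == 0x21)   /* ADDU rd, rs, rt  */
        return (int)((w >> 11) & 31);
    if (op == 0x09)                          /* ADDIU rt, rs, imm */
        return (int)((w >> 16) & 31);
    return -1;                               /* unknown: be safe  */
}

/* Encode ADDU rd, rs, rt. */
static uint32_t addu(int rd, int rs, int rt)
{
    return ((uint32_t)rs << 21) | ((uint32_t)rt << 16) |
           ((uint32_t)rd << 11) | 0x21u;
}

/* Emit BEQ rs, rt, off and fill its delay slot. */
static void beq(emitter_t *e, int rs, int rt, int16_t off)
{
    uint32_t br = (0x04u << 26) | ((uint32_t)rs << 21) |
                  ((uint32_t)rt << 16) | (uint16_t)off;
    int rd = e->has_pending ? written_reg(e->pending) : -1;

    if (rd >= 0 && rd != rs && rd != rt) {
        /* The branch does not read what the pending instruction
         * writes, so the instruction can run in the delay slot. */
        uint32_t slot = e->pending;
        e->has_pending = 0;
        emit(e, br);
        emit(e, slot);
    } else {
        flush(e);
        emit(e, br);
        emit(e, 0);                          /* NOP in delay slot */
    }
}
```

  The point is that every instruction the backend can emit must be known
to the decoder, otherwise it falls back to flushing and a NOP.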

  It is possible to extend the logic to keep more pending instructions, so
that if one cannot be moved to the delay slot, another can be chosen. The
logic might not be too complex. It is worth measuring the percentage of
missed optimizations, and if it is significant, it would be good to write
such code. This could also be a base for instruction scheduling support.
There is still a TODO for backend specific optimizations, and this could
be the chance to start working on it.
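
  A possible shape for that selection, only as a sketch under my own
assumptions (nothing like this exists in the tree): each pending
instruction is summarized by hypothetical read/write register masks, and
the newest instruction that conflicts neither with the branch nor with
the instructions it would be moved past is chosen. insn_t, independent()
and pick_delay_slot() are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of a pending instruction: one bit per register. */
typedef struct {
    uint32_t reads;
    uint32_t writes;
} insn_t;

/* True if a and b can be reordered: no RAW, WAR or WAW overlap. */
static int independent(const insn_t *a, const insn_t *b)
{
    return !(a->writes & (b->reads | b->writes)) &&
           !(a->reads & b->writes);
}

/*
 * Pick the index of a pending instruction that can be moved into the
 * delay slot of a branch reading the registers in branch_reads.
 * Prefers the newest candidate; returns -1 if none qualifies.
 */
static int pick_delay_slot(const insn_t *win, int n, uint32_t branch_reads)
{
    assert(n >= 0);
    for (int i = n - 1; i >= 0; i--) {
        /* The branch must not read what the candidate writes. */
        int ok = !(win[i].writes & branch_reads);
        /* The candidate is moved past every newer pending one. */
        for (int j = i + 1; ok && j < n; j++)
            ok = independent(&win[i], &win[j]);
        if (ok)
            return i;
    }
    return -1;
}
```

  With a window of one instruction this degenerates to the current
behavior, so the miss rate could be compared for different window sizes.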

> Cheers,
> -Paul

Thanks,
Paulo


