lightning
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 0/9] mips: Fill delay slots v3


From: Paul Cercueil
Subject: Re: [PATCH v2 0/9] mips: Fill delay slots v3
Date: Wed, 18 Jan 2023 12:42:22 +0000

Hi Paulo,

Le dimanche 15 janvier 2023 à 09:54 -0300, Paulo César Pereira de
Andrade a écrit :
> Em dom., 15 de jan. de 2023 às 08:05, Paul Cercueil
> <paul@crapouillou.net> escreveu:
> > 
> > Hi Paulo,
> 
>   Hi Paul,
> 
> > Le samedi 14 janvier 2023 à 22:42 -0300, Paulo César Pereira de
> > Andrade
> > a écrit :
> > > Em sáb., 14 de jan. de 2023 às 12:11, Paul Cercueil
> > > <paul@crapouillou.net> escreveu:
> > > > 
> > > > Hi Paulo,
> > > 
> > >   Hi Paul,
> > > 
> > >   Patches pushed as all tests pass.
> > 
> > Great! ;)
> > 
> > > > Here's the V3 of my patchset that attempts to fill branch delay
> > > > slots on
> > > > MIPS.
> > > > 
> > > > All tests do pass now, except the "float.nodata" check, but
> > > > this
> > > > one
> > > > fails in master as well.
> > > 
> > >   What environment are you using for testing? It works in the
> > > mips
> > > environments I am testing.
> > > 
> > >   Please let me know the output of:
> > > 
> > > $ cpp -dM </dev/null
> > 
> > Just a mips64 toolchain built using buildroot.
> > 
> > I believe you meant
> > cpp -dM -E - < /dev/null
> 
>   It is a very recent gcc, and probably kernel. My guess is that the
> issue
> is caused by some kind of unaligned access. Is it real hardware or
> some
> emulator?
> 
>   Can you please run it as:
> 
> $ cd check
> $ make debug
> ...
> (gdb) r -d float.tst
> 
> and see what if it is just somehow creating a bad immediate or if
> it is triggering some kind of fault, like misaligned or invalid
> instruction?

This was under Qemu with my MIPS64 toolchain.

However today as I try to reproduce it, all the checks do pass. Maybe a
system update on my PC brought a fix (to qemu), or maybe it does happen
only under some conditions.

I'll debug it if I encounter it again.

> > See attachment.
> > 
> > > > I implemented your suggestion and it worked just fine. There
> > > > was
> > > > one
> > > > last problem though, which caused the "clobber" check to fail
> > > > once
> > > > again; code emitters that generated branches to the next
> > > > instruction (to conditionally execute some opcodes) caused
> > > > problems, as
> > > > the next instruction was not detected as a jump target (no
> > > > "jit_flag_patch" set). I solved this by just checking the
> > > > previous
> > > > node's code against a table of accepted values.
> > > 
> > >   Not certain if I fully understand this. It also appears to be a
> > > somewhat
> > > fragile logic.
> > > 
> > >   Everytime the delay slot is used, there should be a comment
> > > telling that instruction is in the delay slot, for example:
> > > 
> > > static void
> > > _ltr_f(jit_state_t *_jit, jit_int32_t r0, jit_int32_t r1,
> > > jit_int32_t
> > > r2)
> > > {
> > >     jit_word_t        w;
> > >     C_OLT_S(r1, r2);
> > >     w = _jit->pc.w;
> > >     BC1T(0);
> > >     /* delay slot */
> > >     movi(r0, 1);
> > >     movi(r0, 0);
> > >     patch_at(w, _jit->pc.w);
> > > }
> > 
> > There is a comment indeed, but the comment is gone as soon as you
> > compile the code.
> > 
> > If you have a 'ltr_f' followed by a 'beqr', the code will try to
> > find a
> > delay slot for the 'beqr', and will detect that the 'movi(r0, 0)'
> > (aka.
> > the last opcode generated by 'ltr_f') can safely be moved, because
> > there is nothing that can tell us that the last movi is a jump
> > target.
> > 
> > So I just blacklist 'ltr_f' as well as others so that the emitted
> > code
> > will always be correct.
> 
>   I understand now.  The problem I see is that it is not very clear
> from
> the patch, and/or changes elsewhere could cause it to no longer be
> required, or have some missing check if new lightning instructions
> are added.
> 
>   The problem is that can_swap_ds() does not check the pattern of a
> jump in 2 (for fpr comparison) or 3 (for compare-and-swap)
> instructions
> before the last machine instruction that it attempts to move.

Yes, it only checks the last instruction. Going further than that would
increase the complexity of the code, I think.

>   Currently there isn't a clear way to pass parameters to jit
> generation.
>   The closest is the jit_cpu_t for arm and x86. Maybe we could add a
> jit_cpu_t for mips, and have an option to make the swap with delay
> slot optional? This would make it easy to validate that if a bug is
> found
> in the future, it can be easily verified that it is not a side effect
> of the
> changes in these patches. That is, have a defined jit_cpu_t for mips
> in include/lightning/jit_mips.h and then in
> lib/jit_mips.c:jit_get_cpu()
> set it to a default value, and use the flag allowing or disallowing
> the
> patch during jit generation.

That should be doable, yes. But what would set the value of the flag?
Would it be a ./configure option?

> > > so, you check the previous opcode for float comparison and
> > > cas{r,i}.
> > > Shouldn't it also check for b{o,x}{add,sub}{r,i} codes? These
> > > also
> > > end with a delay slot usage, and based on the patches and your
> > > description, not checking them might create very difficult to
> > > debug
> > > bugs in jit doing complex branch logic on carry/overflow.
> > 
> > The b{o,x}{add,sub}{r,i} end up with a delay slot, and that's
> > perfectly
> > fine. The algorithm will detect that and won't try to move the
> > opcode
> > (that's the has_delay_slot() check in can_swap_ds()).
> > 
> > Cheers,
> > -Paul
> 
> Thanks,
> Paulo

Cheers,
-Paul



reply via email to

[Prev in Thread] Current Thread [Next in Thread]