lightning
Re: Idea : load/stores with pre-decrement / post-increment


From: Paulo César Pereira de Andrade
Subject: Re: Idea : load/stores with pre-decrement / post-increment
Date: Fri, 22 Dec 2023 14:10:17 -0300

On Thu, Dec 21, 2023 at 16:20, Paul Cercueil
<paul@crapouillou.net> wrote:
>
> Hi Paulo,
>
> On Thursday, December 21, 2023 at 09:05 -0300, Paulo César Pereira de Andrade
> wrote:
> > On Thu, Dec 21, 2023 at 07:23, Paul Cercueil
> > <paul@crapouillou.net> wrote:
> >
> > [SNIP]
> >
> > > >   So, the idea is the pattern:
> > > >
> > > > jit_ldxbr_T(R0, R1, DISP), jit_ldxar_T(R0, R1, DISP)
> > > > jit_stxbr_T(R0, R1, DISP) and jit_stxar_T(R0, R1, DISP)
> > > >
> > > > where the fallback/generic version does addi of DISP in the base
> > > > register (b)efore or (a)fter the load and otherwise is a normal
> > > > jit_ldr_T or jit_str_T.
> > >
> > > Do we need DISP? In your examples above it's always equal to
> > > sizeof(T), (or I guess the negative sizeof(T) as well) and that
> > > would be my assumption. I'm not against it, but it sounds a bit
> > > error-prone, as well as redundant since the suffix _c _s _i already
> > > tells you the increment/decrement value. Unless you want to accept
> > > arbitrary increment/decrement values (as e.g. ARM supports that) but
> > > such usage wouldn't be very typical.
> > >
> > > On the other hand... I like that it supports all cases with just 4
> > > new instructions.
> >
> >   I might still change it. But current WIP test implementation is:
> >
> > ldxbi_T r0 r1 IM
> > ldxai_T r0 r1 IM
> > stxbi_T IM r1 r0
> > stxai_T IM r1 r0
> >
> >   That is, it matches the pattern of the existing ldxi_T and stxi_T.
> > T is the usual c, uc, s, us, i, ui, l, f and d. IM is an immediate
> > integer.
> >   This leaves room for a ldxbr_T, ldxar_T, stxbr_T and stxar_T, that
> > might be implemented based on what supported ports provide and
> > if any port has a version with a variable value to increment the base
> > register.
> >   The IM value can be positive or negative, and can be useful in
> > iterations in a vector of complex types where the loop iteration
> > adjusts the base pointer and loads the element at some offset in a
> > single instruction.
> >   At least arm64 will support it (not certain about float types) in a
> > single instruction. Basically it is an indexed load that also pre or
> > post increments the base register.
>
> Ok, that's fine.
>
> Thanks for working on it!

  First experimental commit pushed. It implements the fallback for
all ports and has been tested on most of them.

  Now, assuming we are fine with the concept, we should check what
can be optimized and on which ports.
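
  For readers following the thread, here is a rough plain-C model of
the fallback semantics as described above: the immediate is added to
the base register (b)efore or (a)fter an otherwise normal load or
store. This is only a sketch, not the code in the commit. The
model_* function names and sum_post_increment are hypothetical and
are not part of the lightning API; "base" stands in for r1, the
loaded or stored value for r0, and im for IM, with T fixed to a
32-bit int.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model only: none of these names exist in lightning. */

/* Models ldxbi_i r0 r1 IM: add IM to the base (b)efore the load. */
static int32_t model_ldxbi_i(const unsigned char **base, intptr_t im)
{
    int32_t value;
    assert(base != NULL && *base != NULL);
    *base += im;                          /* addi r1, r1, IM */
    memcpy(&value, *base, sizeof value);  /* ldr_i r0, r1    */
    return value;
}

/* Models ldxai_i r0 r1 IM: load first, add IM to the base (a)fter. */
static int32_t model_ldxai_i(const unsigned char **base, intptr_t im)
{
    int32_t value;
    assert(base != NULL && *base != NULL);
    memcpy(&value, *base, sizeof value);  /* ldr_i r0, r1    */
    *base += im;                          /* addi r1, r1, IM */
    return value;
}

/* Models stxbi_i IM r1 r0: add IM to the base (b)efore the store. */
static void model_stxbi_i(intptr_t im, unsigned char **base, int32_t value)
{
    assert(base != NULL && *base != NULL);
    *base += im;                          /* addi r1, r1, IM */
    memcpy(*base, &value, sizeof value);  /* str_i r1, r0    */
}

/* Models stxai_i IM r1 r0: store first, add IM to the base (a)fter. */
static void model_stxai_i(intptr_t im, unsigned char **base, int32_t value)
{
    assert(base != NULL && *base != NULL);
    memcpy(*base, &value, sizeof value);  /* str_i r1, r0    */
    *base += im;                          /* addi r1, r1, IM */
}

/* Walks a vector with one post-increment load per element. */
static int32_t sum_post_increment(const int32_t *v, size_t n)
{
    const unsigned char *p = (const unsigned char *)v;
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += model_ldxai_i(&p, (intptr_t)sizeof(int32_t));
    return sum;
}
```

  The loop in sum_post_increment is the kind of iteration where a port
with a native pre/post-increment addressing mode could fold the
pointer update and the load into a single instruction; IM may also be
negative to walk backwards.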

> Cheers,
> -Paul

Thanks,
Paulo


