qemu-ppc


Re: [PATCH v3 18/37] target/ppc: implement vgnb


From: Richard Henderson
Subject: Re: [PATCH v3 18/37] target/ppc: implement vgnb
Date: Fri, 11 Feb 2022 17:15:54 +1100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

On 2/10/22 23:34, matheus.ferst@eldorado.org.br wrote:
+    for (int dw = 1; dw >= 0; dw--) {
+        get_avr64(vrb, a->vrb, dw);
+        for (; in >= 0; in -= a->n, out--) {
+            if (in > out) {
+                tcg_gen_shri_i64(tmp, vrb, in - out);
+            } else {
+                tcg_gen_shli_i64(tmp, vrb, out - in);
+            }
+            tcg_gen_andi_i64(tmp, tmp, 1ULL << out);
+            tcg_gen_or_i64(rt, rt, tmp);
+        }
+        in += 64;
+    }

This is going to produce up to 3*64 operations (N = 2): a shift, an and, and an or for each of the 64 output bits.
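For comparison, here is a minimal plain-C model (not TCG) of the per-bit approach on a single 64-bit doubleword. It is a sketch, not the patch. `gather_per_bit` is a hypothetical name. It assumes the convention of the diagrams below: bits are numbered from the MSB, every Nth bit starting at position 0 is taken, and the gathered bits are packed at the top of the result. Each output bit costs one shift, one and, and one or. For N = 2 that is 3*32 per doubleword, or 3*64 across both.

```c
#include <stdint.h>

/* Per-bit gather on one doubleword, shaped like the patch's inner loop.
 * Hypothetical helper: takes MSB-first bits 0, n, 2n, ... of x and packs
 * them MSB-first at the top of the result; *ops counts shift/and/or. */
static uint64_t gather_per_bit(uint64_t x, int n, int *ops)
{
    uint64_t r = 0;
    int out = 63;

    *ops = 0;
    for (int in = 63; in >= 0; in -= n, out--) {
        uint64_t tmp = x << (out - in);  /* move bit 'in' up to bit 'out' */
        tmp &= 1ull << out;              /* isolate that single bit */
        r |= tmp;                        /* accumulate into the result */
        *ops += 3;
    }
    return r;
}
```

With n = 2 the loop runs 32 times, so `*ops` ends at 96 for one doubleword.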

You can produce more than one output pairing per shift,
and produce the same result in 3*lg2(64) operations.

I've given an example like this on the list recently.
I think it was in the context of some riscv bit manipulation.

N = 2

AxBxCxDxExFxGxHxIxJxKxLxMxNxOxPxQxRxSxTxUxVxWxXxYxZx0x1x2x3x4x5x
  & rep(0b10)
A.B.C.D.E.F.G.H.I.J.K.L.M.N.O.P.Q.R.S.T.U.V.W.X.Y.Z.0.1.2.3.4.5.
  << 1
.B.C.D.E.F.G.H.I.J.K.L.M.N.O.P.Q.R.S.T.U.V.W.X.Y.Z.0.1.2.3.4.5..
  |
ABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ001122334455.
  & rep(0b1100)
AB..CD..EF..GH..IJ..KL..MN..OP..QR..ST..UV..WX..YZ..01..23..45..
  << 2
..CD..EF..GH..IJ..KL..MN..OP..QR..ST..UV..WX..YZ..01..23..45....
  |
ABCDCDEFEFGHGHIJIJKLKLMNMNOPOPQRQRSTSTUVUVWXWXYZYZ010123234545..
  & rep(0xf0)
ABCD....EFGH....IJKL....MNOP....QRST....UVWX....YZ01....2345....
  << 4
....EFGH....IJKL....MNOP....QRST....UVWX....YZ01....2345........
  |
ABCDEFGHEFGHIJKLIJKLMNOPMNOPQRSTQRSTUVWXUVWXYZ01YZ0123452345....
  & rep(0xff00)
ABCDEFGH........IJKLMNOP........QRSTUVWX........YZ012345........
  << 8
........IJKLMNOP........QRSTUVWX........YZ012345................
  |
ABCDEFGHIJKLMNOPIJKLMNOPQRSTUVWXQRSTUVWXYZ012345YZ012345........
  & rep(0xffff0000)
ABCDEFGHIJKLMNOP................QRSTUVWXYZ012345................
  deposit(t, 32, 16)
ABCDEFGHIJKLMNOPQRSTUVWXYZ012345................................
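The N = 2 sequence above maps directly onto masks and shifts of a uint64_t. What follows is a minimal plain-C sketch, not TCG, and `gather_n2` is a hypothetical name. It assumes the diagram's convention: MSB-first positions, result packed at the top. In place of the final deposit it runs one more shift/or/mask round, which leaves the same top 32 bits. That comes to 1 + 5*3 = 16 operations for one doubleword.

```c
#include <stdint.h>

/* Gather MSB-first bits 0, 2, 4, ..., 62 of x into the top 32 bits.
 * Each line is one "| (t << k), & rep(...)" step of the N = 2 diagram. */
static uint64_t gather_n2(uint64_t x)
{
    uint64_t t = x & 0xaaaaaaaaaaaaaaaaull;        /* rep(0b10) */
    t = (t | (t << 1)) & 0xccccccccccccccccull;    /* rep(0b1100) */
    t = (t | (t << 2)) & 0xf0f0f0f0f0f0f0f0ull;    /* rep(0xf0) */
    t = (t | (t << 4)) & 0xff00ff00ff00ff00ull;    /* rep(0xff00) */
    t = (t | (t << 8)) & 0xffff0000ffff0000ull;    /* rep(0xffff0000) */
    /* The diagram ends with a deposit; in plain C one more round of
     * shift/or/mask moves QRST...2345 up next to ABCD...MNOP. */
    return (t | (t << 16)) & 0xffffffff00000000ull;
}
```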

and similarly for larger N. For N >= 4, I believe that half of the masking may be elided, because there are already zeros in which to place bits.

N = 5

AxxxxBxxxxCxxxxDxxxxExxxxFxxxxGxxxxHxxxxIxxxxJxxxxKxxxxLxxxxMxxx
  & rep(0b10000)
A....B....C....D....E....F....G....H....I....J....K....L....M...
  << (5 - 1)
.B....C....D....E....F....G....H....I....J....K....L....M.......
  |
AB...BC...CD...DE...EF...FG...GH...HI...IJ...JK...KL...LM...M...
  << (10 - 2)
..CD...DE...EF...FG...GH...HI...IJ...JK...KL...LM...M...........
  |
ABCD.BCDE.CDEF.DEFG.EFGH.FGHI.GHIJ.HIJK.IJKL.JKLM.KLM..LM...M...
  & rep(0xf0000)
ABCD................EFGH................IJKL................M...
  << (20 - 4)
....EFGH................IJKL................M...................
  |
ABCDEFGH............EFGHIJKL............IJKLM...............M...
  << (40 - 8)
........IJKLM...............M...................................
  |
ABCDEFGHIJKLM.......EFGHIJKLM...........IJKLM...............M...
  & 0xfff8_0000_0000_0000
ABCDEFGHIJKLM...................................................
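The N = 5 sequence can be written the same way in plain C. This is again a sketch with a hypothetical name (`gather_n5`), using the diagram's MSB-first convention. Only three masks are applied: the initial rep(0b10000), rep(0xf0000), and the final 13-bit mask. The other two shift/or steps rely on the zeros that are already in place, as noted for N >= 4 above.

```c
#include <stdint.h>

/* Gather MSB-first bits 0, 5, 10, ..., 60 of x into the top 13 bits,
 * following the N = 5 diagram step by step. */
static uint64_t gather_n5(uint64_t x)
{
    uint64_t t = x & 0x8421084210842108ull;  /* rep(0b10000) */
    t |= t << 4;                             /* << (5 - 1), no mask needed */
    t |= t << 8;                             /* << (10 - 2) */
    t &= 0xf0000f0000f0000full;              /* rep(0xf0000) */
    t |= t << 16;                            /* << (20 - 4), no mask needed */
    t |= t << 32;                            /* << (40 - 8) */
    return t & 0xfff8000000000000ull;        /* keep the 13 gathered bits */
}
```

That is 3 ands, 4 shifts and 4 ors for one doubleword.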

It's probably worth working through the various N to make sure you know which masking is required.
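One way to work through the N values mechanically is to brute-force which intermediate masks can be dropped. The sketch below is plain C rather than TCG, and all of its names are hypothetical. It generalises the doubling scheme: shift by run*(N-1), or, then mask to a doubled stride, and bit r of `skip` drops the mask after round r. Every step distributes over OR, so each output bit is an OR of input bits. That means comparing against a per-bit reference on the 64 single-bit inputs is an exhaustive check. It assumes the diagrams' MSB-first convention and only small N, such as 2 to 7.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference: MSB-first bits 0, n, 2n, ... of x, packed at the top. */
static uint64_t ref_gather(uint64_t x, int n)
{
    uint64_t r = 0;
    for (int k = 0; k * n < 64; k++) {
        r |= ((x >> (63 - k * n)) & 1) << (63 - k);
    }
    return r;
}

/* Mask with MSB-first positions q*stride + i set, for 0 <= i < run. */
static uint64_t run_mask(int stride, int run)
{
    uint64_t m = 0;
    for (int pos = 0; pos < 64; pos++) {
        if (pos % stride < run) {
            m |= 1ull << (63 - pos);
        }
    }
    return m;
}

/* Number of doubling rounds needed to collect ceil(64/n) bits. */
static int rounds_for(int n)
{
    int count = (64 + n - 1) / n, rounds = 0;
    for (int run = 1; run < count; run *= 2) {
        rounds++;
    }
    return rounds;
}

/* Shift/or/mask gather; the initial and final masks are always applied. */
static uint64_t gather_steps(uint64_t x, int n, unsigned skip)
{
    int count = (64 + n - 1) / n;
    int run = 1, stride = n;            /* invariant: stride == n * run */
    uint64_t t = x & run_mask(stride, run);

    for (int r = 0; run < count; r++) {
        t |= t << (stride - run);       /* shift by run * (n - 1) */
        run *= 2;
        stride *= 2;
        if (!(skip & (1u << r))) {
            t &= run_mask(stride, run);
        }
    }
    return t & (~0ull << (64 - count));
}

/* Exhaustive check via the 64 single-bit inputs (see lead-in). */
static bool is_exact(int n, unsigned skip)
{
    for (int i = 0; i < 64; i++) {
        uint64_t x = 1ull << i;
        if (gather_steps(x, n, skip) != ref_gather(x, n)) {
            return false;
        }
    }
    return true;
}

static int popcount_u(unsigned v)
{
    int c = 0;
    for (; v; v &= v - 1) {
        c++;
    }
    return c;
}

/* An exact skip set that drops as many intermediate masks as possible. */
static unsigned max_elision(int n)
{
    unsigned best = 0;                  /* masking everything is exact */
    for (unsigned skip = 1; skip < (1u << rounds_for(n)); skip++) {
        if (popcount_u(skip) > popcount_u(best) && is_exact(n, skip)) {
            best = skip;
        }
    }
    return best;
}
```

For example, `is_exact(5, 0xd)` covers the N = 5 sequence above. It drops the masks after the shifts by 4, 16 and 32 and keeps rep(0xf0000). By contrast, `is_exact(2, 1)` fails, because without the first rep(0b1100) mask, B lands on top of A at the next shift.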


r~


