
Re: [Qemu-devel] [Consult] tilegx: About floating point instructions


From: Richard Henderson
Subject: Re: [Qemu-devel] [Consult] tilegx: About floating point instructions
Date: Mon, 17 Aug 2015 10:31:49 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 08/15/2015 11:16 AM, Chen Gang wrote:
> OK, thanks, but for float(uns)sisf2 and float(uns)sidf2, we can not only
> simply move.  :-(

Oh yes, I see that now.  Unfortunate.

> But what you said is really quite valuable to me!! we can treat the flag
> as a caller saved context, then can let the caller can use callee freely
> (in fact, I guess, the real hardware treats it as caller context, too).
> 
>  - we have to define the flag format based on the existing format in the
>    related docs and tilegx.md (reserve 0-20 and 25-31 bits).
> 
>  - We can only use 21-24 for mark addsub, mul, or typecast result. If
>    21-24 bits are all zero, it means typecast result. For fsingle: 32-63
>    bits is the input integer; for fdouble: srca is the input integer.

Plausible.

> 
>  - For addsub and mul result, we use 32-63 bits for an index of resource
>    handler (like 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>    fdouble_mul_flags, fdouble_addsub allocate resource, and pack1 free.

No, that's a bad idea.  No state external to the inputs to the insns.

It really would be nice if we had the same documentation that was used
to implement the gcc backend.  Otherwise we have to rely on guesswork.

For single-precision it appears that the format is

  63                                      31          24   10  9     0
  [ mantissa with implicit and guard bits | cmp flags | ?? | s | exp ]

We can deduce the bias for the exponent from the input gcc gives us for
floatunssisf: an exponent of 0x9e corresponds to 2**31 when the mantissa
is normalized.
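
As a quick sanity check of that deduction, the sketch below encodes 2**31 as an
IEEE-754 single and extracts its exponent field.  It only shows that 0x9e is
the ordinary IEEE single-precision biased exponent for 2**31 (bias 127); it
says nothing about what the hardware does internally.  The helper name
single_exponent_field is made up for illustration.

```python
import struct

def single_exponent_field(x):
    """Return the 8-bit biased exponent field of x encoded as IEEE-754 binary32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF

# 2**31 is exactly representable as a single, so no rounding is involved.
exp = single_exponent_field(2.0 ** 31)
print(hex(exp))   # 0x9e
print(exp - 31)   # 127, the standard single-precision bias
```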

So:

  fsingle_add1, fsingle_sub1: Perform the operation.  Split the result
  such that all of the fields above are filled in.

  fsingle_mul1: Perform the operation.  Split the result such that all
  of the fields above except for cmp-flags are filled in.

  fsingle_addsub2: Nop.
  fsingle_mul2: Move srca to dest.

  fsingle_pack1: Normalize and repack the above.  In the add/sub/mul case,
  no normalization will be required, so no change to the result occurs.

  In the floatunssisf2 case, the input implicit bit may not be set, and
  guard bits may be set, so real rounding and normalization must occur,
  adjusting the exponent constructed by gcc in building the flags.
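
A minimal sketch of the normalize-and-round step for the floatunssisf2 case,
under assumptions of mine that the documentation does not confirm: the
mantissa field is 32 bits wide with the implicit bit at bit 31, the low 8 bits
act as guard bits, the exponent uses the IEEE bias (so 0x9e means 2**31), and
rounding is to nearest with ties to even.  pack_single and float_bits are
hypothetical names, not QEMU or hardware interfaces.

```python
import struct

def pack_single(mantissa32, exp, sign=0):
    """Model of a pack step: normalize, round to nearest-even, emit IEEE single bits."""
    if mantissa32 == 0:
        return sign << 31
    # Normalize: shift left until the assumed implicit-bit position (bit 31) is set.
    while not (mantissa32 & 0x80000000):
        mantissa32 <<= 1
        exp -= 1
    sig = mantissa32 >> 8        # 24-bit significand including the implicit bit
    guard = mantissa32 & 0xFF    # bits that do not fit in the result
    # Round to nearest, ties to even.
    if guard > 0x80 or (guard == 0x80 and (sig & 1)):
        sig += 1
        if sig == 1 << 24:       # rounding carried out of the significand
            sig >>= 1
            exp += 1
    return (sign << 31) | (exp << 23) | (sig & 0x7FFFFF)

def float_bits(x):
    """Reference encoding: the IEEE-754 binary32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

# The unsigned input goes into the mantissa unchanged, with exponent 0x9e.
for n in (1, 3, 12345678, 0x01000001, 0x01000003, 0x80000000, 0xFFFFFFFF):
    assert pack_single(n, 0x9E) == float_bits(float(n))
```

With an already-normalized mantissa and clear guard bits, as in the
add/sub/mul case, the loop and the rounding branch are both skipped and the
fields pass through unchanged.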

For double-precision things are more complicated, precisely because there are
no dedicated fdouble_mul[1-4] instructions; instead gcc uses a normal
128-bit integer multiplication on the mantissa.

For double-precision it appears that the format is

         63               57                           4            0
  unpack [ overflow bits? | mantissa with implicit bit | guard bits ]

         63   31          24   20  19    8    0
  flags  [ ?? | cmp flags | ?? | s | exp | ?? ]

Similarly we can compute the bias for exp as 0x21b == 2**53.
Or is it 20 bits of exponent and 0x21b00 == 2**53?

So:

  fdouble_unpack_max, fdouble_unpack_min: Perform the operation as described,
  extracting the mantissa of the min/max absolute value.

  fdouble_add_flags, fdouble_sub_flags: Extract the signs and exponent of the
  sources, and compute the sign and exponent of the result.  Set a bit,
  presumably one of [24:21], that tells fdouble_addsub whether to perform
  addition or subtraction.  Set the comparison flags.

  fdouble_mul_flags: Extract the signs and exponent of the sources, and compute
  the sign and exponent of the result.  Note that the result of the 128-bit
  multiplication is guaranteed to be non-normalized: the two 57-bit inputs
  will produce a 114-bit intermediate result, which means that bits [63:51]
  are guaranteed to be zero on entry to the pack stages.  Some bias will
  therefore need to be applied to the intermediate exponent.

  fdouble_addsub: Add or subtract the mantissas based on a bit in flags.

  fdouble_pack1: Move flags (srcb) to result (dest).
  fdouble_pack2: Take the 128-bit mantissa of srca+srcb, the flags of dest,
  and normalize and pack the result.
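
The width argument for the multiplication can be checked with plain integer
arithmetic.  The sketch below takes the 57-bit mantissa width from the unpack
diagram and reads [63:51] as bits of the upper 64-bit half of the product;
both are my reading of the text rather than documented behaviour, and
high_word_zero_bits is a made-up helper name.

```python
MANTISSA_BITS = 57  # width read off the unpack diagram; an assumption

def high_word_zero_bits(a, b):
    """Count the leading zero bits in the upper 64-bit half of the 128-bit product a*b."""
    product = a * b
    assert product < (1 << 128)
    high = product >> 64
    return 64 - high.bit_length()

largest = (1 << MANTISSA_BITS) - 1

# The largest possible product needs 114 bits.
print((largest * largest).bit_length())        # 114
# So the top bits of the high word stay clear for any pair of inputs.
print(high_word_zero_bits(largest, largest))   # 14
```

Fourteen clear bits cover [63:50] of the high word, which includes the [63:51]
range mentioned above, so the pack stage always has to shift the product up
and adjust the exponent to match.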


r~


