Re: [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double floating point
From: Chen Gang
Subject: Re: [Qemu-devel] [PATCH v3 3/4] target-tilegx: Add double floating point implementation
Date: Sat, 12 Dec 2015 10:45:40 +0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 12/12/15 08:41, Richard Henderson wrote:
> On 12/11/2015 03:38 PM, Chen Gang wrote:
>>
>> On 12/11/15 05:17, Richard Henderson wrote:
>>> On 12/10/2015 06:15 AM, Chen Gang wrote:
>>>> +#define TILEGX_F_MAN_HBIT (1ULL << 59)
>>> ...
>>>> +static uint64_t fr_to_man(float64 d)
>>>> +{
>>>> + uint64_t val = get_f64_man(d) << 7;
>>>> +
>>>> + if (get_f64_exp(d)) {
>>>> + val |= TILEGX_F_MAN_HBIT;
>>>> + }
>>>> +
>>>> + return val;
>>>> +}
>>>
>>> One presumes that "HBIT" is the ieee implicit one bit.
>>> A better name or better comments would help there.
>>>
>>
>> OK, thanks. After thinking about it again, I guess the real hardware does
>> not use HBIT internally (it uses the full 64 bits as mantissa, without HBIT).
>
> It must do. Otherwise the arithmetic doesn't work out.
>
Oh, yes. So we have to use my original implementation (60 bits for the
mantissa, 4 bits for other uses).
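
For reference, a minimal sketch of the unpacking quoted above, written so
that it compiles on its own. It assumes an IEEE binary64 double and works on
the raw 64-bit pattern; the helpers f64_bits(), get_f64_man() and
get_f64_exp() here are hypothetical stand-ins and may not match the patch's
own definitions. It only shows where the 52 fraction bits and the explicit
one bit land inside the 60-bit mantissa.

```c
#include <stdint.h>
#include <string.h>

/* The IEEE implicit one bit, made explicit at bit 59 (name from the patch). */
#define TILEGX_F_MAN_HBIT (1ULL << 59)

/* Hypothetical stand-ins for the patch's helpers, on raw binary64 bits. */
static uint64_t f64_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof(u));      /* assumes double is IEEE binary64 */
    return u;
}

static uint64_t get_f64_man(uint64_t bits)
{
    return bits & ((1ULL << 52) - 1);       /* 52 fraction bits */
}

static unsigned get_f64_exp(uint64_t bits)
{
    return (unsigned)((bits >> 52) & 0x7ff); /* 11 exponent bits */
}

/* Fraction goes to bits 7..58; normal numbers get the one bit at bit 59. */
static uint64_t fr_to_man(uint64_t bits)
{
    uint64_t val = get_f64_man(bits) << 7;

    if (get_f64_exp(bits)) {
        val |= TILEGX_F_MAN_HBIT;
    }
    return val;
}
```

With this layout every unpacked value stays below 1ULL << 60, which leaves
the top 4 bits of the 64-bit word free.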
>> But what I have done is still OK (using 59 bits + 1 HBIT as the mantissa),
>> since 59 bits are enough for a double mantissa (52 bits). It makes overflow
>> processing easier, but the mul operation has to be handled specially.
>
> What you have works. But the mul operation isn't as special as you make it
> out -- aside from requiring at least 104 bits as intermediate -- in that when
> one implements what the hardware does, subtraction also may require
> significant normalization.
>
I guess you misunderstood what I said (my English is not very good).
For mul, the intermediate needs at least (104 - 1) bits. At present we have
120 bits for it (in fact, our mul generates a 119-bit result), so that is
enough.
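
To make the bit counts concrete, here is a small sketch of a portable
64x64 -> 128 bit multiply. The u128 type and the names mul64x64() and
top_bit() are hypothetical and are not taken from the patch. It only
illustrates that the product of two 60-bit mantissas with the one bit at
bit 59 has its highest set bit at position 118 or 119, so it fits in 120
bits.

```c
#include <stdint.h>

/* Hypothetical 128-bit container: value = hi * 2^64 + lo. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128;

/* Schoolbook multiply on 32-bit halves; no compiler extensions needed. */
static u128 mul64x64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Middle column: at most three 32-bit terms, so it cannot wrap. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;
    u128 r;

    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

/* Index of the highest set bit of a 128-bit value, or -1 for zero. */
static int top_bit(u128 v)
{
    int i;

    for (i = 63; i >= 0; i--) {
        if ((v.hi >> i) & 1) {
            return 64 + i;
        }
    }
    for (i = 63; i >= 0; i--) {
        if ((v.lo >> i) & 1) {
            return i;
        }
    }
    return -1;
}
```

For example, top_bit(mul64x64(1ULL << 59, 1ULL << 59)) is 118, and the
largest 60-bit operands give 119.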
>> According to floatsidf, it seems "4", but after I expanded the bits, I
>> guess, it is "7".
>>
>> /*
>> * Double exp analyzing: (0x21b00 << 1) - 0x37(55) = 0x3ff
>> *
>> * 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
>> *
>> * 1 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0
>> *
>> * 0 0 0 0 0 1 1 0 1 1 1 => 0x37(55)
>> *
>> * 0 1 1 1 1 1 1 1 1 1 1 => 0x3ff
>> *
>> */
>
> That's the exponent within the flags temporary. It has nothing to do with
> the position of the extracted mantissa.
>
0x37 (55) + 4 (guard bits) + 1 (HBIT) = 60 bits.
So, if the above is correct, the mantissa is 60 bits (including HBIT),
bit 18 in flags is for overflow, and bit 19 is for underflow (bit 20 must
be the sign).
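
The arithmetic can be checked with a few lines of C. This is only a sketch:
it assumes the exponent occupies bits 7..17 of the flags value, which is
inferred from the bit table quoted above and is not confirmed against
hardware documentation. FLAGS_EXP_SHIFT, FLAGS_EXP_MASK and flags_exp() are
hypothetical names.

```c
#include <stdint.h>

/* Assumption: the 11-bit exponent sits in bits 7..17 of the flags word. */
#define FLAGS_EXP_SHIFT 7
#define FLAGS_EXP_MASK  0x7ffu

/* Extract the biased exponent field from a flags value. */
static unsigned flags_exp(uint32_t flags)
{
    return (flags >> FLAGS_EXP_SHIFT) & FLAGS_EXP_MASK;
}
```

Under that assumption flags_exp(0x21b00) is 0x436, and 0x436 - 0x37 is
0x3ff, the bias of an IEEE double exponent.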
> FWIW, the minimum shift would be 3, in order to properly implement rounding;
> if the hardware uses a shift of 4, that's fine too.
>
I guess so: it uses 4 guard bits.
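
As an illustration of why extra low bits are needed for rounding, here is a
generic round-to-nearest-even sketch that drops a given number of guard
bits. The function name round_nearest_even() is hypothetical, and this is
not a claim about how the hardware or QEMU's softfloat actually rounds.

```c
#include <stdint.h>

/*
 * Drop the low 'guard' bits of m, rounding to nearest, ties to even.
 * Assumes 1 <= guard <= 63.
 */
static uint64_t round_nearest_even(uint64_t m, unsigned guard)
{
    uint64_t half = 1ULL << (guard - 1);
    uint64_t frac = m & ((1ULL << guard) - 1);   /* bits being dropped */
    uint64_t q = m >> guard;                     /* bits being kept */

    if (frac > half || (frac == half && (q & 1))) {
        q++;
    }
    return q;
}
```

With 4 guard bits, 0x18 is an exact tie on an odd value and rounds up to 2,
while 0x08 is a tie on an even value and rounds down to 0.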
> What I would love to know is if the shift present in floatsidf is not really
> required; equally valid to adjust 0x21b00 by 4. Meaning normalization would
> do a proper job with the entire given mantissa. This would require better
> documentation, or access to hardware to verify.
>
I guess that before calling any fdouble insn we can use the low 4 bits as
mantissa (e.g. when calculating mul), but when calling an fdouble insn we
cannot use the low 4 guard bits, so floatsidf has to shift left by 4 bits.
>>>> +uint64_t helper_fdouble_addsub(CPUTLGState
>> And for my current implementation (I guess, it should be correct):
>>
>> typedef union TileGXFPDFmtV {
>> struct {
>> uint64_t mantissa : 60; /* mantissa */
>> uint64_t overflow : 1; /* carry/overflow bit for absolute
>> add/mul */
>> uint64_t unknown1 : 3; /* unknown */
>
> I personally like to call all 4 of the top bits overflow. But I have no idea
> what the real hardware actually does.
>
>> In helper_fdouble_addsub(), both dest and srca are unpacked, so they fit
>> within 60 bits. A single absolute add therefore fits within 61 bits, so
>> using the 61st bit (bit 60) as the overflow bit is enough.
>
> True. But if all 4 top bits are considered overflow, then one could
> implement floatdidf fairly easily. But I suspect that real hw doesn't work
> that way, or it would have already been done.
>
So I only assumed that bit 60 is for overflow and that the high 3 bits are
unknown. To me, if one bit is enough for overflow, the hardware would keep
the other bits for other uses (or reserve them for the future).
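
A minimal sketch of that assumption: two 60-bit mantissas that already share
the same exponent are added, and a carry into bit 60 is treated as overflow
and normalized away. The unpacked type, the MAN_* macros and abs_add() are
hypothetical names, and folding the shifted-out bit back in as a sticky bit
is a choice made for this example; what the real hardware does is unknown.

```c
#include <stdint.h>

#define MAN_BITS     60
#define MAN_MASK     ((1ULL << MAN_BITS) - 1)
#define MAN_OVERFLOW (1ULL << MAN_BITS)   /* assumed overflow bit: bit 60 */

/* Hypothetical unpacked form: 60-bit mantissa plus an exponent. */
typedef struct {
    uint64_t man;
    int exp;
} unpacked;

/* Add two mantissas that are already aligned to the same exponent. */
static unpacked abs_add(uint64_t a, uint64_t b, int exp)
{
    unpacked r;
    uint64_t sum = (a & MAN_MASK) + (b & MAN_MASK);   /* at most 61 bits */

    r.exp = exp;
    if (sum & MAN_OVERFLOW) {
        /* Shift back into 60 bits, keeping the lost bit as a sticky bit. */
        sum = (sum >> 1) | (sum & 1);
        r.exp++;
    }
    r.man = sum;
    return r;
}
```

Since both inputs are below 1ULL << 60, the sum is always below 1ULL << 61,
so one overflow bit is enough for a single add.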
Thanks.
--
Chen Gang (陈刚)
Open, share, and attitude like air, water, and life which God blessed