[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply
From: |
Richard Henderson |
Subject: |
Re: [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds |
Date: |
Fri, 08 Nov 2013 10:13:06 +1000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 |
On 11/08/2013 09:30 AM, Richard Henderson wrote:
> On 11/08/2013 09:28 AM, Richard Henderson wrote:
>> On 11/07/2013 06:31 AM, Tom Musta wrote:
>>> }
>>> \
>>> +
>>> \
>>> + if (r2sp) {
>>> \
>>> + float32 tmp32 = float64_to_float32(xt_out.fld[i],
>>> \
>>> + &env->fp_status);
>>> \
>>> + xt_out.fld[i] = float32_to_float64(tmp32, &env->fp_status);
>>> \
>>> + }
>>> \
>>> +
>>> \
>>
>> You can't get correct results for a single-precision fma from a
>> double-precision fma and merely rounding the results.
>>
>> See e.g. glibc's sysdeps/ieee754/dbl-64/s_fmaf.c.
>
> Blah, nevermind. That would be using separate add+mul in double-precision,
> not
> using a double-precision fma primitive.
Hmph. I was right the first time. See
> http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/
for example inputs that suffer from double-rounding.
What's needed in each of the examples are infinite precision values containing
55 bits. This is easy to accomplish with fma.
Two 23-bit inputs can create a product with 46 significant bits. One can
append 23 more significant bits by choosing an exponent for the addend that
does not overlap the product. Thus one can create (almost) every intermediate
result with up to 69 consecutive bits (the exception being products without
factors that can be represented in 23-bits).
I'm too lazy to decompose the examples therein to actual single-precision
inputs, but I'm certain it can be done.
Thus you *do* need the round-to-zero plus inexact solution that glibc uses.
(Or to perform the fma in 128-bits and round once, but I think that would be
way more intrusive wrt the rest of the code, and more expensive than necessary.)
r~
- [Qemu-devel] [PATCH 05/14] VSX Stage 4: Add stxsiwx and stxsspx, (continued)
- [Qemu-devel] [PATCH 05/14] VSX Stage 4: Add stxsiwx and stxsspx, Tom Musta, 2013/11/06
- [Qemu-devel] [PATCH 06/14] VSX Stage 4: Add xsaddsp and xssubsp, Tom Musta, 2013/11/06
- [Qemu-devel] [PATCH 07/14] VSX Stage 4: Add xsmulsp, Tom Musta, 2013/11/06
- [Qemu-devel] [PATCH 08/14] VSX Stage 4: Add xsdivsp, Tom Musta, 2013/11/06
- [Qemu-devel] [PATCH 10/14] VSX Stage 4: Add xssqrtsp, Tom Musta, 2013/11/06
- [Qemu-devel] [PATCH 09/14] VSX Stage 4: Add xsresp, Tom Musta, 2013/11/06
- [Qemu-devel] [PATCH 11/14] VSX Stage 4: add xsrsqrtesp, Tom Musta, 2013/11/06
- [Qemu-devel] [PATCH 12/14] VSX Stage 4: Add Scalar SP Fused Multiply-Adds, Tom Musta, 2013/11/06
[Qemu-devel] [PATCH 13/14] VSX Stage 4: Add xscvsxdsp and xscvuxdsp, Tom Musta, 2013/11/06
[Qemu-devel] [PATCH 14/14] VSX Stage 4: Add xxleqv, xxlnand and xxlorc, Tom Musta, 2013/11/06
Re: [Qemu-devel] [PATCH 00/14] VSX Stage 4, Richard Henderson, 2013/11/07
Re: [Qemu-devel] [PATCH 00/14] VSX Stage 4, Andreas Färber, 2013/11/08