qemu-ppc



From: G 3
Subject: Re: [Qemu-ppc] [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU
Date: Tue, 27 Sep 2016 12:51:28 -0400


On Sep 27, 2016, at 12:16 PM, Eric Blake wrote:

On 09/27/2016 09:33 AM, G 3 wrote:
void fmadds(float *frD, float frA, float frC, float frB)
{
        *frD = frA * frC + frB;
}


It sounds like I should change my argument types to double.

Insufficient.  The whole reason that fmadds exists is that there are
provably cases where two operations that both round are GUARANTEED to
get the wrong answer when compared to a single operation, regardless of
the precisions involved.  Widening from float to double does NOT
eliminate the double-rounding problem.


I still want to try implementing this function. I'm thinking of rewriting
the helper_fmadd() function in target-ppc/fpu_helper.c. Does that
sound correct?

I seriously doubt you would be able to write a correct implementation,
if you aren't even aware of the double-rounding reasons why fmadds was
added to the IEEE floating point specification in the first place. Your
idea that you would be able to speed things up is probably a premature
optimization, given that you have no realistic clue how hard it is to
CORRECTLY implement fused-multiply-add.

The problem with your reasoning is that you assume this instruction has to
be implemented 100% correctly, with every single corner case accounted
for. I have only just begun my research into the floating-point
instructions, so of course I'm not going to know everything initially. I
plan on experimenting and learning along the way.

My ultimate goal is to make sound play correctly on a PowerPC Mac OS
guest. The source code of Apple's audio kernel extensions shows explicit
use of certain floating-point instructions. The current theory is that
audio playback doesn't work because the emulated floating-point unit is
too slow. So if I implemented a floating-point instruction such as fmadds
in a way that was optimized for speed, sound might play better than it
does now. Accounting for every single corner case may sound like the right
thing to do, but it may actually be doing more harm than good: it takes
CPU time to handle the corner cases, and they are not even guaranteed to
come up during execution. I'm hoping that with a scaled-down version of
the fmadds instruction, audio playback may actually work.

Maybe some time down the road a command-line switch could be added
that allows the user to decide which is more important: speed or accuracy?



