From: G 3
Subject: Re: [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU
Date: Tue, 27 Sep 2016 12:51:28 -0400
On Sep 27, 2016, at 12:16 PM, Eric Blake wrote:
> On 09/27/2016 09:33 AM, G 3 wrote:
>> void fmadds(float *frD, float frA, float frC, float frB)
>> {
>>     *frD = frA * frC + frB;
>> }
>>
>> It sounds like I should change my argument types to double.
>
> Insufficient. The whole reason that fmadds exists is that there are provably cases where two operations that both round are GUARANTEED to get the wrong answer when compared to a single operation, regardless of the precisions involved. Widening from float to double does NOT eliminate the double-rounding problem.
>
>> I still want to try implementing this function. I'm thinking of rewriting the helper_fmadd() function in target-ppc/fpu_helper.c. Does that sound correct?
>
> I seriously doubt you would be able to write a correct implementation if you aren't even aware of the double-rounding reasons why fmadds was added to the IEEE floating point specification in the first place. Your idea that you would be able to speed things up is probably a premature optimization, given that you have no realistic clue how hard it is to CORRECTLY implement fused-multiply-add.

The problem with your reasoning is that you assume this instruction has to be implemented 100% correctly, that every single corner case has to be accounted for. I have only just begun my research into the floating-point instructions, so of course I'm not going to know everything initially. I plan on experimenting and learning along the way.

My ultimate goal is to make sound play correctly on a PowerPC Mac OS guest. The source code to Apple's audio kernel extensions indicates explicit use of certain floating-point instructions. The current theory is that audio playback doesn't work because the emulated floating-point unit is too slow. So if I implemented a floating-point instruction such as fmadds that was optimized for speed, I could make sound play better than it does now.

Accounting for every single corner case may sound like the right thing to do, but it may actually be causing more harm than good: it takes CPU time to handle the corner cases, and those corner cases are not even guaranteed to appear during execution. I'm hoping that by implementing a scaled-down version of the fmadds instruction, audio playback may actually work. Maybe some time down the road a command-line switch could be added that lets the user decide which is more important: speed or accuracy?