The DPPS (Dot Product) instruction is defined to first sum pairs of
intermediate results, then sum those values to get the final result.
i.e. (A+B)+(C+D)
We incrementally sum the results, i.e. ((A+B)+C)+D, which can result
in incorrect rounding.
For consistency, also change the variable names to the ones used
in the Intel SDM and implement DPPD following the manual.
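
The effect of the summation order can be seen with a minimal sketch in
plain C. This is not the code from ops_sse.h: the helper names
sum_pairwise and sum_sequential are hypothetical, and the sketch assumes
host IEEE 754 single precision with the default round-to-nearest-even
mode rather than QEMU's softfloat routines.

```c
/*
 * Illustration only: hypothetical helpers, not the QEMU implementation.
 * The volatile temporaries force each intermediate sum to be rounded to
 * single precision, even on hosts that evaluate in wider precision.
 */

/* Order described for DPPS: (a + b) + (c + d) */
static float sum_pairwise(float a, float b, float c, float d)
{
    volatile float lo = a + b;
    volatile float hi = c + d;
    volatile float r = lo + hi;
    return r;
}

/* Incremental order: ((a + b) + c) + d */
static float sum_sequential(float a, float b, float c, float d)
{
    volatile float r = a + b;
    r = r + c;
    r = r + d;
    return r;
}
```

With a = 16777216.0f (2^24) and b = c = d = 1.0f, sum_pairwise returns
16777218.0f, while sum_sequential returns 16777216.0f because each
single 1.0f added to 2^24 is lost to rounding.
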
Based on a patch by Paul Brook <paul@nowt.org>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/ops_sse.h | 67 ++++++++++++++++++++++---------------------
1 file changed, 35 insertions(+), 32 deletions(-)