fluid-dev

Re: [fluid-dev] Floats and doubles, simd and interpolation


From: David Henningsson
Subject: Re: [fluid-dev] Floats and doubles, simd and interpolation
Date: Sun, 28 Nov 2010 07:11:04 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6

On 2010-11-22 09:17, David Henningsson wrote:
> So the reason I like floats is that with SSE, you can process 4 floats
> simultaneously, but only 2 doubles. From running perf I know that 2/3
> of the time (for my testcase) was spent in the interpolation routine.
> If we can SIMD-ize that, we might get a 3-4x speed improvement; that's
> at least what I hope for.
>
> There is a library called "ORC", anybody heard of it? You write some
> pseudo-assembly code, and on first run ORC translates it into SSE, MMX,
> Altivec, etc, or plain old C depending on your hardware. I think it
> sounds interesting, and was hoping to see if I could make a test soon,
> but then I got busy trying to find that bug instead.

So here is a follow-up on this. I used the same testcase as stated earlier (FluidR3 sf2 and Dont_you_worry_about_a_thin.mid).

Rendering with doubles takes ~12.3 s and rendering with floats takes ~11.9 s. That's on a 64-bit Ubuntu Maverick (one core, -z 4096). According to perf, here's where we spend the most time:

    41.47%  fluid_rvoice_dsp_interpolate_4th_order
    21.17%  fluid_iir_filter_apply
    10.05%  fluid_rvoice_buffers_mix
     8.18%  fluid_revmodel_processmix
     5.40%  fluid_chorus_processmix
     2.75%  fluid_rvoice_write

Since fluid_rvoice_buffers_mix was the simplest one to optimize, I tried to make an ORC version of it. After downloading the latest version of ORC from Debian Experimental (the one shipped with Ubuntu Maverick was buggy), I ended up with ~11.1 seconds, with fluid_rvoice_buffers_mix (or rather a strange ORC function) taking 5% of the total instead of 10%. I also spent some time looking at the iir_filter_apply and interpolation functions.
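To make the "4 floats at once" idea concrete, here is a minimal sketch of a mixing loop written with SSE intrinsics. It is not the FluidSynth or ORC code: the names mix_scalar and mix_simd are made up for illustration, and it assumes the mix step boils down to a scaled accumulate (dest[i] += amp * src[i]). Without SSE it falls back to the plain loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical reference version: one sample per iteration. */
static void mix_scalar(float *dest, const float *src, float amp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] += amp * src[i];
}

/* Hypothetical SIMD version: four samples per iteration when SSE is
 * available, then a scalar loop for the remaining 0-3 samples. */
static void mix_simd(float *dest, const float *src, float amp, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 vamp = _mm_set1_ps(amp);      /* amp in all four lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 s = _mm_loadu_ps(src + i);      /* unaligned loads, so no */
        __m128 d = _mm_loadu_ps(dest + i);     /* alignment is assumed   */
        _mm_storeu_ps(dest + i, _mm_add_ps(d, _mm_mul_ps(s, vamp)));
    }
#endif
    for (; i < n; i++)                         /* tail (or whole buffer) */
        dest[i] += amp * src[i];
}
```

A loop like this has no dependency between iterations, which is what makes it the easy case. It is also mostly loads and stores with very little arithmetic per sample, so memory traffic may well limit the gain to less than the ideal 4x.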

So, lessons learned from this experiment:

- ORC is still immature, and does not yet seem able to handle more complex things like iir_filter_apply and 4th order interpolation.

- I was expecting more improvement from ORC. SSE should be able to process 4 floats at once, so the time spent in that function should have decreased by a factor of 3-4 rather than a factor of 2. (I haven't tried writing a hand-optimized SSE function to compare with.)

- In addition, the iir_filter_apply function is difficult to SIMD optimize, since every sample depends on the previous sample via the dsp_centernode variable.

- The interpolate_4th_order function (which is the standard interpolation order) is difficult to SIMD optimize due to the loop conditions (sometimes you have to interpolate over sample points at both loop start and loop end for the same destination sample).

- Do we really need more performance? Today's computers can handle thousands of voices in real time, and if you have an old computer you might not have SSE anyway...

- Even though SIMD doesn't seem worth the effort at this point, I'd still like to revisit the floats vs doubles question. On my amd64, floats seem slightly faster than doubles. So my question is: when do we gain from the increased precision, and what do we gain? So far, the only thing I've heard is this: http://lists.nongnu.org/archive/html/fluid-dev/2010-09/msg00053.html Victor, can you follow up, perhaps redo the listening test with the latest trunk, now that the float bug is fixed, and see if there still are quality differences?
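To illustrate the point about the IIR filter, here is a minimal sketch of a generic second-order (biquad) filter loop. It is not a copy of fluid_iir_filter_apply: the biquad_t struct, the biquad_apply function and the coefficient names are assumptions made for illustration, and only the idea of a centre node feeding the next iteration is taken from the discussion above.

```c
#include <stddef.h>

/* Hypothetical filter state, for illustration only. */
typedef struct {
    float b0, b1, b2;    /* feed-forward coefficients */
    float a1, a2;        /* feedback coefficients */
    float hist1, hist2;  /* centre-node values of the two previous samples */
} biquad_t;

/* Filters buf in place. The centre node of sample i needs hist1 and
 * hist2, which are the centre nodes of samples i-1 and i-2, so sample
 * i cannot be started before sample i-1 is finished. */
static void biquad_apply(biquad_t *f, float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float centernode = buf[i] - f->a1 * f->hist1 - f->a2 * f->hist2;
        buf[i] = f->b0 * centernode + f->b1 * f->hist1 + f->b2 * f->hist2;
        f->hist2 = f->hist1;
        f->hist1 = centernode;   /* carried into the next iteration */
    }
}
```

Compare this with a mixing loop, where each output sample only depends on the inputs at the same index. Here four consecutive samples cannot simply be packed into one SSE register, because each one needs the result of the one before it.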

// David



