In short, I completely agree with your diagnosis, and the solution I'm using in my project (derived from FluidSynth) is essentially the same as yours: a single clock derived from the audio output. The only difference is that in my design the audio clock is not necessarily real time.
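
To illustrate what a clock derived from the audio output can look like, here is a minimal sketch. It is not code from FluidSynth or from either project; `SampleClock` and `render` are hypothetical names, and the 64-frame block size is an assumption. Time is just the number of frames rendered divided by the sample rate, so it advances only when audio is produced, whether that happens in real time or faster (for example when rendering to a file).

```python
import heapq


class SampleClock:
    """Hypothetical clock that advances only when audio frames are rendered."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate
        self.frames = 0  # total frames rendered so far

    def now(self):
        # Time in seconds as seen by the synth, independent of the wall clock.
        return self.frames / self.sample_rate

    def advance(self, n_frames):
        self.frames += n_frames


def render(clock, events, n_blocks, block_size=64):
    """Render n_blocks, firing events whose timestamp falls before each block's end.

    events is a list of (time_in_seconds, name) tuples.
    Returns (frame_at_block_start, name) for every event fired.
    """
    heapq.heapify(events)
    fired = []
    for _ in range(n_blocks):
        block_end = (clock.frames + block_size) / clock.sample_rate
        while events and events[0][0] < block_end:
            _, name = heapq.heappop(events)
            fired.append((clock.frames, name))
        # A real synth would fill an audio buffer here; the clock moves
        # forward by exactly the number of frames produced.
        clock.advance(block_size)
    return fired


clock = SampleClock(48000)
fired = render(clock, [(0.0, "note_on"), (0.5, "note_off")], n_blocks=750)
```

Nothing in the loop sleeps or reads the system time, so the same event timing results whether the blocks are pulled by a sound card callback or rendered as fast as the CPU allows.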