
Re: [Qemu-devel] [PATCH] virtio: Make memory barriers be memory barriers


From: David Gibson
Subject: Re: [Qemu-devel] [PATCH] virtio: Make memory barriers be memory barriers
Date: Tue, 6 Sep 2011 13:12:24 +1000
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Sep 05, 2011 at 12:19:46PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 05, 2011 at 02:43:16PM +1000, David Gibson wrote:
> > On Sun, Sep 04, 2011 at 12:16:43PM +0300, Michael S. Tsirkin wrote:
> > > On Sun, Sep 04, 2011 at 12:46:35AM +1000, David Gibson wrote:
> > > > On Fri, Sep 02, 2011 at 06:45:50PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 01, 2011 at 04:31:09PM -0400, Paolo Bonzini wrote:
> > > > > > > > > Why not limit the change to ppc then?
> > > > > > > >
> > > > > > > > > Because the bug is masked by the x86 memory model, but it is
> > > > > > > > > still there even there conceptually. It is not really true
> > > > > > > > > that x86 does not need memory barriers, though it doesn't in
> > > > > > > > > this case:
> > > > > > > >
> > > > > > > > http://bartoszmilewski.wordpress.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
> > > > > > > >
> > > > > > > > Paolo
> > > > > > > 
> > > > > > > Right.
> > > > > > > To summarize, on x86 we probably want wmb and rmb to be compiler
> > > > > > > barriers only. Only mb might in theory need to be an mfence.
> > > > > > 
> > > > > > No, wmb needs to be sfence and rmb needs to be lfence.  GCC does
> > > > > > not provide those, so they should become __sync_synchronize() too,
> > > > > > or you should use inline assembly.
> > > > > > 
> > > > > > > But there might be reasons why that is not an issue either
> > > > > > > if we look closely enough.
> > > > > > 
> > > > > > Since the ring buffers are not using locked instructions (no xchg
> > > > > > or cmpxchg) the barriers simply must be there, even on x86.  Whether
> > > > > > it works in practice is not interesting, only the formal model is
> > > > > > interesting.
> > > > > > 
> > > > > > Paolo
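
For reference, the x86 primitives being discussed here could be sketched in
GCC inline assembly roughly as follows; the example_* macro names are purely
illustrative and are not QEMU's actual definitions:

    /* Illustrative sketch only, not QEMU code: x86 fences as Paolo
     * describes, with the portable GCC builtin as the fallback. */
    #if defined(__x86_64__) || defined(__i386__)
    #define example_wmb()  __asm__ __volatile__("sfence" ::: "memory")
    #define example_rmb()  __asm__ __volatile__("lfence" ::: "memory")
    #define example_mb()   __asm__ __volatile__("mfence" ::: "memory")
    #else
    #define example_wmb()  __sync_synchronize()
    #define example_rmb()  __sync_synchronize()
    #define example_mb()   __sync_synchronize()
    #endif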
> > > > > 
> > > > > Well, can you describe an issue in virtio that lfence/sfence help
> > > > > solve in terms of a memory model please?
> > > > > Pls note that guest uses smp_ variants for barriers.
> > > > 
> > > > Ok, so, I'm having a bit of trouble with the fact that I'm having to
> > > > argue the case that things the protocol requires to be memory
> > > > barriers actually *be* memory barriers on all platforms.
> > > > 
> > > > I mean argue for a richer set of barriers, with per-arch minimal
> > > > implementations instead of the large but portable hammer of
> > > > sync_synchronize, if you will.
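
For illustration, the protocol requirement at issue is the usual lock-free
producer/consumer handshake on the ring. A simplified sketch follows; the
struct and function names are made up for the example (this is not the actual
virtio code), and wmb()/rmb() are assumed to be defined along the lines of
the fence sketch above:

    #include <stdint.h>

    struct example_ring {
        struct { uint64_t addr; uint32_t len; } desc[256];
        uint16_t avail_ring[256];
        volatile uint16_t avail_idx;   /* producer publishes through this */
    };

    /* Producer (guest) side: make descriptor 'head' visible to the host. */
    static void example_publish(struct example_ring *r, uint16_t head,
                                uint16_t *idx)
    {
        r->avail_ring[*idx % 256] = head;
        wmb();                 /* descriptor/ring writes before the index */
        r->avail_idx = ++(*idx);
    }

    /* Consumer (host) side: pick up everything published so far. */
    static void example_drain(struct example_ring *r, uint16_t *seen)
    {
        while (*seen != r->avail_idx) {
            rmb();             /* read the index before the entries it covers */
            uint16_t head = r->avail_ring[(*seen)++ % 256];
            /* ... process r->desc[head] ... */
            (void)head;
        }
    }

If the barriers are omitted entirely, both the compiler and a weakly ordered
CPU such as POWER are free to let the index update overtake the descriptor
writes, which is exactly the reordering being discussed.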
> > > 
> > > That's what I'm saying really. On x86 the richer set of barriers
> > > need not insert code at all for both wmb and rmb macros. All we
> > > might need is an 'optimization barrier' - e.g. Linux does
> > >  __asm__ __volatile__("": : :"memory")
> > > ppc needs something like sync_synchronize there.
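
Concretely, the split being proposed here might look something like the
following; again the macro names are illustrative, this is not a patch
against QEMU:

    /* Sketch only: wmb/rmb as pure compiler barriers on x86,
     * full hardware barriers everywhere else. */
    #if defined(__x86_64__) || defined(__i386__)
    #define example_wmb()  __asm__ __volatile__("" ::: "memory")
    #define example_rmb()  __asm__ __volatile__("" ::: "memory")
    #else
    #define example_wmb()  __sync_synchronize()
    #define example_rmb()  __sync_synchronize()
    #endif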
> > 
> > But you're approaching this the wrong way around - correctness should
> > come first.  That is, we should first ensure that there is a
> > sufficient memory barrier to satisfy the protocol.  Then, *if* there
> > is a measurable performance improvement and *if* we can show that a
> > weaker barrier is sufficient on a given platform, then we can whittle
> > it down to a lighter barrier.
> 
> You are only looking at ppc. But on x86 this code ships in
> production. So changes should be made in a way to reduce
> a potential for regressions, balancing risk versus potential benefit.
> I'm trying to point out a way to do this.

Oh, please.  Adding a stronger barrier has a minuscule chance of
breaking things.  And this in a project that has build-breaking
regressions with tedious frequency.

> > > > But just leaving them out on x86!?
> > > > Seriously, wtf?  Do you enjoy having software that works chiefly by
> > > > accident?
> > > 
> > > I'm surprised at the controversy too. People seem to argue, at the
> > > same time, that the x86 cpu does not reorder stores and that we need
> > > an sfence between stores to prevent the guest from seeing them out
> > > of order.
> > 
> > I don't know the x86 storage model well enough to definitively say
> > that the barrier is not necessary there - nor to say that it is
> > necessary.  All I know is that the x86 model is quite strongly
> > ordered, and I assume that is why the lack of barrier has not caused
> > an observed problem on x86.
> 
> Please review Documentation/memory-barriers.txt as one reference,
> then look at how SMP barriers are implemented on various systems.
> In particular, note how it says 'Mandatory barriers should not be used
> to control SMP effects'.

No, again, correctness first; the onus of showing it's safe is on
those who want weaker barriers.

> > Again, correctness first.  sync_synchronize should be a sufficient
> > barrier for wmb() on all platforms.  If you really don't want it, the
> > onus is on you
> 
> Just for fun, I did a quick hack replacing all barriers with mb()
> in the userspace virtio test. This is on x86.
> 
> Before:
> address@hidden virtio]$ sudo time ./virtio_test 
> spurious wakeus: 0x1da
> 24.53user 14.63system 0:41.91elapsed 93%CPU (0avgtext+0avgdata
> 464maxresident)k
> 0inputs+0outputs (0major+154minor)pagefaults 0swaps
> 
> After:
> address@hidden virtio]$ sudo time ./virtio_test 
> spurious wakeus: 0x218
> 33.97user 6.22system 0:42.10elapsed 95%CPU (0avgtext+0avgdata
> 464maxresident)k
> 0inputs+0outputs (0major+154minor)pagefaults 0swaps
> 
> So user time went up significantly, as expected. Surprisingly, the kernel
> side became more efficient - even though the kernel was not changed -
> with the net effect close to evening out.
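
The substitution being measured above amounts to something like the
following; the macro names are hypothetical and this is not the actual
change made to the tools/virtio test:

    /* Force every barrier to a full one for the measurement. */
    #define example_mb()   __sync_synchronize()
    #define example_wmb()  example_mb()
    #define example_rmb()  example_mb()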

Right.  So small overall performance impact, and that's on a dedicated
testcase which does nothing but the virtio protocol.  I *strongly*
suspect the extra cost of the memory barriers will be well and truly
lost in the rest of the overhead of the qemu networking code.

> So a risk of performance regressions from unnecessary fencing
> seems to be non-zero, assuming that time doesn't lie.
> This might be worth investigating, but I'm out of time right now.
> 
> 
> > to show that (a) it's safe to do so and
> > (b) it's actually worth it.
> 
> Worth what? I'm asking you to minimise disruption to other platforms
> while you fix ppc.

I'm not "fixing ppc".  I'm fixing a fundamental flaw in the protocol
implementation.  _So far_ I've only observed the effects on ppc, but
that doesn't mean they don't exist.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



