qemu-devel

[Qemu-devel] qemu <-> libvirt communication regressed in QEMU commit 524


From: Laszlo Ersek
Subject: [Qemu-devel] qemu <-> libvirt communication regressed in QEMU commit 5243722376
Date: Wed, 16 Sep 2015 14:13:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0

Hi Emilio,

I've arrived at your patch, noted in the subject, with bisection (please
see the bisection log attached).

I'm on RHEL-7.1. Sometimes I have to work with upstream QEMU, and then I
use it with my preexisting libvirt guests, pulling QEMU somewhat
infrequently. My libvirt-related version numbers are:

libvirtd: 1.2.8-16.el7_1.3.x86_64
libvirt-python: 1.2.8-7.el7_1.1.x86_64
libvirt-g*: 0.1.7-3.el7.x86_64
virt-manager: 1.1.0-12.el7.noarch

The symptom is that, with your patch built into QEMU, QEMU starts but
hangs as soon as I click the specific VM's entry in virt-manager's list.

In the process list ("ps"), I can then see two qemu processes, parent
and child. I saved backtraces for both of them while they were hung.
The command lines are also visible in the attached text files. The line
numbers (i.e. the QEMU binary) match the tree when checked out and
built at exactly your patch.

(I double checked: if I build at 5243722376^, then it works.)

The configure command was:

./configure \
  --audio-drv-list=alsa \
  --target-list=x86_64-softmmu,i386-softmmu,aarch64-softmmu \
  --disable-vde \
  --enable-werror \
  --enable-spice \
  --disable-stack-protector \
  --prefix=/opt/qemu-installed \
  --disable-gtk \
  --enable-debug \
  --enable-trace-backends=stderr

I don't think libvirt, or for that matter, any QMP interfaces, have
anything to do with this. I rather believe that libvirt invokes QEMU for
retrieving the capabilities in a way that exposes a possible problem in
your patch. (Hence I provided my libvirt version numbers just to be sure.)

... In fact I'm confused about your patch. rcu_init() makes sure that at
fork(), the parent will first acquire both "rcu_sync_lock" and
"rcu_registry_lock". Meaning, no other thread in the parent can hold
those mutexen when the parent thread calling fork() actually forks.

Then, in the parent, the original thread simply releases both mutexen,
in rcu_init_unlock(). In the child, only the one thread exists that
called fork() in the parent. However, that one child thread does own the
copies of both mutexen. So it is prudent for the child to release both
copies.
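
To make the scheme described above concrete, here is a minimal sketch of
the "lock everything before fork(), release in both parent and child"
pattern, using pthread_atfork(). It is not the QEMU code: the names
sync_lock, registry_lock, prepare(), release_both() and
fork_with_handlers() are hypothetical stand-ins, and the mutexes are
plain default (non-error-checking) mutexes.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the two RCU mutexes discussed above. */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread right before fork(): take both locks so
 * that no other thread can hold them at the moment of the fork. */
static void prepare(void)
{
    pthread_mutex_lock(&sync_lock);
    pthread_mutex_lock(&registry_lock);
}

/* Runs after fork() in the parent and in the child: the forking thread
 * (or its copy in the child) releases both locks again. */
static void release_both(void)
{
    pthread_mutex_unlock(&registry_lock);
    pthread_mutex_unlock(&sync_lock);
}

/* Forks once; returns 0 if the child could take both locks afterwards,
 * 1 if it could not, and -1 on any other failure. */
int fork_with_handlers(void)
{
    static int registered;
    int status;
    pid_t pid;

    if (!registered) {
        if (pthread_atfork(prepare, release_both, release_both) != 0) {
            return -1;
        }
        registered = 1;
    }

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: both copies should be free, since release_both() ran. */
        int ok = pthread_mutex_trylock(&sync_lock) == 0 &&
                 pthread_mutex_trylock(&registry_lock) == 0;
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_with_handlers() from a small main() is enough to try this;
the point is only that the child's single thread releases the copies it
inherited in the locked state.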

Your patch causes "rcu_registry_lock" to be reinitialized in the child,
rather than released, plus "rcu_sync_lock" remains untouched (ie. locked
by the one thread that exists in the child). Why is that correct?
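
As an illustration of the concern, here is a sketch that models the
behavior as I describe it above (it is based on my reading, not copied
from the patch): the child handler reinitializes one mutex and leaves
the other alone. The names sync_lock, registry_lock,
child_reinit_only() and sync_lock_busy_in_child() are hypothetical, and
the result shown is what I would expect with default mutexes under
glibc.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the two RCU mutexes. */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Before fork(): the forking thread takes both locks. */
static void prepare(void)
{
    pthread_mutex_lock(&sync_lock);
    pthread_mutex_lock(&registry_lock);
}

/* After fork() in the parent: release both locks. */
static void parent_unlock(void)
{
    pthread_mutex_unlock(&registry_lock);
    pthread_mutex_unlock(&sync_lock);
}

/* After fork() in the child: only reinitialize registry_lock.
 * sync_lock is deliberately left as inherited, i.e. locked. */
static void child_reinit_only(void)
{
    pthread_mutex_init(&registry_lock, NULL);
}

/* Forks once; returns 1 if the child finds sync_lock still locked,
 * 0 if the child could take it, and -1 on any other failure. */
int sync_lock_busy_in_child(void)
{
    static int registered;
    int status;
    pid_t pid;

    if (!registered) {
        if (pthread_atfork(prepare, parent_unlock, child_reinit_only) != 0) {
            return -1;
        }
        registered = 1;
    }

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* A blocking pthread_mutex_lock() here would never return;
         * trylock lets the child report the state instead. */
        int rc = pthread_mutex_trylock(&sync_lock);
        _exit(rc == EBUSY ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the child later needs sync_lock with a blocking lock call, this is
the kind of state in which it would wait forever.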

(Side note: we're talking process-private, not process-shared mutexen.)

I could easily be wrong, but I don't understand the commit message, or
why the patch is correct.

... Hm, I can see the discussion here:

http://thread.gmane.org/gmane.comp.emulators.qemu/356765/focus=360421

Okay... let me see 24fa90499f... "The problem is that releasing
error-checking locks in the child fails under glibc with EPERM". <--
That is a striking surprise to me, but still, the removal of
PTHREAD_MUTEX_ERRORCHECK only justifies why your patch would *not* be
necessary.

The last paragraph of your email that I linked above talks about a
"possibility of corruption". Maybe I've managed to trigger that. If so,
I hope it won't be hard to fix up.

... Hm, apparently Alex had mentioned the same concern as I did now,
about ignoring "rcu_sync_lock" in the child, in message
<http://thread.gmane.org/gmane.comp.emulators.qemu/356765/focus=360602>.
Was that concern cleared up eventually?

Thanks!
Laszlo

Attachment: bisect.log
Description: Text Data

Attachment: parent.txt
Description: Text document

Attachment: child.txt
Description: Text document

