-----Original Message-----
From: Xen-devel <address@hidden> On Behalf Of
Julien Grall
Sent: 13 December 2019 15:37
To: Ian Jackson <address@hidden>
Cc: Jürgen Groß <address@hidden>; address@hidden; Stefano
Stabellini <address@hidden>; osstest service owner <osstest-
address@hidden>; Anthony Perard <address@hidden>
Subject: Re: [Xen-devel] [xen-4.13-testing test] 144736: regressions -
FAIL
+Anthony
On 13/12/2019 11:40, Ian Jackson wrote:
Julien Grall writes ("Re: [Xen-devel] [xen-4.13-testing test] 144736:
regressions - FAIL"):
AMD Seattle boards (laxton*) are known to fail to boot from time to time
because of a PCI training issue. We have a workaround for it (involving a
longer power cycle) but it is not 100% reliable.
This wasn't a power cycle. It was a software-initiated reboot. It
does appear to hang in the firmware somewhere. Do we expect the PCI
training issue to occur in this case?
The PCI training happens at every reset (including software). So I may
have confused the workaround for firmware corruption with the PCI
training. We definitely have a workaround for the former.
For the latter, I can't remember whether we used a new firmware or just
hoped it would not happen often.
I think we had a thread on infra@ about the workaround sometime last
year. Sadly this was sent to my Arm e-mail address and I didn't archive
it before leaving :(. Could you check whether you can find the thread?
test-armhf-armhf-xl-vhd 18 leak-check/check fail REGR. vs. 144673
That one is strange. A qemu process seems to have died producing
a core file, but I couldn't find any log containing any other
indication of a crashed program.
I haven't found anything interesting in the log. @Ian could you set up
a repro for this?
There is some heisenbug where qemu crashes with very low probability.
(I forget whether only on arm or on x86 too). This has been around
for a little while. I doubt this particular failure will be
reproducible.
I can't remember such a bug being reported on Arm before. Anyway, I managed
to get the stack trace from gdb:
Core was generated by `/usr/local/lib/xen/bin/qemu-system-i386 -xen-domid 1 -chardev socket,id=libxl-c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x006342be in xen_block_handle_requests (dataplane=0x108e600) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c:531
531 /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c: No such file or directory.
[Current thread is 1 (LWP 1987)]
(gdb) bt
#0 0x006342be in xen_block_handle_requests (dataplane=0x108e600) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c:531
#1 0x0063447c in xen_block_dataplane_event (opaque=0x108e600) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c:626
#2 0x008d005c in xen_device_poll (opaque=0x107a3b0) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/xen/xen-bus.c:1077
#3 0x00a4175c in run_poll_handlers_once (ctx=0x1079708, timeout=0xb1ba17f8) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:520
#4 0x00a41826 in run_poll_handlers (ctx=0x1079708, max_ns=8000, timeout=0xb1ba17f8) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:562
#5 0x00a41956 in try_poll_mode (ctx=0x1079708, timeout=0xb1ba17f8) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:597
#6 0x00a41a2c in aio_poll (ctx=0x1079708, blocking=true) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:639
#7 0x0071dc16 in iothread_run (opaque=0x107d328) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/iothread.c:75
#8 0x00a44c80 in qemu_thread_start (args=0x1079538) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/qemu-thread-posix.c:502
#9 0xb67ae5d8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
This feels like a race condition between the init/free code and the
handler. Anthony, does this ring a bell?