qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] qtest hang in /x86_64/ahci/io/ncq/simple (ppc64 host)


From: John Snow
Subject: Re: [Qemu-devel] qtest hang in /x86_64/ahci/io/ncq/simple (ppc64 host)
Date: Mon, 06 Jul 2015 15:21:12 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0


On 07/06/2015 12:24 PM, Paolo Bonzini wrote:
> 
> 
> On 06/07/2015 17:41, John Snow wrote:
>> I'll take a look. More than possible there's a race in the wait
>> conditional that I have just never seen before.
>>
>> I tweaked it a little recently to wait on both NCQ and traditional
>> completion, but it doesn't have a timeout or anything. I'll go ahead and
>> fix that while I'm here.
>>
>> Should be something simple, hopefully.

The one Peter Maydell reported is indeed something simple. Sent out a patch.

> 
> My hang in test_halted_ncq is different:

... Great :)

> 
> (gdb) up
> #1  0x00007f3aaadb4476 in read (__nbytes=1, __buf=0x7ffc0652abbf,
> __fd=<optimized out>) at /usr/include/bits/unistd.h:44
> 44      return __read_alias (__fd, __buf, __nbytes);
> (gdb)
> #2  qtest_qmp_receive (address@hidden) at
> /home/work/upstream/qemu/tests/libqtest.c:371
> 371           len = read(s->qmp_fd, &c, 1);
> (gdb)
> #3  0x00007f3aaadb4c48 in qtest_qmp_eventwait (s=0x7f3aac017360,
> address@hidden "STOP") at
> /home/work/upstream/qemu/tests/libqtest.c:477
> 477           response = qtest_qmp_receive(s);
> (gdb)
> #4  0x00007f3aaadb3514 in qmp_eventwait (event=0x7f3aaadbdc68 "STOP") at
> /home/work/upstream/qemu/tests/libqtest.h:498
> 498       return qtest_qmp_eventwait(global_qtest, event);
> (gdb)
> #5  ahci_guest_io_halt (address@hidden,
> address@hidden '\005', address@hidden 'a',
> address@hidden, address@hidden,
> address@hidden)
>     at /home/work/upstream/qemu/tests/libqos/ahci.c:604
> 604       qmp_eventwait("STOP");
> (gdb)
> #6  0x00007f3aaada9e09 in ahci_halted_io_test (cmd_read=<optimized out>,
> cmd_write=<optimized out>) at
> /home/work/upstream/qemu/tests/ahci-test.c:1244
> 1244      cmd = ahci_guest_io_halt(ahci, port, cmd_write,
> 
> 
> In QEMU, current_run_state is RUN_STATE_RUNNING, so the halt didn't
> happen.  It's 100% reproducible with "make check-qtest-{i386,x86_64} V=1
> -j12".  This is the final part of the qtest log:
> 
> [R +0.066539] readl 0xe0000390
> [S +0.066553] OK 0x0000000000000000
> [R +0.066582] memset 0x10b000 0x100 0x00
> [S +0.066597] OK
> [R +0.067627] write 0x10c000 0x1000
[snip]
> [S +0.067748] OK
> [R +0.067782] readl 0xe00003b8
> [S +0.067796] OK 0x0000000000000000
> [R +0.067824] read 0x10a020 0x20
> [S +0.067843] OK
> 0x0000000000000000000000000000000000000000000000000000000000000000
> [R +0.067874] write 0x10a020 0x20
> 0x0000000000000000000000000000000000000000000000000000000000000000
> [S +0.067897] OK
> [R +0.067926] write 0x10a020 0x20
> 0x450401000000000000e010000000000000000000000000000000000000000000
> [S +0.067940] OK
> [R +0.067964] write 0x10e000 0x14 0x2780610800000040000000000800000000000000
> [S +0.067980] OK
> [R +0.068004] write 0x10e080 0x10 0x00c010000000000000000000ff0f0080
> [S +0.068018] OK
> [R +0.068042] writel 0xe00003b4 0x2
> [S +0.068056] OK
> [R +0.068079] writel 0xe00003b8 0x2
> [S +0.069500] OK
> 
> This is the /tmp/qtest-blkdebug.INHp7s file:
> 
> [inject-error]
> event = "write_aio"
> errno = "5"
> state = "1"
> immediately = "off"
> once = "on"
> [set-state]
> event = "write_aio"
> new_state = "2"
> 
> There is no trace of the write payload in the .qcow2 file.

Smells like the injected error prevented that successfully, but...

> 
> Paolo
> 

Can't reproduce: Fedora 21, 4.0.5, Today's QEMU HEAD (7edd8e56) ...
So it's a race condition of some sort.

Obvious questions: You've tried a distclean?
What was your configure invocation?

My attempt:
make distclean
../../configure --target-list="x86_64-softmmu i386-softmmu" --enable-debug
make -j9
make check-qtest-{i386,x86_64} V=1 -j12

Completes just fine for me.

That aside, I can't figure out why the machine HALT *and* the write
wouldn't occur. Can you stick a bunch of printfs inside the NCQ error
handling function and see if there's something goofing up in there, or
if it's even making it there?

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index bb6a92f..f1af6f9 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -921,6 +921,7 @@ static void ncq_cb(void *opaque, int ret)
     IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];

     if (ret == -ECANCELED) {
+        fprintf(stderr, "ncq_cb: -ECANCELED\n");
         return;
     }

@@ -928,12 +929,17 @@ static void ncq_cb(void *opaque, int ret)
         bool is_read = ncq_tfs->cmd == READ_FPDMA_QUEUED;
         BlockErrorAction action = blk_get_error_action(ide_state->blk,
                                                        is_read, -ret);
+        fprintf(stderr, "ncq_cb: ret (%d) is < 0; action is %s\n",
+                ret, BlockErrorAction_lookup[action]);
         if (action == BLOCK_ERROR_ACTION_STOP) {
             ncq_tfs->halt = true;
             ide_state->bus->error_status = IDE_RETRY_HBA;
+            fprintf(stderr, "ncq_cb: BLOCK_ERROR_ACTION_STOP\n");
         } else if (action == BLOCK_ERROR_ACTION_REPORT) {
             ncq_err(ncq_tfs);
+            fprintf(stderr, "ncq_cb: BLOCK_ERROR_ACTION_REPORT\n");
         }
+        fprintf(stderr, "ncq_cb: -> blk_error_action\n");
         blk_error_action(ide_state->blk, action, is_read, -ret);
     } else {
         ide_state->status = READY_STAT | SEEK_STAT;


My guess is I have goofed something up above there in the error handling
section of ncq_cb, but "worksforme," so I am not sure what's going on.

If no prints show up during testing, maybe turn on AHCI_DEBUG as well
and see if any prints are showing up in the success case.

--js



reply via email to

[Prev in Thread] Current Thread [Next in Thread]