[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9
From: |
Alex Bennée |
Subject: |
Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9 |
Date: |
Fri, 31 Mar 2017 12:21:13 +0100 |
User-agent: |
mu4e 0.9.19; emacs 25.2.12 |
Pavel Dovgalyuk <address@hidden> writes:
>> From: address@hidden [mailto:address@hidden
>> Pavel Dovgalyuk <address@hidden> writes:
>> >> From: address@hidden [mailto:mttcg-
>> address@hidden
>> >> Pavel Dovgalyuk <address@hidden> writes:
>> >> >> From: address@hidden [mailto:mttcg-
>> >> address@hidden
>> >> >> Pavel Dovgalyuk <address@hidden> writes:
>> <snip>
>> >> >> > I tested on vexpress-a9 platform with Debian wheezy.
>> >> >>
>> >> >> Thanks for that. I now have a test case that I can reproduce failures
>> >> >> on
>> >> >> without needing graphics.
>> >> >>
>> >> >> I've been investigating if there are any problems with the timer
>> >> >> processing now they have been moved into the TCG thread. The record
>> >> >> stage seems to work fine but I'm having difficulty figuring out why
>> >> >> playback freezes. It seems we get to a point where we are stuck waiting
>> >> >> for a suspiciously exact timer deadline:
>> >> >
>> >> > I've encountered the following behavior at replay stage:
>> >> > - replay takes some instructions to execute (qemu_icount += counter)
>> >> > - virtual timer is fired
>> >>
>> >> This is the virtual timer based on icount not the virtual rt timer? So
>> >
>> > Regular virtual timer. It's value is based on icount.
>> > virtual_rt is used for internal icount purposes.
>>
>> And this is where the clock warps come in? The bias brings the
>> virtual_rt time forward because execution is waiting for some external
>> event to fire (e.g. a timer IRQ)?
>
> I guess so. But bias is not updated when the vCPU works.
> vCPU thread updates only qemu_icount which is used for virtual clock
> calculation.
>
>> >> under the new scheme of being processed in the vCPU loop we should only
>> >> fire this one once all execution is done (although you may exit the loop
>> >> early before icount is exhausted).
>> >
>> > We should stop the vCPU before processing virtual timers because:
>> > - virtual clock depends on instruction counter
>> > - virtual timers may change virtual hardware state
>>
>> If we do the processing in the top of main vCPU loop this should be
>> equivalent. The instruction counter cannot run because we haven't
>> entered tcg_exec_cpu. We also process virtual timers in this thread
>> outside the loop so nothing else can be poking the hardware state.
>
> This is how qemu worked in older version - it switched between
> processing tasks (vCPU and timers) in one thread.
The only real difference is the sequencing in the old case was protected
by the BQL - now its my program order.
> But how we'll join this behavior with the current design and MTTCG?
Ignore MTTCG for now. We don't even try and run multiple-threads when
icount is enabled. However the change in the BQL locking is what has
triggered the failures.
Anyway I think I'm getting closer to narrowing it down. On record I see
replay_current_step jump backwards with this:
/* A common event print, called when reading or saving an event */
static void print_event(uint8_t event)
{
static int event_count;
static uint64_t last_step;
uint64_t this_step = replay_get_current_step();
fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",
event_count, event, this_step);
if (this_step < last_step) {
fprintf(stderr,"%s: !!! step %d went backwards
%#"PRIx64"=>%#"PRIx64"!!!\n",
__func__, event_count, last_step, this_step);
abort();
}
last_step = this_step;
event_count++;
}
void replay_put_event(uint8_t event)
{
assert(event < EVENT_COUNT);
replay_put_byte(event);
print_event(event);
}
The jump back is fairly consistent in my case (event 66 in the vexpress
run) but I'm fairly sure that is the bug. I can't see any reason for the
step count to go backwards - indeed that breaks the premise of .
Unfortunately when I start putting break and watchpoints in to see how
this jump back occurs the problem goes away until I disable them. So
this very much looks like a race condition corrupting the icount data.
>> >> > - replay puts back unexecuted instructions (qemu_icount -= counter)
>> >> >
>> >> > But virtual timer cannot take in account non-executed instructions
>> >> > (counter) and
>> >> > therefore it reads only qemu_icount, which provides incorrect time.
>> >> >
>> >> >> But the timers are all enabled:
>> >> >>
>> >> >> (gdb) qemu timers
>> >> >> Processing Realtime timers
>> >> >> clock QEMU_CLOCK_REALTIME is enabled:true,
>> >> >> last:-9223372036854775808
>> >> >> Processing Virtual timers
>> >> >> clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808
>> >> >> timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)
>> >> >> timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)
>> >> >> Processing Host timers
>> >> >> clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000
>> >> >> Processing Virtual RT timers
>> >> >> clock QEMU_CLOCK_VIRTUAL_RT is enabled:true,
>> >> >> last:-9223372036854775808
>> >> >
>> >> > Timers are processed only at checkpoints recorded in the log.
>> >> > When replay is stuck, probably there is a pending checkpoint in the log
>> >> > and pending instructions in CPU (because iothread breaks its
>> >> > synchronization).
>> >>
>> >> How does this work for instructions that sleep (like WFI)? The WFI will
>> >> cause us to exit the loop but I assume we need a "real" timer to trigger
>> >> a firing of whatever the WFI is waiting for as time according to icount
>> >> isn't advancing.
>> >
>> > This is virtual_rt for. It advances even when vCPU is stopped.
>>
>> So I'm not seeing this happen:
>>
>> process_icount_data: icount not increased 117302332749=>117302332749 RT:-1
>>
>> I wonder if the advance of virtual_rt has been deadlocked? The -1
>> suggests its not even setup.
>
> qemu_start_warp_timer says that it is setup only when there are active
> virtual timers.
>
>
>> > By the way, I made small description of the replay log format
>> > to be later included into docs/replay.txt:
>> >
>> Thanks. That's really useful.
>>
>> Do you think it would be worth including the dumper script I wrote in
>> ./scripts/? I mainly wrote it to aid my understanding of the code flow
>> but it might be useful in future investigations?
>
> Usually I get human-readable logs in record and replay modes to find
> divergence between them, but I think your script might be useful in some
> cases.
Cool. I'll include it in the icount series once I have something to post
(fingers crossed I'm getting close).
--
Alex Bennée
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, (continued)
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Pavel Dovgalyuk, 2017/03/16
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/16
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Pavel Dovgalyuk, 2017/03/16
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/22
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Pavel Dovgalyuk, 2017/03/29
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/29
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Pavel Dovgalyuk, 2017/03/30
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/30
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Pavel Dovgalyuk, 2017/03/31
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Paolo Bonzini, 2017/03/31
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9,
Alex Bennée <=
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Paolo Bonzini, 2017/03/31
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/31
- Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9, Alex Bennée, 2017/03/31