qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 1/2] hw/ptimer: Fix issues caused by artifici


From: Dmitry Osipenko
Subject: Re: [Qemu-devel] [PATCH v7 1/2] hw/ptimer: Fix issues caused by artificially limited timer timeout
Date: Sun, 25 Oct 2015 16:23:39 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

25.10.2015 02:55, Peter Crosthwaite пишет:
On Sat, Oct 24, 2015 at 3:22 PM, Dmitry Osipenko <address@hidden> wrote:
24.10.2015 22:45, Peter Crosthwaite пишет:


This looks like a give-up without trying to get the correct value. If
the calculated value (using the normal-path logic below) is sane, you
should just use it. If it comes out bad then you should clamp to 1.

I am wondering whether this clamping policy (as in original code as
well) is correct at all though. The value of a free-running
short-interval periodic timer (poor mans random number generator)
without any actual interrupt generation will be affected by QEMUs
asynchronous handling of timer events. So if I set limit to 100, then
sample this timer every user keyboard stroke, I should get a uniform
distribution on [0,100]. Instead in am going to get lots of 1s. This


Right, that's a good example. What about to scale ptimer period to match
adjusted timer_mod interval?


Thats just as incorrect as changing the limit IMO. The guest could get
confused with a timer running at the wrong frequency.

is more broken in the original code (as you state), as I will get >
100, but I think we have traded broken for slightly less broken. I
think the correct semantic is to completely ignoring rate limitin
except for the scheduling on event callbacks. That is, the timer


I'm missing you here. What event callbacks?


when timer_mod() hits, and it turn triggers some device specific event
(usually an interrupt).

There are two basic interactions for any QEMU timer. There are
synchronous events, i.e. the guest reading (polling) the counter which
is what this patch tries to fix. The second is the common case of
periodic interrupt generation. My proposal is that rate limiting does
not affect synchronous operation, only asynchronous (so my keystroke
RNG case works). In the current code, if ptimer_get_count() is called
when the event has passed it returns 0 under the assumption that the
timer_mod callback is about to happen. With rate-limiting that may be
well in the future.


ptimer_tick() would happen on the next QEMU loop cycle, so it might be more reasonable to return counter = 1 here, wouldn't it?

interval is not rate limited, instead the timer_mod interval
(next_event -last_event) just has a 10us clamp.

The means the original code semantic of returning counter = 0 for an
already triggered timer is wrong. It should handle in-the-past
wrap-arounds as wraparounds by triggering the timer and redoing the
math with the new interval values. So instead the logic would be
something like:

timer_val = -1;

for(;;) {

   if (!enabled) {
      return delta;
   }

   timer_val = (next-event - now) *  scaling();
   if (timer_val >=  0) {
       return timer_val;
   }
   /* Timer has actually expired but we missed it, reload it and try again
*/
   ptimer_tick();
}


Why do you think that ptimer_get_count() == 0 in case of the running
periodic timer that was expired while QEMU was "on the way" to ptimer code
is bad and wrong?

Because you may have gone past the time when it was actually zero and
reloaded and started counting again. It should return the real
(reloaded and resumed) value. This is made worse by rate limiting as
the timer will spend a long time at the clamp value waiting for the
rate-limited tick to fix it.

Following on from before, we don't want the async stuff to affect
sync. As the async callbacks are heavily affected by rate limiting we
don't want the polled timer having to rely on the callbacks (async
path) at all for correct operation. That's what the current
implementation of ptimer_get_count assumes with that 0-clamp.


Alright, that make sense now. Thanks for clarifying.

 From the guest point of view it's okay (no?), do we really
need to overengineer that corner case?


Depends on your use case. Your bug report is correct in that the timer
should never be outside the bounds of the limit. But you are fixing
that very specifically with a minimal change rather than correcting
the larger ptimer_get_count() logic which I think is more broken than
you suggest it is.


ptimer_reload() then needs to be patched to make sure it always
timer_mod()s in the future, otherwise this loop could iterate a large
number of times.

This means that when the qemu_timer() actually ticks, a large number
or cycles may have occured, but we can justify that in that callback
event latency (usually interrupts) is undefined anyway and coalescing
of multiples may have happened as part of that. This usually matches
expectations of real guests where interrupt latency is ultimately
undefined.


ptimer_tick() is re-arm'ing the qemu_timer() in case of periodic mode. Hope
I haven't missed your point here.


Yes. But it is also capable of doing the heavy lifting for our already
expired case. Basic idea is, if the timer is in a bad state (should
have hit) but hasn't, do the hit to put the timer into a good state
(by calling ptimer_tick) then just do the right thing. That's what the
loop does. It should also work for an in-the-past one-shot.


Summarizing what we have now:

There are two issues with ptimer:

1) ptimer_get_count() return wrong values with adjusted .limit

Patch V7 doesn't solve that issue, but makes it slightly better by clamping returned value to [0, 1]. That's not what we want, we need to return counter value in it's valid range [0, limit].

You are rejecting variant of scaling ptimer period, saying that it might affect software behavior inside the guest. But by adjusting the timer, we might already causing same misbehavior in case of blazing fast host machine. I'll scratch my head a bit more on it. If you have any better idea, please share.

2) ptimer_get_count() return fake 0 value in case of the expired qemu_timer() without triggering ptimer_tick()

You're suggesting to solve it by running ptimer_tick(). So if emulated device uses ptimer tick event (scheduled qemu bh) to raise interrupt, it would do it by one QEMU loop cycle earlier.

My question here: is it always legal for the guest software to be able to get counter = 0 on poll while CPU interrupt on timer expire hasn't happened yet (would happen after one QEMU cycle). I guess it might cause software misbehavior if it assumes that the real hardware has CPU and timer running in the same clock domain, i.e. such situation might be not possible. So I'm suggesting to return counter = 1 and allow ptimer_tick() happen on it's own.

--
Dmitry



reply via email to

[Prev in Thread] Current Thread [Next in Thread]