From: Blue Swirl
Subject: Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Date: Fri, 28 May 2010 19:00:04 +0000

On Thu, May 27, 2010 at 10:19 PM, Jan Kiszka <address@hidden> wrote:
> Blue Swirl wrote:
>> On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka <address@hidden> wrote:
>>> Blue Swirl wrote:
>>>> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka <address@hidden> wrote:
>>>>> Blue Swirl wrote:
>>>>>> On Wed, May 26, 2010 at 11:26 PM, Paul Brook <address@hidden> wrote:
>>>>>>>> At the other extreme, would it be possible to make the educated
>>>>>>>> guests aware of the virtualization also in the clock aspect:
>>>>>>>> virtio-clock?
>>>>>>> The guest doesn't even need to be aware of virtualization. It just
>>>>>>> needs to be able to accommodate the lack of guaranteed realtime
>>>>>>> behavior.
>>>>>>>
>>>>>>> The fundamental problem here is that some guest operating systems
>>>>>>> assume that the hardware provides certain realtime guarantees with
>>>>>>> respect to execution of interrupt handlers.  In particular they
>>>>>>> assume that the CPU will always be able to complete execution of
>>>>>>> the timer IRQ handler before the periodic timer triggers again.  In
>>>>>>> most virtualized environments you have absolutely no guarantee of
>>>>>>> realtime response.
>>>>>>>
>>>>>>> With Linux guests this was solved a long time ago by the
>>>>>>> introduction of tickless kernels.  These separate the timekeeping
>>>>>>> from wakeup events, so it doesn't matter if several wakeup triggers
>>>>>>> end up getting merged (either at the hardware level or via
>>>>>>> top/bottom half guest IRQ handlers).
>>>>>>>
>>>>>>> It's worth mentioning that this problem also occurs on real
>>>>>>> hardware, typically due to lame hardware/drivers which end up
>>>>>>> masking interrupts or otherwise stalling the CPU for long periods
>>>>>>> of time.
>>>>>>>
>>>>>>> The PIT hack attempts to work around broken guests by adding
>>>>>>> artificial latency to the timer event, ensuring that the guest
>>>>>>> "sees" them all.  Unfortunately guests vary on when it is safe for
>>>>>>> them to see the next timer event, and trying to observe this
>>>>>>> behavior involves potentially harmful heuristics and collusion
>>>>>>> between unrelated devices (e.g. interrupt controller and timer).
>>>>>>>
>>>>>>> In some cases we don't even do that, and just reschedule the event
>>>>>>> some arbitrarily small amount of time later.  This assumes the
>>>>>>> guest does useful work in that time.  In a single-threaded
>>>>>>> environment this is probably true - qemu got enough CPU to inject
>>>>>>> the first interrupt, so it will probably manage to execute some
>>>>>>> guest code before the end of its timeslice.  In an environment
>>>>>>> where interrupt processing/delivery and execution of the guest code
>>>>>>> happen in different threads this becomes increasingly likely to fail.
>>>>>> So any voodoo around timer events is doomed to fail in some cases.
>>>>>> What's the amount of hacks that we want then? Is there any generic
>>>>> The aim of this patch is to reduce the amount of existing and upcoming
>>>>> hacks. It may still require some refinements, but I think we haven't
>>>>> found any smarter approach yet that fits existing use cases.
>>>> I don't feel we have tried other possibilities hard enough.
>>> Well, seeing prototypes wouldn't be bad, also to run real load against
>>> them. But at least I'm currently clueless what to implement.
>>
>> Perhaps now is then not the time to rush to implement something, but
>> to brainstorm for a clean solution.
>
> And sometimes it can help to understand how ideas could even be improved
> or why others don't work at all.
>
>>
>>>>>> solution, like slowing down the guest system to the point where we can
>>>>>> guarantee the interrupt rate vs. CPU execution speed?
>>>>> That's generally a non-option in virtualized production environments.
>>>>> Specifically if the guest system lost interrupts due to host
>>>>> overcommitment, you do not want it to slow down even further.
>>>> I meant that the guest time could be scaled down, for example 2s in
>>>> wall clock time would be presented to the guest as 1s.
>>> But that is precisely what already happens when the guest loses timer
>>> interrupts. There is often no other time source for this kind of guest -
>>> except for some external events generated by systems which you don't
>>> want to fall arbitrarily far behind.
>>>
>>>> Then the number of CPU cycles between timer interrupts would increase
>>>> and hopefully the guest can keep up. If the guest sleeps, the time base
>>>> could be accelerated to catch up with the wall clock and then set back
>>>> to a 1:1 rate.
>>> Can't follow you ATM, sorry. What should be slowed down then? And how
>>> precisely?
>>
>> I think vm_clock and everything that depends on vm_clock; rtc_clock
>> should also be tied to vm_clock in this mode, not host_clock.
>
> Let me check if I got this idea correctly: Instead of tuning just the
> tick frequency of the affected timer device / sending its backlog in a
> row, you would rather tune the vm_clock correspondingly? Maybe that's a
> way to abstract the required logic, currently sitting only in the RTC,
> for use by other timer sources as well.

Yes, that would be a good starting point.
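
Roughly, I imagine something like the following (a self-contained sketch
only, not the current qemu-timer API; host_clock_ns() and
set_vm_clock_rate() are made-up names):

/* Hypothetical sketch: present guest time at a tunable rate relative to
 * the host.  With num/den = 1/2, two seconds of wall-clock time appear to
 * the guest as one second, so the guest gets twice the CPU time per tick. */
#include <stdint.h>

static int64_t scale_num = 1;   /* guest ns = host ns * scale_num / scale_den */
static int64_t scale_den = 1;
static int64_t guest_base_ns;   /* guest time accumulated before last change */
static int64_t host_base_ns;    /* host time at the last rate change */

extern int64_t host_clock_ns(void);   /* assumed monotonic host clock, in ns */

int64_t scaled_vm_clock_ns(void)
{
    int64_t host_elapsed = host_clock_ns() - host_base_ns;
    return guest_base_ns + host_elapsed * scale_num / scale_den;
}

/* Change the rate without making guest time jump in either direction. */
void set_vm_clock_rate(int64_t num, int64_t den)
{
    guest_base_ns = scaled_vm_clock_ns();
    host_base_ns = host_clock_ns();
    scale_num = num;
    scale_den = den;
}

The catch-up/slew logic that today lives only in the RTC could then sit
behind something like set_vm_clock_rate(), where every timer source would
benefit from it.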

> But just switching rtc_clock to vm_clock when the user wants host_clock
> is obviously not an option. We would rather have to tune host_clock in
> parallel.
>
> Still, this does not answer:
>
> - How do you want to detect lost timer ticks?

With the APIC, just like now. But I think detecting lost ticks
shouldn't be the only way. It's an indication that things have already
gone wrong. It would be better to also use other measurements to see
that the guest is close to a stall _before_ it happens.
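
For reference, the current detection is roughly the pattern below
(reconstructed from memory of the RTC's -rtc-td-hack in hw/mc146818rtc.c;
the device struct and callback are hypothetical, while the
apic_*_irq_delivered() helpers are the existing feedback hooks):

/* Sketch of per-tick coalescing detection using APIC delivery feedback. */
typedef struct {
    qemu_irq irq;
    uint32_t irq_coalesced;     /* ticks the guest has not seen yet */
} PeriodicTimerState;

static void periodic_timer_tick(PeriodicTimerState *s)
{
    apic_reset_irq_delivered();
    qemu_irq_raise(s->irq);
    if (!apic_get_irq_delivered()) {
        /* IRQ was masked or still pending: the guest missed this tick. */
        s->irq_coalesced++;
    }
    /* A separate, faster catch-up timer then re-raises the IRQ until
     * irq_coalesced drains back to zero. */
}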

> - What subsystem(s) keeps track of the backlog?

Preferably at a high level (vl.c, exec.c, cpu-exec.c).
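
Something like this, kept generic and outside the devices (all names are
hypothetical; set_vm_clock_rate() is the sketch from above):

/* Hypothetical central backlog accounting at the vl.c level instead of
 * inside each timer device: devices report ticks the guest missed, and
 * the main loop reacts before the guest stalls completely. */
#define BACKLOG_SLOWDOWN_THRESHOLD 4

static unsigned int timer_backlog;      /* undelivered periodic ticks */

void timer_tick_missed(void)
{
    if (++timer_backlog > BACKLOG_SLOWDOWN_THRESHOLD) {
        set_vm_clock_rate(1, 2);        /* stretch guest time 2:1 */
    }
}

void timer_tick_delivered(void)
{
    if (timer_backlog > 0 && --timer_backlog == 0) {
        set_vm_clock_rate(1, 1);        /* caught up: back to real time */
    }
}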

> - And depending on the above: How to detect at all that a specific IRQ
>  is a timer tick?

Actually, acks for any IRQ can be delayed; those delays could be taken as
a sign of guest overload.
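
With the delivery feedback this series adds, one could for example time
the raise-to-ack latency (sketch only; the ack callback, the threshold and
qemu_get_clock(vm_clock) returning nanoseconds are assumptions):

/* Hypothetical sketch: measure how long the guest takes to ack an IRQ and
 * treat long latencies as an early sign of overload, before any tick is
 * actually lost. */
#define OVERLOAD_LATENCY_NS 1000000     /* 1 ms, arbitrary threshold */

static int64_t last_raise_ns;

static void overload_irq_raise(qemu_irq irq)
{
    last_raise_ns = qemu_get_clock(vm_clock);   /* assumed ns resolution */
    qemu_irq_raise(irq);
}

static void overload_irq_acked(void *opaque)    /* assumed ack callback */
{
    int64_t latency = qemu_get_clock(vm_clock) - last_raise_ns;

    if (latency > OVERLOAD_LATENCY_NS) {
        /* Guest is struggling: stretch vm_clock, merge ticks, etc. */
    }
}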


