qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RESEND][PATCH 0/3] Fix guest time drift under heavy lo


From: Gleb Natapov
Subject: Re: [Qemu-devel] [RESEND][PATCH 0/3] Fix guest time drift under heavy load.
Date: Thu, 6 Nov 2008 16:18:21 +0200

On Thu, Nov 06, 2008 at 02:21:16PM +0100, andrzej zaborowski wrote:
> >> > It is nothing like a -win2k-hack since there is no any guest "bad
> >> > behaviour" that cause the problem. Yes real hardware doesn't do this,
> >> > but real hardware also provides OS with enough CPU power to handle every
> >> > single timer interrupt.
> >>
> >> A guest that counts on having enough CPU for something is
> >> timing-depenent (buggy).
> >>
> > Tell this to RT developers who count each CPU cycle.
> 
> They don't usually use qemu (they certainly shouldn't).
> 
It seems to me that you were saying that counting on CPU availability
is buggy in general and since Windows doing this it is buggy. And what I
am saying is that for certain thing it is valid thing to do. RT people
shouldn't use qemu in production of cause, should people use qemu to run
Windows in production?

> >
> >> > And even if _some_ interrupts are dropped the
> >> > drift is easily fixed with NTP. Try to run Windows XP on very slow 
> >> > machine
> >> > and I am sure you'll see very noticeable time drift.
> >>
> >> Exactly.  You'll find the drift on real hardware, so you should find
> >> it in the emulator too.  You're trying to hack around it.
> >>
> > If I'll try to run windows XP on 486 then yes, I'll see the time drift.
> > After analyzing the problem I, most certainly, will decide that HW is
> > too old will buy modern CPU and will solve the time drift problem. What do
> > you propose for QEMU users? To use real HW?
> >
You ignored this question, but it was not rhetorical at all. Can you
answer it please?

> >> Linux doesn't see this because the clocksource and the
> >> clockevents-device come from separate clks there.  It is a windows'
> >> problem.  It *is* "bad behaviour".
> > OK we will call the flag -win-time-drift-hack.
> 
> I'm not saying it needs a flag, but that's it's a hack so it should stay 
> local.
>
The fix is already local. It is local to PIT and RTC. And to make it
local I change an infrastructure in the first patch.

> >
> >>
> >> >
> >> >> Instead you can have the interrupt sources register a callback in the
> >> >> PIC that the PIC calls when the interrupt wasn't delivered.  Or.. in
> >> > It requires the mapping from interrupt vector inside the PIC to
> >> > interrupt source.
> >>
> >> Of course.
> >>
> >> > This approach was rejected long time ago.
> >>
> >> Then you'll have to find a different one.
> >>
> >
> > I found one. Here it is, implemented by this patch series.
> >
> >> qemu_irq is the wrong place.
> > Why? Don't give me "that is not how real HW works". Real HW, if properly
> 
> I explained in a previous mail, but let me reiterate: qemu_irq
> abstracts a pin on whose one end a device can set a voltage level and
> the other end read it.  That's it - there's communication in one way
> only.  If you want to send a notification the other direction use a
> second qemu_irq or a callback.  It's a quite simple change in your
> PIC.
> 
And why is this abstract is set in stone? Why can't we extend it to be:
qemu_irq abstracts a pin on whose one end a device can set a voltage
level and the other end read it. If the receiving end "excepts" a signal
(for whatever definition of "except" appropriate for a receiving device)
the function returns positive number, if receiving device loose
information about the signal it returns zero.

> > configured, will behave more or less deterministically. I.e if timer
> > interrupt is configured to generate highest priority interrupt vector
> > and IRQ handler is fast enough (can be calculated knowing CPU freq) the
> > chances of loosing interrupt will be minimal. And those few that are
> > lost due to SMM or NMI can be compensated by NTP.
> >
> >> >
> >> >> the case of mc146818rtc.c wouldn't it be enough to check if the irq
> >> >> has been acked by reading RTC_REG_C?  e.g.
> >> >>
> >> >> static void rtc_periodic_timer(void *opaque)
> >> >> {
> >> >>     RTCState *s = opaque;
> >> >>
> >> >>     rtc_timer_update(s, s->next_periodic_time);
> >> >> +   if (s->cmos_data[RTC_REG_C] & 0xc0)
> >> >> +         s->irq_coalesced++;
> >> >>     s->cmos_data[RTC_REG_C] |= 0xc0;
> >> >>     qemu_irq_raise(s->irq);
> >> >> }
> >> >>
> >> > PIC/APIC in effect has a queue of one interrupt. This means that if
> >> > timer tick is still not acknowledged it doesn't mean that interrupt
> >> > was not queued for delivery inside a PIC.
> >>
> >> This doesn't matter, the tick that arrived while a previous interrupt
> >> was not acked yet, is lost anyway,
> > Not it is not. Not necessary. It can be queued inside PIC and delivered
> > by PIC itself immediately after interrupt acknowledgement.
> 
>  You can argue that it's the new irq that's lost or it's the first one
> that was lost, either way the PIC only sees one time the irq rising,
> instead of two.  That means they were coalesced.
Nothing is lost and PIC sees two irq rising. Example:
- RTC triggers first interrupt.
- It is delivered to PIC. PIC sets corespondent bit in IRR.
- CPU picks up RTC interrupt and it's bit is cleared from IRR bitmap.
- CPU jumps to RTC IRQ routing but before it gets a chance to acknowledge
  IRQ to PIC new timer is triggered.
- With your patch you increment irq_coalesced in that case.
- Interrupt is delivered to PIC. PIC sets corespondent bit in IRR
  but don't deliver the interrupt to the CPU since one is already 
  in progress.
- CPU acks first IRQ.
- PIC delivers second IRQ from IRR.
  
The result is that two interrupts are delivered to CPU and
irq_coalesced == 1 so one more will be needlessly signaled.

> >>                                    i.e. had been coalesced.  So
> >> this'll give you the right number of interrupts to re-inject.
> > No it will not. You'll inject more interrupt then needed and clock will
> > drift forwards.
> 
> The PIC won't see more interrupts (rising edges) than times the timer
> had ticked to given moment.  Thus the guest OS won't see them either.
> 
See above.

> >
> >>
> >> Ofcourse this, as well as your approach are both wrong because the
> >> guest may be intentionally ignoring the irq and expecting the
> >> interrupts to coalesce.  Once it starts processing the RTC interrupts
> >> it will get an unexpected storm.
> >>
> > This is one more thing which is broken in your suggestion, not mine. If a
> > guest wants to ignore interrupt it will mask it and my patch don't report
> > interrupts delivered while masked as been coalesced.
> 
> This is moot, it may choose to mask it or not, it can also mask it
> lower down the path.
> 
Are we talking about real OSes that we want to run as guests or we give
purely theoretical examples? Can you provide an examples of an OS that
doesn't mask interrupt or disable interrupt source in case it doesn't
want to dial with interrupts from a device?

--
                        Gleb.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]