qemu-devel

Re: [Qemu-devel] EHCI USB regression in 1.2.0 - ehci_state_fetchqtd() asserting


From: Hans de Goede
Subject: Re: [Qemu-devel] EHCI USB regression in 1.2.0 - ehci_state_fetchqtd() asserting
Date: Mon, 08 Oct 2012 15:51:08 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1

Hi,

On 10/08/2012 03:01 PM, Johannes Stezenbach wrote:
Hi Hans,

On Mon, Oct 08, 2012 at 01:27:28PM +0200, Hans de Goede wrote:
On 10/02/2012 05:26 PM, Shawn Starr wrote:

Reopening this issue with usb-host stalling now

ehci warning: guest updated active QH
USBDEVFS_DISCARDURB: Invalid argument
USBDEVFS_DISCARDURB: Invalid argument
husb: leaking iso urbs because of discard failure


Now with qemu-XXX-1.2.0-12.fc18.x86_64

If I have the webcam open, it stalls and does not resume. This is with usb-host
directly.

Shall I enable debugging again?

Hmm, this likely is caused by too high latencies in your system,
which are caused in turn I believe by you running an F-18 kernel which
has various debugging options enabled inside the kernel which can
cause significant latencies. I've spent 1.5 days tracking this very
same issue down in the past. So please first of all make sure that you're
running a kernel without debugging options enabled, either the latest
F-18 build from koji:
http://koji.fedoraproject.org/koji/buildinfo?buildID=358570

or an F-17 kernel. Almost all the F-18 "rc" kernels have debugging enabled
and thus cause significant latency issues.

If you can reproduce this with a kernel without the debugging options,
then we can investigate this further.

By changing the kernel, don't you just make the issue harder to reproduce?
I mean Linux isn't real-time so any kernel can show latency spikes
and it's a show-stopper if iso transfers stall instead of just
dropping some packets.

There will always be a race between the call to USBDEVFS_DISCARDURB
and the URB completing.  IMHO the handling in usb_host_stop_n_free_iso()
is buggy.  How about dropping the "killed" and "free" variables and
calling async_complete() and g_free() unconditionally?
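
To make the race concrete, here is a minimal sketch using a mock instead of real usbfs calls. It assumes that a discard request on a URB which has already completed fails with EINVAL (which matches the "USBDEVFS_DISCARDURB: Invalid argument" lines in the log above) and that such a URB can still be reaped. All names (mock_urb, mock_discard, mock_reap, stop_all) are hypothetical and are not taken from the QEMU or kernel sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a URB's lifetime; not QEMU or kernel code. */
enum urb_state { URB_IN_FLIGHT, URB_COMPLETED, URB_DISCARDED, URB_REAPED };

struct mock_urb {
    enum urb_state state;
};

/* Models a discard request: it only succeeds while the URB is still pending. */
static int mock_discard(struct mock_urb *urb)
{
    if (urb->state != URB_IN_FLIGHT) {
        return -EINVAL;           /* the URB completed before we got here */
    }
    urb->state = URB_DISCARDED;
    return 0;
}

/* Models reaping: completed and discarded URBs are both handed back. */
static int mock_reap(struct mock_urb *urb)
{
    if (urb->state != URB_COMPLETED && urb->state != URB_DISCARDED) {
        return -EAGAIN;           /* nothing to reap for this URB */
    }
    urb->state = URB_REAPED;
    return 0;
}

struct stop_stats {
    int discarded;   /* discard succeeded */
    int lost_race;   /* discard failed because the URB had already completed */
    int leaked;      /* URB could not be reaped afterwards */
};

/* Stop every URB; callers are expected to pass only URBs not yet reaped. */
static struct stop_stats stop_all(struct mock_urb *urbs, size_t n)
{
    struct stop_stats s = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        if (mock_discard(&urbs[i]) == 0) {
            s.discarded++;
        } else {
            s.lost_race++;        /* not an error by itself, just the race */
        }
        /* Reap regardless of the discard result so nothing is left behind. */
        if (mock_reap(&urbs[i]) != 0) {
            s.leaked++;
        }
    }
    return s;
}
```

In this model a failed discard is not a leak on its own; a URB only counts as leaked if it cannot be reaped afterwards. Whether the real code path behaves this way is exactly the point under discussion.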

This race is well known and already handled correctly. The real problem is the
"ehci warning: guest updated active QH" message, which most likely indicates
that the guest has rung the doorbell (IAAD) in the EHCI controller and then
has not received an IAA interrupt within a certain amount of time. That
triggers the guest's IAAD watchdog (which exists because some real EHCI
hardware is broken with respect to delivering the IAA interrupt), which causes
us not to see an unlinked qh as unlinked, which in turn later triggers the
"warning: guest updated active QH" message.

This is unavoidable when latencies get too large. The EHCI hardware
simply was not designed to be virtualized, quite the opposite actually.
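
The doorbell/watchdog sequence described above can be modeled roughly as follows. This is a deliberately simplified sketch, not the QEMU EHCI emulation or the guest driver: the names (qh_view, unlink_model, unlink_goes_stale) are hypothetical, time is counted in whole milliseconds, and the watchdog period is a parameter rather than any particular driver's value.

```c
#include <assert.h>
#include <stdbool.h>

/* How one side currently thinks about a queue head (QH). */
enum qh_view { QH_LINKED, QH_UNLINKED };

struct unlink_model {
    enum qh_view guest_view;   /* what the guest driver believes */
    enum qh_view hc_view;      /* what the emulated controller believes */
    bool doorbell_pending;     /* IAAD rung, IAA not yet delivered */
};

/*
 * The guest rings the doorbell at t = 0. The emulated controller gets
 * around to servicing it at t = hc_service_ms. The guest's watchdog
 * gives up after watchdog_ms.
 *
 * Returns true if guest and controller end up disagreeing about the QH.
 */
static bool unlink_goes_stale(unsigned hc_service_ms, unsigned watchdog_ms)
{
    struct unlink_model m = { QH_LINKED, QH_LINKED, true };

    for (unsigned t = 0; t <= watchdog_ms; t++) {
        if (t >= hc_service_ms) {
            /* Controller services the doorbell and raises IAA in time. */
            m.hc_view = QH_UNLINKED;
            m.guest_view = QH_UNLINKED;
            m.doorbell_pending = false;
            break;
        }
    }

    if (m.doorbell_pending) {
        /* Watchdog fired: the guest assumes the interrupt was lost and
         * treats the QH as unlinked, but the controller never saw it. */
        m.guest_view = QH_UNLINKED;
    }

    return m.guest_view != m.hc_view;
}
```

With a low service latency, for example unlink_goes_stale(2, 10), both sides agree. With a latency beyond the watchdog period, for example unlink_goes_stale(50, 10), the guest considers the QH unlinked while the controller still treats it as active, which is the situation that would later show up as a guest modifying an active QH.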

Regards,

Hans


