qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Question about nonblocking stderr and lost logs


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Question about nonblocking stderr and lost logs
Date: Mon, 14 Dec 2015 15:29:49 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Mon, Dec 14, 2015 at 04:19:50PM +1100, Sam Bobroff wrote:
> On Thu, Dec 10, 2015 at 04:37:22PM +0800, Stefan Hajnoczi wrote:
> > On Mon, Dec 07, 2015 at 02:22:18PM +1100, Sam Bobroff wrote:
> > > On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> > > > On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > > > > Hi QEMU programmers,
> > > > > 
> > > > > While doing some experimental work on QEMU that has involved adding a 
> > > > > lot of
> > > > > new log messages (using qemu_log_mask()), I've discovered that under 
> > > > > some
> > > > > conditions a lot of my log messages go missing.
> > > > > 
> > > > > I've tracked the issue down to qemu_logfile being left at the default 
> > > > > (stderr)
> > > > > (so when not using -D) and according to strace what is happening is 
> > > > > that the
> > > > > debug messages are being passed to write() but write() is returning 
> > > > > EWOULDBLOCK
> > > > > and the messags are dropped.
> > > > > 
> > > > > This seems to be happening because stderr is being set non-blocking 
> > > > > (which is a
> > > > > bit odd to me), and it seems like NONBLOCK is being set as 
> > > > > qmp_chardev_add() adds a
> > > > > device for stdout (yes stdout, not stderr; perhaps file descriptors 
> > > > > have been
> > > > > dup'd by that point?).
> > > > > 
> > > > > Is this by design to prevent a slow console from blocking QEMU? If 
> > > > > not, should
> > > > > I delve further and try to prevent non-blocking being set on stderr?
> > > > > 
> > > > > (Unfortunately I don't have a replication for this using an 
> > > > > unmodified QEMU but
> > > > > I suspect I could find one if necessary.)
> > > > 
> > > > This sounds like a bug.  stderr should be blocking.
> > > > 
> > > > But it's specific to your QEMU build or your command-line.  Here are my
> > > > results on an x86 host:
> > > > 
> > > > $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
> > > >                      -drive 
> > > > if=virtio,cache=none,file=test.img,format=raw
> > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > pos:    0
> > > > flags:  0100002
> > > > mnt_id: 22
> > > > 
> > > > The flags are O_RDWR | O_LARGEFILE.
> > > > 
> > > > O_NONBLOCK would have been 04000 (octal).
> > > > 
> > > > Please retry with qemu.git/master and post your QEMU command-line.
> > > > 
> > > > Stefan
> > > 
> > > Hi Stefan,
> > > 
> > > Thanks for checking this! I think I've worked out what's happening and 
> > > it's a
> > > bit complicated ;-)
> > > 
> > > It appears that the behaviour is not directly related to the command line 
> > > or
> > > build but rather the way the shell sets up the environment.
> > > 
> > > Here's a test very similar to yours using origin/master QEMU on my Debian
> > > stable machine, with bash as my shell:
> > > 
> > > $ qemu-system-x86_64 -S -vga none -nographic
> > > ( In another virtual console: )
> > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > pos:    0
> > > flags:  0104002
> > > mnt_id: 19
> > > 
> > > So I seem to have O_NONBLOCK set on stderr.
> > > 
> > > However, if I perform that test by backgrounding QEMU and using the same
> > > console:
> > > $ qemu-system-x86_64 -S -vga none -nographic &
> > > [1] 4970
> > > [1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga none
> > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > pos:    0
> > > flags:  0100002
> > > mnt_id: 19
> > > 
> > > Hmm!
> > > 
> > > After a bit of reading, it seems like this is caused by the fact that 
> > > bash,
> > > when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a single 
> > > file
> > > descriptor. I found this surprising!
> > > 
> > > A small test program that sets O_NONBLOCK on fd 1 and checks for it on fd 
> > > 2
> > > seems to corroborate it:
> > > 
> > > #include <stdio.h>
> > > #include <fcntl.h>
> > > #include <assert.h>
> > > 
> > > int main() {
> > >         int f1, f2;
> > > 
> > >         f1 = fcntl(1, F_GETFL, 0);
> > >         f2 = fcntl(2, F_GETFL, 0);
> > >         assert(!(f1 & O_NONBLOCK));
> > >         assert(!(f2 & O_NONBLOCK));
> > >         assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
> > >         if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
> > >                 fprintf(stderr, "File descriptors are duplicates.\n");
> > >         else
> > >                 fprintf(stderr, "File descriptors are separate.\n");
> > >         return 0;
> > > }
> > > 
> > > gcc -Wall foo.c
> > > $ ./a.out
> > > File descriptors are duplicates.
> > > $ ./a.out > /dev/null
> > > File descriptors are separate.
> > > 
> > > The nonblocking flag is always (?) being set on stdin and stdout during
> > > qemu_chr_open_stdio(), but it doesn't affect stderr unless they are dup'd.
> > > 
> > > So this is probably not going to occur often, and certainly not when 
> > > running
> > > under libvirt or other tools. But if it is actually what is happening, 
> > > I'm not
> > > sure what to do about it. Any ideas?
> > 
> > This is unfortunate.  I wonder if opening /proc/fd/<n> is just a dup or
> > if it actually creates separate file status flags.
> 
> Calling open() always creates a new "file description" (which contains file
> status flags), so it's certainly a way to approach this.
> 
> I've testing doing this if we're going to use stderr as the qemu_logfile:
> 
>       dup2(open("/proc/self/fd/2", O_WRONLY), 2)
> 
> ... in do_qemu_set_log() and it fixes the issue, however...

Thanks for trying it out.  Will you send a patch?

> * Doing the dup2() causes it to fix any other fprintf()s to stderr, but
>   I feel bad for changing the fd that's inside the stderr "FILE *" "underneath
>   it". Would it be better to do the following...
>       qemu_logfile = fopen("/proc/self/fd/2", "a")
>   ... even though that will leave other fprintf()s unfixed. (I think that is
>   probably OK.)
> * We could also unconditionally do the dup2(), maybe early in main().

This is probably the cleanest way to solve the problem.  Early on during
startup QEMU can "undup" stderr and never worry about it again.

> * I'm not sure which platforms other than modern Linux support
>   /proc/self/fd/<n>. (Just fixing it for Linux seems a lot better than
>   nothing.)

This workaround is platform-specific but still useful (on Linux).

> * We could print a warning if stderr is nonblocking and we can't work around
>   it. This seems like a good idea even if don't fix it for Linux.

This is good when the workaround is not possible.

> * We could also handle it by converting the qemu_log_mask calls from fprintf()
>   to sprintf() + write(), as we'd then be able to handle EAGAIN correctly. 
> This
>   doesn't feel any cleaner to me.

I agree.  Bypassing fprintf() doesn't help since the fd is still
O_NONBLOCK.

Stefan

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]