qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Question about nonblocking stderr and lost logs


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Question about nonblocking stderr and lost logs
Date: Tue, 22 Dec 2015 14:43:12 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Wed, Dec 16, 2015 at 11:53:16AM +1100, Sam Bobroff wrote:
> On Mon, Dec 14, 2015 at 03:29:49PM +0800, Stefan Hajnoczi wrote:
> > On Mon, Dec 14, 2015 at 04:19:50PM +1100, Sam Bobroff wrote:
> > > On Thu, Dec 10, 2015 at 04:37:22PM +0800, Stefan Hajnoczi wrote:
> > > > On Mon, Dec 07, 2015 at 02:22:18PM +1100, Sam Bobroff wrote:
> > > > > On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> > > > > > On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > > > > > > Hi QEMU programmers,
> > > > > > > 
> > > > > > > While doing some experimental work on QEMU that has involved 
> > > > > > > adding a lot of
> > > > > > > new log messages (using qemu_log_mask()), I've discovered that 
> > > > > > > under some
> > > > > > > conditions a lot of my log messages go missing.
> > > > > > > 
> > > > > > > I've tracked the issue down to qemu_logfile being left at the 
> > > > > > > default (stderr)
> > > > > > > (so when not using -D) and according to strace what is happening 
> > > > > > > is that the
> > > > > > > debug messages are being passed to write() but write() is 
> > > > > > > returning EWOULDBLOCK
> > > > > > > and the messags are dropped.
> > > > > > > 
> > > > > > > This seems to be happening because stderr is being set 
> > > > > > > non-blocking (which is a
> > > > > > > bit odd to me), and it seems like NONBLOCK is being set as 
> > > > > > > qmp_chardev_add() adds a
> > > > > > > device for stdout (yes stdout, not stderr; perhaps file 
> > > > > > > descriptors have been
> > > > > > > dup'd by that point?).
> > > > > > > 
> > > > > > > Is this by design to prevent a slow console from blocking QEMU? 
> > > > > > > If not, should
> > > > > > > I delve further and try to prevent non-blocking being set on 
> > > > > > > stderr?
> > > > > > > 
> > > > > > > (Unfortunately I don't have a replication for this using an 
> > > > > > > unmodified QEMU but
> > > > > > > I suspect I could find one if necessary.)
> > > > > > 
> > > > > > This sounds like a bug.  stderr should be blocking.
> > > > > > 
> > > > > > But it's specific to your QEMU build or your command-line.  Here 
> > > > > > are my
> > > > > > results on an x86 host:
> > > > > > 
> > > > > > $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
> > > > > >                      -drive 
> > > > > > if=virtio,cache=none,file=test.img,format=raw
> > > > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > > > pos:        0
> > > > > > flags:      0100002
> > > > > > mnt_id:     22
> > > > > > 
> > > > > > The flags are O_RDWR | O_LARGEFILE.
> > > > > > 
> > > > > > O_NONBLOCK would have been 04000 (octal).
> > > > > > 
> > > > > > Please retry with qemu.git/master and post your QEMU command-line.
> > > > > > 
> > > > > > Stefan
> > > > > 
> > > > > Hi Stefan,
> > > > > 
> > > > > Thanks for checking this! I think I've worked out what's happening 
> > > > > and it's a
> > > > > bit complicated ;-)
> > > > > 
> > > > > It appears that the behaviour is not directly related to the command 
> > > > > line or
> > > > > build but rather the way the shell sets up the environment.
> > > > > 
> > > > > Here's a test very similar to yours using origin/master QEMU on my 
> > > > > Debian
> > > > > stable machine, with bash as my shell:
> > > > > 
> > > > > $ qemu-system-x86_64 -S -vga none -nographic
> > > > > ( In another virtual console: )
> > > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > > pos:    0
> > > > > flags:  0104002
> > > > > mnt_id: 19
> > > > > 
> > > > > So I seem to have O_NONBLOCK set on stderr.
> > > > > 
> > > > > However, if I perform that test by backgrounding QEMU and using the 
> > > > > same
> > > > > console:
> > > > > $ qemu-system-x86_64 -S -vga none -nographic &
> > > > > [1] 4970
> > > > > [1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga 
> > > > > none
> > > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > > pos:    0
> > > > > flags:  0100002
> > > > > mnt_id: 19
> > > > > 
> > > > > Hmm!
> > > > > 
> > > > > After a bit of reading, it seems like this is caused by the fact that 
> > > > > bash,
> > > > > when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a 
> > > > > single file
> > > > > descriptor. I found this surprising!
> > > > > 
> > > > > A small test program that sets O_NONBLOCK on fd 1 and checks for it 
> > > > > on fd 2
> > > > > seems to corroborate it:
> > > > > 
> > > > > #include <stdio.h>
> > > > > #include <fcntl.h>
> > > > > #include <assert.h>
> > > > > 
> > > > > int main() {
> > > > >         int f1, f2;
> > > > > 
> > > > >         f1 = fcntl(1, F_GETFL, 0);
> > > > >         f2 = fcntl(2, F_GETFL, 0);
> > > > >         assert(!(f1 & O_NONBLOCK));
> > > > >         assert(!(f2 & O_NONBLOCK));
> > > > >         assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
> > > > >         if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
> > > > >                 fprintf(stderr, "File descriptors are duplicates.\n");
> > > > >         else
> > > > >                 fprintf(stderr, "File descriptors are separate.\n");
> > > > >         return 0;
> > > > > }
> > > > > 
> > > > > gcc -Wall foo.c
> > > > > $ ./a.out
> > > > > File descriptors are duplicates.
> > > > > $ ./a.out > /dev/null
> > > > > File descriptors are separate.
> > > > > 
> > > > > The nonblocking flag is always (?) being set on stdin and stdout 
> > > > > during
> > > > > qemu_chr_open_stdio(), but it doesn't affect stderr unless they are 
> > > > > dup'd.
> > > > > 
> > > > > So this is probably not going to occur often, and certainly not when 
> > > > > running
> > > > > under libvirt or other tools. But if it is actually what is 
> > > > > happening, I'm not
> > > > > sure what to do about it. Any ideas?
> > > > 
> > > > This is unfortunate.  I wonder if opening /proc/fd/<n> is just a dup or
> > > > if it actually creates separate file status flags.
> > > 
> > > Calling open() always creates a new "file description" (which contains 
> > > file
> > > status flags), so it's certainly a way to approach this.
> > > 
> > > I've testing doing this if we're going to use stderr as the qemu_logfile:
> > > 
> > >   dup2(open("/proc/self/fd/2", O_WRONLY), 2)
> > > 
> > > ... in do_qemu_set_log() and it fixes the issue, however...
> > 
> > Thanks for trying it out.  Will you send a patch?
> >
> > > * Doing the dup2() causes it to fix any other fprintf()s to stderr, but
> > >   I feel bad for changing the fd that's inside the stderr "FILE *" 
> > > "underneath
> > >   it". Would it be better to do the following...
> > >   qemu_logfile = fopen("/proc/self/fd/2", "a")
> > >   ... even though that will leave other fprintf()s unfixed. (I think that 
> > > is
> > >   probably OK.)
> > > * We could also unconditionally do the dup2(), maybe early in main().
> > 
> > This is probably the cleanest way to solve the problem.  Early on during
> > startup QEMU can "undup" stderr and never worry about it again.
> 
> OK.
> 
> > > * I'm not sure which platforms other than modern Linux support
> > >   /proc/self/fd/<n>. (Just fixing it for Linux seems a lot better than
> > >   nothing.)
> > 
> > This workaround is platform-specific but still useful (on Linux).
> > 
> > > * We could print a warning if stderr is nonblocking and we can't work 
> > > around
> > >   it. This seems like a good idea even if don't fix it for Linux.
> > 
> > This is good when the workaround is not possible.
> 
> OK, I'll include this too.
> 
> > > * We could also handle it by converting the qemu_log_mask calls from 
> > > fprintf()
> > >   to sprintf() + write(), as we'd then be able to handle EAGAIN 
> > > correctly. This
> > >   doesn't feel any cleaner to me.
> > 
> > I agree.  Bypassing fprintf() doesn't help since the fd is still
> > O_NONBLOCK.
> 
> That wasn't exactly what I was trying to say, what I meant was that it *would*
> help but that I don't think it's a good solution. I'll explain (sorry if I'm
> over-explaining!):
> 
> The problem with fprintf() is that it may write "some" bytes and then return
> failure, so it's impossible to handle the EAGAIN case correctly even if we
> wanted to: we can't know where to continue writing. IMHO, this seems like it
> could be considered a bug in fprintf().
> 
> Because write() always indicates how many bytes were written, we could 
> manually
> block on EAGAIN and resume writing at the correct place. But this would be
> slower, require a local buffer (that may need to be resized sometimes) and 
> more
> complex code, so I don't think it's a good fix.  Although, it would be much 
> more
> portable and it would work.
> 
> Would you like me to post a patch for the re-open approach or explore the
> sprintf()/write() idea?

The program may crash/terminate before stderr becomes writeable after
EAGAIN.

I prefer a reliable, blocking stderr so re-open seems like a good idea.

Stefan

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]