From mingo@elte.hu Fri Jan 28 17:55:16 2011
Date: Fri, 28 Jan 2011 17:54:55 +0100
From: Ingo Molnar
To: Tejun Heo
Cc: roland@redhat.com, oleg@redhat.com, jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
Message-ID: <20110128165455.GA18194@elte.hu>
In-Reply-To: <1296227324-25295-1-git-send-email-tj@kernel.org>

Hi,

I'm hijacking this thread to report a signal handling bug that Linux and Bash have, and which has been there for at least 10 years, ever since I started using SMP Linux systems...

It's not easy to reproduce, but today I found a reproducer; maybe you guys have an idea what's going on.

There are two very simple scripts; one calls the other in an infinite loop:

$ cat test-signal
#!/bin/bash
while true; do ./test-signal2; done

$ cat test-signal2
#!/bin/bash
true

The bug is that occasionally Ctrl-C does not get processed, and that the Ctrl-C is
'lost'. It can be reproduced here by running ./test-signal several times, and
Ctrl-C-ing it:

$ ./test-signal
^C
$ ./test-signal
^C^C
$ ./test-signal
^C

See that '^C^C' line? That is where I had to do Ctrl-C twice.

It only fails here about once every 10 times, so it's very rare. I have a stock F14
system running on that box, with the very latest .38 based kernel.

Any ideas what's going on?
Thanks,

Ingo

From tglx@linutronix.de Fri Jan 28 18:42:07 2011
Date: Fri, 28 Jan 2011 18:41:33 +0100 (CET)
From: Thomas Gleixner
To: Ingo Molnar
Cc: Tejun Heo, roland@redhat.com, oleg@redhat.com, jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, Peter Zijlstra, Frédéric Weisbecker
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
In-Reply-To: <20110128165455.GA18194@elte.hu>

On Fri, 28 Jan 2011, Ingo Molnar wrote:
> See that '^C^C' line? That is where I had to do Ctrl-C twice.
>
> It only fails here about once every 10 times, so it's very rare. I have a stock F14
> system running on that box, with the very latest .38 based kernel.

Tripped over the refused-^C thing twice today. Had to kill a kernel
build from another shell: it just happily displayed ^C and never
stopped. That happens once in a while, and I have no idea how to
debug it either.
Thanks,

tglx

From anca.emanuel@gmail.com Fri Jan 28 19:04:25 2011
Date: Fri, 28 Jan 2011 20:04:13 +0200
From: Anca Emanuel
To: Thomas Gleixner
Cc: Ingo Molnar, Tejun Heo, roland@redhat.com, oleg@redhat.com, jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, Peter Zijlstra, Frédéric Weisbecker, Mathieu Desnoyers
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates

On Fri, Jan 28, 2011 at 7:41 PM, Thomas Gleixner wrote:
> On Fri, 28 Jan 2011, Ingo Molnar wrote:
>> See that '^C^C' line? That is where I had to do Ctrl-C twice.
>>
>> It only fails here about once every 10 times, so it's very rare. I have a stock F14
>> system running on that box, with the very latest .38 based kernel.
>
> Tripped over the refused-^C thing twice today. Had to kill a kernel
> build from another shell: it just happily displayed ^C and never
> stopped. That happens once in a while, and I have no idea how to
> debug it either.
cc: Mathieu

Use LTTng?

From compudj@mail.openrapids.net Fri Jan 28 19:37:03 2011
Date: Fri, 28 Jan 2011 13:36:50 -0500
From: Mathieu Desnoyers
To: Anca Emanuel
Cc: Thomas Gleixner, Ingo Molnar, Tejun Heo, roland@redhat.com, oleg@redhat.com, jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, Peter Zijlstra, Frédéric Weisbecker
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
Message-ID: <20110128183650.GA26633@Krystal>

* Anca Emanuel (anca.emanuel@gmail.com) wrote:
> On Fri, Jan 28, 2011 at 7:41 PM, Thomas Gleixner wrote:
> > On Fri, 28 Jan 2011, Ingo Molnar wrote:
> > [...]
> > Tripped over the refused-^C thing twice today. Had to kill a kernel
> > build from another shell: it just happily displayed ^C and never
> > stopped. That happens once in a while, and I have no idea how to
> > debug it either.
>
> cc: Mathieu
>
> Use LTTng?

Heh :) I'm sure Ingo and Thomas have their own tools for that ;)

There is one extra thing in the LTTng instrumentation that can help solve
this problem: the "input subsystem" instrumentation (enabled with
ltt-armall -i). You can then get a dump of:

- Your keystrokes (you can then grep for your Ctrl-C input)
- Read/poll/select system calls (so you know when your terminal receives the input)
- Signals sent/delivered

Some of these are already instrumented in the mainline kernel, so you might
get away without the input subsystem instrumentation.

If I had to take a wild guess, my bet would be to take a look in the area of
signal delivery, but you never know: maybe it's a userspace bug in the X
terminal emulator code that is causing this weirdness.

Hope this helps,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From oleg@redhat.com Fri Jan 28 18:55:32 2011
Date: Fri, 28 Jan 2011 18:55:33 +0100
From: Oleg Nesterov
To: Ingo Molnar
Cc: Tejun Heo, roland@redhat.com, jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker
Subject: Re: [PATCHSET] ptrace,signal: group stop / ptrace updates
Message-ID: <20110128175532.GA26727@redhat.com>
In-Reply-To: <20110128165455.GA18194@elte.hu>

On 01/28, Ingo Molnar wrote:
>
> The bug is that occasionally Ctrl-C does not get processed, and that the Ctrl-C is
> 'lost'. It can be reproduced here by running ./test-signal several times, and
> Ctrl-C-ing it:
>
> $ ./test-signal
> ^C
> $ ./test-signal
> ^C^C
> $ ./test-signal
> ^C
>
> See that '^C^C' line? That is where I had to do Ctrl-C twice.

Reproduced.

At first glance, /bin/sh should be blamed... Hmm, probably yes,
I even reproduced this under strace, and this is what I see:

    wait4(-1, 0x7fff388431c4, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGINT (Interrupt) @ 0 (0) ---
    rt_sigreturn(0) = -1 EINTR (Interrupted system call)
    wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9706

So, ^C is not lost, but ./test-signal doesn't want to exit.

This is what ./test-signal does when ^C does work:

    wait4(-1, 0x7fff1c283b74, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGINT (Interrupt) @ 0 (0) ---
    rt_sigreturn(0) = -1 EINTR (Interrupted system call)

OK, it doesn't exit immediately, but then it kills itself:

    wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], 0, NULL) = 19585
    rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f3c3035b150}, {0x433d30, [], SA_RESTORER, 0x7f3c3035b150}, 8) = 0
    rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f3c3035b150}, {SIG_DFL, [], SA_RESTORER, 0x7f3c3035b150}, 8) = 0
    kill(19584, SIGINT)

Looking into the previous log (when it doesn't exit) again:

    wait4(-1, 0x7fff388431c4, 0, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGINT (Interrupt) @ 0 (0) ---
    rt_sigreturn(0) = -1 EINTR (Interrupted system call)
    wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9706
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    --- SIGCHLD (Child exited) @ 0 (0) ---
    wait4(-1, 0x7fff38842d24, WNOHANG, NULL) = -1 ECHILD (No child processes)
    rt_sigreturn(0x8) = 0
    rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f3cbdbd0150}, {0x433d30, [], SA_RESTORER, 0x7f3cbdbd0150}, 8) = 0
    rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f3cbe9ab780) = 9707

Perhaps the handler for SIGCHLD clears some internal i_am_going_to_exit
flag, I dunno.

Oleg.
From mingo@elte.hu Fri Jan 28 19:30:18 2011
Date: Fri, 28 Jan 2011 19:29:47 +0100
From: Ingo Molnar
To: Oleg Nesterov
Cc: Tejun Heo, roland@redhat.com, jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker
Subject: Re: Bash not reacting to Ctrl-C
Message-ID: <20110128182947.GB20056@elte.hu>
In-Reply-To: <20110128175532.GA26727@redhat.com>

* Oleg Nesterov wrote:

> On 01/28, Ingo Molnar wrote:
> [...]
>
> Reproduced.
>
> At first glance, /bin/sh should be blamed... Hmm, probably yes,
> I even reproduced this under strace, and this is what I see:
>
>     wait4(-1, 0x7fff388431c4, 0, NULL) = ? ERESTARTSYS (To be restarted)
>     --- SIGINT (Interrupt) @ 0 (0) ---
>     rt_sigreturn(0) = -1 EINTR (Interrupted system call)
>     wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9706
>
> So, ^C is not lost, but ./test-signal doesn't want to exit.

Might be some Bash assumption or race that works under other OSs but that Linux
somehow does differently. IIRC Bash is being developed on Mac OS X.
But it's happening all the time (with yum for example, but also with make jobs, as
Thomas has reported); this is simply the first time I managed to reproduce it
with something really simple.

Thanks,

Ingo

From oleg@redhat.com Sat Feb 5 21:34:22 2011
Date: Sat, 5 Feb 2011 21:34:22 +0100
From: Oleg Nesterov
To: Ingo Molnar
Cc: Tejun Heo, roland@redhat.com, jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker
Subject: Re: Bash not reacting to Ctrl-C
Message-ID: <20110205203422.GA12443@redhat.com>
In-Reply-To: <20110128182947.GB20056@elte.hu>

On 01/28, Ingo Molnar wrote:
>
> * Oleg Nesterov wrote:
>
> > On 01/28, Ingo Molnar wrote:
> > [...]
> >
> > So, ^C is not lost, but ./test-signal doesn't want to exit.
>
> Might be some Bash assumption or race that works under other OSs but that Linux
> somehow does differently. IIRC Bash is being developed on Mac OS X.
>
> But it's happening all the time (with yum for example, but also with make jobs, as
> Thomas has reported); this is simply the first time I managed to reproduce it
> with something really simple.

OK, I seem to understand what happens. Of course I am not sure; I had never
looked into these sources before...

Suppose that jctl ^C races with the normal child exit. In this case
waitchld() sets child->status = status (zero in this case) and calls
set_job_status_and_cleanup().

set_job_status_and_cleanup() should notice wait_sigint_received and send
SIGINT to itself (termsig_handler (SIGINT)), but it only does so if the
last foreground job was also terminated by SIGINT:

    else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&

Then the next wait_for() clears wait_sigint_received, and bash
loses the ^C.

Oleg.

From oleg@redhat.com Mon Feb 7 14:08:41 2011
Date: Mon, 7 Feb 2011 14:08:41 +0100
From: Oleg Nesterov
To: Ingo Molnar
Cc: Tejun Heo, roland@redhat.com, jan.kratochvil@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, Peter Zijlstra, Thomas Gleixner, Frédéric Weisbecker
Subject: Re: Bash not reacting to Ctrl-C
Message-ID: <20110207130841.GA16054@redhat.com>
In-Reply-To: <20110205203422.GA12443@redhat.com>

On 02/05, Oleg Nesterov wrote:
>
> On 01/28, Ingo Molnar wrote:
> [...]
>
> OK, I seem to understand what happens. Of course I am not sure; I had never
> looked into these sources before...
>
> Suppose that jctl ^C races with the normal child exit. In this case
> waitchld() sets child->status = status (zero in this case) and calls
> set_job_status_and_cleanup().
>
> set_job_status_and_cleanup() should notice wait_sigint_received and send
> SIGINT to itself (termsig_handler (SIGINT)), but it only does so if the
> last foreground job was also terminated by SIGINT:
>
>     else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&
>
> Then the next wait_for() clears wait_sigint_received, and bash
> loses the ^C.

IOW.
Now that it is clear what happens, the test case becomes even more trivial:

    bash-4.1$ ./bash -c 'while true; do /bin/true; done'
    ^C^C

It needs 4-5 attempts on my machine to hit this.

The patch below fixes the problem, but most probably it is not correct.
I don't understand the point of the "status == SIGINT" check: we have
already checked that this job is dead. But I won't pretend I really
understand this code.

Oleg.

--- bash-4.1/jobs.c~ctrlc_exit_race	2011-02-07 13:52:48.000000000 +0100
+++ bash-4.1/jobs.c	2011-02-07 13:55:30.000000000 +0100
@@ -3299,7 +3299,7 @@ set_job_status_and_cleanup (job)
 	     signals are sent to process groups) or via kill(2) to the foreground
 	     process by another process (or itself).  If the shell did receive the
 	     SIGINT, it needs to perform normal SIGINT processing. */
-	  else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&
+	  else if (wait_sigint_received /*&& (WTERMSIG (child->status) == SIGINT)*/ &&
 	      IS_FOREGROUND (job) && IS_JOBCONTROL (job) == 0)
 	    {
 	      int old_frozen;