savannah-hackers-public

[Savannah-hackers-public] git, svn, cvs, Outage Postmortem 2019-09-09


From: Bob Proulx
Subject: [Savannah-hackers-public] git, svn, cvs, Outage Postmortem 2019-09-09
Date: Mon, 9 Sep 2019 21:19:33 -0600
User-agent: NeoMutt/20170113 (1.7.2)

Thursday Sept 5 at about 11am the Ansible configuration management
tool installed the latest Trisquel Linux kernel security upgrade on
nfs1, our NFS server for the main storage array.  Installed but not
booted and therefore not yet active.  Friday I received the normal
notification that there was a new kernel installed on it and therefore
it would eventually need to be rebooted.  But Friday I was mostly
offline and had no time for it.  And of course over the weekend there
is no FSF admin support in case there are problems.  Therefore the
reboot slid until today.

Today around 12:30 US/Mountain time I rebooted nfs1 for the new kernel
and the new systemd packages.  The reboot initially appeared to
complete successfully.  But then rebooting download0 for the same
upgrades failed to NFS mount one of the two partitions mounted from
nfs1.  And vcs0 also started reporting stale nfs mounts.  We started
debugging the problem immediately.

It took a while before I started to realize that the problem was the
kernel because initially it looked like a networking connectivity
problem.  Looked like IPv4 failing but IPv6 working.  Looked like a
firewall blocking the mount handshake.  Looked like other things.
Very strange was that one of the two mount points on download0 would
usually mount okay but the other would time out.  Very bizarre!

I chased down those dead ends before I decided that it must be the
kernel and should reboot back to the previously installed and
previously working one.  Had already rebooted with the new kernel
multiple times yet it still had these weird failures.  Became very
happy when booting back to the old kernel returned things to sanity.
The problem appears to be in the new kernel.

  ii  linux-image-unsigned-4.4.0-161-generic  4.4.0-161.189+8.0trisquel2  amd64  Linux-libre kernel image for version 4.4.0
  ii  linux-modules-4.4.0-161-generic         4.4.0-161.189+8.0trisquel2  amd64  Linux-libre kernel extra modules for version 4.4.0

I have marked the working previous kernel as held so as to prevent it
being removed in a future upgrade.

  hi  linux-image-unsigned-4.4.0-159-generic  4.4.0-159.187+8.0trisquel2  amd64  Linux-libre kernel image for version 4.4.0
  hi  linux-modules-4.4.0-159-generic         4.4.0-159.187+8.0trisquel2  amd64  Linux-libre kernel extra modules for version 4.4.0
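
For anyone reading those listings: in "dpkg -l" style output the first
letter of the leading two-letter field is the desired action ("i" for
install, "h" for hold) and the second is the current status ("i" for
installed).  The message does not say how the hold was set; something
like "apt-mark hold" would be the usual way, but that is an assumption.
Below is a minimal sketch that picks the held packages out of such a
listing.  The names SAMPLE, parse_dpkg_line and held_packages are made
up for this illustration, and the sample text is just the four package
lines quoted above.

```python
# Sketch only: decode the state prefix of `dpkg -l` style lines and
# list the packages that are marked as held.

# First flag letter: the desired action recorded for the package.
DESIRED = {"u": "unknown", "i": "install", "h": "hold",
           "r": "remove", "p": "purge"}

# The four package lines quoted in this message (hypothetical constant).
SAMPLE = """\
ii  linux-image-unsigned-4.4.0-161-generic  4.4.0-161.189+8.0trisquel2  amd64  Linux-libre kernel image for version 4.4.0
ii  linux-modules-4.4.0-161-generic         4.4.0-161.189+8.0trisquel2  amd64  Linux-libre kernel extra modules for version 4.4.0
hi  linux-image-unsigned-4.4.0-159-generic  4.4.0-159.187+8.0trisquel2  amd64  Linux-libre kernel image for version 4.4.0
hi  linux-modules-4.4.0-159-generic         4.4.0-159.187+8.0trisquel2  amd64  Linux-libre kernel extra modules for version 4.4.0
"""


def parse_dpkg_line(line):
    """Split one listing line into its fields, or return None if it
    does not look like a package line."""
    fields = line.split(None, 4)
    if len(fields) < 4 or len(fields[0]) < 2:
        return None
    flags, name, version, arch = fields[:4]
    return {
        "name": name,
        "version": version,
        "arch": arch,
        "desired": DESIRED.get(flags[0], "?"),
        "installed": flags[1] == "i",
    }


def held_packages(text):
    """Return the names of installed packages whose desired action is hold."""
    held = []
    for line in text.splitlines():
        pkg = parse_dpkg_line(line)
        if pkg and pkg["desired"] == "hold" and pkg["installed"]:
            held.append(pkg["name"])
    return held


if __name__ == "__main__":
    for name in held_packages(SAMPLE):
        print(name)
```

Run against the sample, this reports only the two 4.4.0-159 packages,
which matches the intent of keeping the known-good kernel installed.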

And that is all I know.  Things are back working as before using the
previously installed and running kernel.  I filed a ticket with the
FSF RT system about the issue as it concerns our systems.

Bob


