
[lmi] More robust multibooting


From: Greg Chicares
Subject: [lmi] More robust multibooting
Date: Tue, 10 Sep 2019 23:46:03 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

Vadim--I'm writing this mainly as documentation. I do ask a couple
of questions, which searching for the word 'question' will find. But
feel free to comment on anything else if you're so inclined.

For quite a while, I had been managing a multiboot system as follows:
 - set up a dedicated boot partition
 - mount that as /boot in every installation's /etc/fstab
 - when debian issues a new 'stable' release, create a new partition
     for it, and do a fresh installation from scratch there
I don't worry about disk space: I've been using this computer for
almost four years, and haven't yet managed to consume even a hundred
gigabytes, which would be seven percent of the 1500 available. I
figured it would be most robust to leave a stable old installation
in place when installing a new one, in case the new one doesn't work.

Then I tried to install fedora (so that I'd have an optional system
that's more similar to the RHEL server in the office). I figured
nothing could go wrong: I've had OpenBSD installed for years, and
fedora's less dissimilar from debian. But in retrospect I can see
that OpenBSD's dissimilarity was a virtue: I had to chainload it in
grub, so it couldn't mess up grub.

You see where this is going. When I install a new GNU/Linux system,
by default it wants to take ownership of grub. Long story short, my
debian system became unbootable. I tried to get it booting again
with several different live CDs (debian; 'grub rescue disk';
'rescatux'), but none of them actually worked in this case: at
worst they failed utterly, and at best they booted into a
partly-working system.

Now, since debian had promoted 'buster' to 'stable', I figured it
was time to upgrade anyway, so I installed 'buster' on its own
partition. When I chrooted into the old 'stretch' system, it mostly
worked, but not quite: notably, my trackball didn't work, and it's
rather difficult to use xfce with no pointing device.

Then I added a new installation of 'stretch', figuring that at
worst I could just 'dd' the old 'stretch' system onto it. By this
time, I had gathered that a shared /boot partition was part of the
problem, so I installed this new system without any bootloader.
That worked just fine.

But by now I had dug so deeply into grub that I wanted to find out
how to make it "just work". Here's the answer I came up with:

$ cat /etc/grub.d/40_custom
menuentry 'Debian GNU/Linux 9 (stretch SIMPLE) (on /dev/sda1)' {
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        search --no-floppy --label --set=root --label stretch
        echo    'Loading stretch (simply) ...'
        linux   /vmlinuz root=LABEL=stretch ro intel_iommu=on libata.force=noncqtrim
        echo    'Loading initial ramdisk ...'
        initrd  /initrd.img
}

This is starkly different from the menu entries written by
'update-grub'. Most notably, the boot partition isn't mentioned
here at all. This installation (which is the 'stretch' system I've
been using for years) is on /dev/sda1 = (hd0,msdos1), and this
40_custom stanza mentions no other drive or partition at all.
And it has the great virtue of actually working.
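Incidentally, since that stanza finds the root filesystem with
'search --label', it's worth double-checking that the label really is
what you think it is. e2label reads or sets an ext2/3/4 label; here
it's demonstrated on a throwaway filesystem image rather than a real
partition (the device names in the comments are just examples):

```shell
# Safe demonstration on a 4 MiB image file instead of a real partition:
dd if=/dev/zero of=demo.img bs=1M count=4 status=none
mkfs.ext2 -F -q -L stretch demo.img   # -L sets the filesystem label
e2label demo.img                      # prints: stretch
# On the real disk, the same commands take a device instead:
#   e2label /dev/sda1              # read the current label
#   e2label /dev/sda1 stretch      # set it (as root)
#   blkid -L stretch               # resolve label -> device node
```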

Of course, I went back and did some cleanup. First, I commented
out the old /boot entry in this installation's /etc/fstab. Then I
fixed up its swap file (details below) and recreated its initrd.
But now it seems to work perfectly.

I conjecture that adding a fresh installation of 'stretch' made
recovery more difficult, because versioned files like
  vmlinuz-4.9.0-9-amd64
  initrd.img-4.9.0-9-amd64
were written to the same /boot by different installations.
Normally, I guess, no one would do what I did, and such collisions
wouldn't occur; but in this case I suspect they did.

Here are some things I've learned.

First of all, UUIDs are really not such a great idea. True, they
were helpful on my old supermicro where I often swapped rotary
hard disks in and out: that is, they're more stable than device
names like /dev/sda. But UUIDs can change, for reasons
that I don't necessarily understand. The debian installer, for
instance, reformats any swap partitions it finds, resulting in a
different UUID. I now think labels should be used instead: they're
less likely to change; any software that incidentally alters them
is more likely to erase them altogether, which may be inconvenient
but is easily seen and fixed; and they're easier to read and type.
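Concretely, the corresponding /etc/fstab entries would then name
filesystems by label rather than UUID (the labels here are examples,
not necessarily the ones on this machine):

```
LABEL=stretch   /      ext4   errors=remount-ro   0   1
LABEL=swap0     none   swap   sw                  0   0
```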

This is a single-user system with 'hibernate' and 'suspend' both
inhibited, so it should be perfectly fine to share a swap partition
across all installed linuces. However, coping with UUID changes is
not as simple as changing /etc/fstab: there's a swap UUID in
  /etc/initramfs-tools/conf.d/resume
which matters at boot time even though I never hibernate or suspend,
and setting its contents to 'RESUME=' or 'RESUME=NONE' doesn't work:
apparently it's necessary to insert the updated UUID, and then of
course 'update-initramfs -u'. Questions:

 - Does it even make any sense to use swap, on a 32-hyperthread box
     with 64GB of RAM?

 - If swap is still useful, wouldn't a swapfile be better than a
     swap partition, given that partitions have fragile UUIDs,
     while a swapfile can be local?
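As for the mechanics of the resume-file fix above, here's a sketch,
demonstrated on a sample file; on the real system the target is
/etc/initramfs-tools/conf.d/resume, the UUID would come from 'blkid'
(run as root), and 'update-initramfs -u' must follow. The UUID below
is a made-up placeholder:

```shell
# Placeholder UUID; on the real system it would come from, e.g.:
#   NEW_UUID=$(blkid -s UUID -o value /dev/sda5)   # /dev/sda5 = swap
NEW_UUID='0f4c1a2b-3d4e-5f60-7182-93a4b5c6d7e8'
# Write the one line the initramfs scripts expect:
printf 'RESUME=UUID=%s\n' "$NEW_UUID" > resume.sample
cat resume.sample
# Then, on the real system (as root): update-initramfs -u
```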

Searching online yields no clearly definitive answer. Here's one of
the better-written articles:
  https://haydenjames.io/linux-performance-almost-always-add-swap-space/
which suggests two benefits:
 - Pages that are hardly ever used get swapped out, liberating RAM
     for more useful purposes. But the only time this box ever has
     a heavy load is during parallel compilation, which never seems
     to use even half the RAM available.
 - Swap space provides a sort of cushion in case memory is about to
     be exhausted: responsiveness degrades more slowly, and perhaps
     that provides an opportunity to kill a rogue process before the
     OOM handler is triggered. But a process that can eat 64 GB may
     just as well eat 164 GB; and without the audible feedback of an
     old-fashioned rotary HDD, I'm not sure I'd notice a problem in
     time to do anything about it.
So I'm inclined to suppress swapping altogether. Is that unwise?
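Should I decide to do that, the mechanics are simple enough; here's a
sketch, tried on a sample fstab rather than the real one (on the real
system it would be 'swapoff -a' plus the same sed against /etc/fstab,
both as root):

```shell
# Sample fstab standing in for /etc/fstab:
cat > fstab.sample <<'EOF'
LABEL=stretch   /      ext4   errors=remount-ro   0   1
UUID=0123abcd   none   swap   sw                  0   0
EOF
# Comment out every active line whose filesystem type is 'swap';
# on the real system, precede this with 'swapoff -a' (as root):
sed -i -E '/^[^#].*[[:space:]]swap([[:space:]]|$)/ s/^/# /' fstab.sample
grep swap fstab.sample   # the swap line is now commented out
```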

At any rate, this box has two SSDs, each of which has a dedicated
swap partition (rationale: it should still work if I remove one of
the drives, or if one gets bricked), and the debian installer tries
to use both; I think I should use one at most, if not zero. It seems
silly enough to use a 4 GB swap partition, but using two for a total
of 8 GB is surely much more trouble than it's worth.


