Re: Grub Failure when HDD descriptor changes

bug-grub

From:	pjones
Subject:	Re: Grub Failure when HDD descriptor changes
Date:	Tue, 22 Mar 2005 15:00:58 -0500
On Tue, 2005-03-22 at 19:41 +0100, Molle Bestefich wrote:
> Oh, but there is a way.
> 
> You could, at boot-up, calculate an MD5 hash based on the first sector
> of every disk.  If there's a duplicate hash, load the last sector of
> every disk also and calculate the MD5 based on both.  If there are still
> duplicates, add one more sector from the head of the disk.  Continue
> until you have completely unique hashes, or a (user-definable) maximum
> number of sectors to traverse has been reached.
>
> When Linux has finished bringing up IDE drivers and device-mapper
> devices, scan the disks again.  (The bootloader should probably
> include information next to the MD5 hashes on how many sectors it had
> to scan.)  There you go, Linux can easily tell which BIOS disks map to
> which Linux disks.  :-)

This idea has been entertained in a rather wide swath of different
forms.  It doesn't work.  If you're installing the OS and you've got
two identical disks, in both geometry and data (that is, disks that
Seagate says passed their burn-in tests, or that were just retasked from
an existing raid1 setup which haven't been artificially made unique),
you get a collision.  And that's the very time you need to know where to
install a bootloader.
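Here is a minimal sketch of the quoted hashing scheme. It runs on in-memory disk images, not real block devices. The helper names, the 512-byte sector size, and the `max_sectors` cap of 8 are assumptions made for illustration. The sketch makes the objection concrete: once two disks are byte-for-byte identical, hashing more sectors never breaks the tie.

```python
import hashlib
from itertools import islice

SECTOR = 512  # bytes per sector (assumed)


def read_sector(image: bytes, index: int) -> bytes:
    """Return one sector of an in-memory disk image."""
    return image[index * SECTOR:(index + 1) * SECTOR]


def probe_order(sector_count: int):
    """Sectors in the order the proposal suggests: first, last, then 1, 2, 3, ..."""
    yield 0
    yield sector_count - 1
    yield from range(1, sector_count - 1)


def unique_fingerprints(disks: dict, max_sectors: int = 8):
    """Hash ever more sectors until every disk has a distinct MD5.

    Returns (fingerprints, sectors_used, all_unique).
    """
    fingerprints = {}
    for n in range(1, max_sectors + 1):
        fingerprints = {}
        for name, image in disks.items():
            digest = hashlib.md5()
            for index in islice(probe_order(len(image) // SECTOR), n):
                digest.update(read_sector(image, index))
            fingerprints[name] = digest.hexdigest()
        if len(set(fingerprints.values())) == len(fingerprints):
            return fingerprints, n, True
    return fingerprints, max_sectors, False


# Two disks that differ only in their last sector are told apart after 2 sectors ...
blank = bytes(SECTOR * 16)
tweaked = bytearray(blank)
tweaked[-1] = 1
print(unique_fingerprints({"hd0": blank, "hd1": bytes(tweaked)})[1:])  # (2, True)

# ... but two byte-identical disks collide no matter how many sectors are hashed.
print(unique_fingerprints({"hd0": blank, "hd1": blank})[1:])  # (8, False)
```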

The checksum method is *exactly* analogous to the earlier hack for this,
which is that somebody (*cough*) marks the drive and partition as
bootable in the MBR and partition table, so whatever comes along later
can tell. They both have the same insurmountable flaw -- the first time
you need to know they're different is when you're dropping those
breadcrumbs to begin with.

Sadly, the best heuristic to really find out what's bootable tends to be
"Did the manufacturer mark this partition as bootable in the partition
table when they put Windows on the machine?", but of course this is a
complete cop-out method, and it doesn't work if you're not buying the
machine with Windows pre-installed.
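The partition-table side of that heuristic is easy to express in code. This is a minimal sketch that assumes the classic DOS MBR layout: the partition table starts at byte 446, a status byte of 0x80 means active, and bytes 510-511 hold 0x55 0xAA. The helpers `bootable_partitions`, `guess_boot_disk`, and `make_mbr` are hypothetical. The heuristic gives no answer when no disk carries the flag, which is the no-Windows-preinstalled case, or when several disks do.

```python
PART_TABLE_OFFSET = 446  # the four 16-byte partition entries start here
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80       # status byte of a partition marked bootable


def bootable_partitions(mbr: bytes) -> list:
    """1-based numbers of primary partitions flagged active in a classic MBR."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        return []  # no valid MBR boot signature
    return [i + 1 for i in range(4)
            if mbr[PART_TABLE_OFFSET + i * ENTRY_SIZE] == ACTIVE_FLAG]


def guess_boot_disk(mbrs: dict):
    """Return the only disk with an active partition, or None if zero or several have one."""
    candidates = [disk for disk, mbr in mbrs.items() if bootable_partitions(mbr)]
    return candidates[0] if len(candidates) == 1 else None


def make_mbr(active=None) -> bytes:
    """Build an otherwise empty MBR, optionally flagging one partition (1-4) active."""
    mbr = bytearray(512)
    mbr[510:512] = b"\x55\xaa"
    if active is not None:
        mbr[PART_TABLE_OFFSET + (active - 1) * ENTRY_SIZE] = ACTIVE_FLAG
    return bytes(mbr)


print(guess_boot_disk({"sda": make_mbr(active=1), "sdb": make_mbr()}))  # sda
print(guess_boot_disk({"sda": make_mbr(), "sdb": make_mbr()}))          # None
```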

There is something of a hack you can do for the two-identical-disks
case, though.  It's not a solution, but it avoids the problem.  That is:
assume they're both bootable, write some random data to the
mbr_signature (see below when I talk about EDD some more), and then
write *exactly* the same boot block out to each disk.  You also have to
write the same data for all of the /boot partition to both disks.
Essentially, raid1 of /boot, but without actually using raid.
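Here is a minimal sketch of that workaround using in-memory images. On a real system, the writes would go to the raw disks and to an actual /boot partition. The function `mirror_boot`, the placeholder 512-byte boot sector, and the /boot offset are all hypothetical. The signature is written at byte 440 because the classic MBR layout keeps its 4-byte disk signature there. Whichever disk the BIOS boots, it finds the same boot block and the same /boot contents, so the ambiguity stops mattering.

```python
import os
import struct

SIG_OFFSET = 440  # 4-byte disk signature in a classic MBR (the field EDD reports)


def mirror_boot(disks: list, boot_block: bytes, boot_image: bytes, boot_offset: int) -> int:
    """Give every disk the same randomly signed boot block and the same /boot bytes.

    `disks` are mutable in-memory images standing in for the raw devices.
    Returns the signature that was written.
    """
    signature = struct.unpack("<I", os.urandom(4))[0] or 1  # avoid an all-zero signature
    block = bytearray(boot_block)
    struct.pack_into("<I", block, SIG_OFFSET, signature)
    for disk in disks:
        disk[0:len(block)] = block                                    # identical MBR
        disk[boot_offset:boot_offset + len(boot_image)] = boot_image  # identical /boot
    return signature


disks = [bytearray(4096), bytearray(4096)]  # both start blank in this simulation
boot_block = bytes(510) + b"\x55\xaa"       # placeholder 512-byte boot sector
sig = mirror_boot(disks, boot_block, b"kernel+initrd", 1024)
print(disks[0] == disks[1])  # True
```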

> Another, somewhat kinkier approach, would be to load a kernel module
> that took the Big-Kernel-Lock (tm), did a lot of INT 13h'ing to
> calculate unique md5 hashes, undid the kernel lock, and went back into
> Linux space and did the same through the standard interfaces.

There's code to do essentially this already, but it has the same failure
I mentioned above.  Right now the EDD detection is in two parts.  The
first runs before Linux switches out of real mode and can int13 to its
heart's content.  The second part is the "edd" kernel module, which
exposes (via /sys/firmware/edd) the data the first part collects.

The MBR has a "signature" field, which is read and stored in a kernel
buffer, and then edd tests whether the drive can do EDD.  If so, it retrieves
that data as well.  So if you look at /sys/firmware/edd/ (after loading
edd.ko), you see at least something like:

vroomfondel:/sys/firmware/edd$ find
.
./int13_dev80
./int13_dev80/mbr_signature

[ Depending on what rev of EDD is supported by the drive, you may
  see more files.  Nothing in edd 1 or 2 is useful at all, but 3
  provides us the (generally bogus) data about what card/bus/device
  the drive is. ]

But you still have the same problem -- on two identical drives that have
been wiped clean, mbr_signature will be the same.  So you can't just
read /dev/hda and compare to the signature, and checksums don't help,
even with extra sectors being taken into account.
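Here is a minimal sketch of the matching that wiped drives defeat. It assumes each `mbr_signature` file holds a hex value such as `0x1a2b3c4d` and that the Linux side reads the 4-byte little-endian signature at byte 440 of each MBR. The helpers `read_edd`, `map_bios_to_linux`, and `mbr_with` are hypothetical, and only the in-memory part is exercised below. With two zeroed disks, every candidate matches twice, so the result is `None`.

```python
import os
import struct

SIG_OFFSET = 440  # 4-byte disk signature in a classic MBR


def read_edd(root: str = "/sys/firmware/edd") -> dict:
    """Collect mbr_signature text per BIOS ID (needs edd.ko loaded)."""
    result = {}
    for entry in sorted(os.listdir(root)):
        if entry.startswith("int13_dev"):
            with open(os.path.join(root, entry, "mbr_signature")) as f:
                result[entry] = f.read()
    return result


def map_bios_to_linux(edd: dict, linux_mbrs: dict) -> dict:
    """Map BIOS IDs to Linux devices by MBR signature; None marks a missing or ambiguous match."""
    by_sig = {}
    for dev, mbr in linux_mbrs.items():
        by_sig.setdefault(struct.unpack_from("<I", mbr, SIG_OFFSET)[0], []).append(dev)
    mapping = {}
    for bios_id, text in edd.items():
        matches = by_sig.get(int(text.strip(), 16), [])  # assumed to hold e.g. "0x1a2b3c4d"
        mapping[bios_id] = matches[0] if len(matches) == 1 else None
    return mapping


def mbr_with(signature: int) -> bytes:
    """Build an otherwise empty MBR carrying the given disk signature."""
    mbr = bytearray(512)
    struct.pack_into("<I", mbr, SIG_OFFSET, signature)
    return bytes(mbr)


# Two wiped, identical drives: both signatures are zero, so nothing can be matched.
wiped = {"/dev/hda": mbr_with(0), "/dev/hdc": mbr_with(0)}
print(map_bios_to_linux({"int13_dev80": "0x00000000\n", "int13_dev81": "0x00000000\n"}, wiped))
# {'int13_dev80': None, 'int13_dev81': None}
```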

If you can do int13 reads and writes once your OS is started, which I've
not seen code to do safely on Linux, but _might_ be possible, then this
is all overkill -- just write out unique mbr_signatures to the BIOS IDs,
read the MBRs from the unix device, and be done.

I've tried doing int13-type stuff from Linux before.  It works
sometimes, not other times, mostly depending on which registers the call
uses and which locks are taken.  Probing DMI and DDC generally work ok,
but I couldn't get e.g. EDD probing to work from userland.  I haven't
had time to figure out why not yet, as it's not a big priority.  You
could do this same thing by writing sentinel values out to the mbr
signature on the unix device, taking a reboot, and making the early-boot
real mode code write out unique IDs where it finds your sentinel.  But
that *really* sucks for dual boot, where ideally we wouldn't touch the
other OS's disk.
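The reboot-and-sentinel variant can be simulated with two views of the same disks: one indexed by Linux device name and one indexed by BIOS drive number. This is only a sketch. `SENTINEL`, the `0x4D420000 | drive` ID scheme, and every function name are hypothetical. The real stamping would be done by early-boot real-mode code through int13, and Python here only stands in for it. The sketch also shows the cost described above: `mark_from_linux` rewrites the MBR of every disk, including one that belongs to another OS.

```python
import struct

SIG_OFFSET = 440         # 4-byte disk signature in a classic MBR
SENTINEL = 0x5E5E5E5E    # hypothetical marker value


def get_sig(disk) -> int:
    return struct.unpack_from("<I", disk, SIG_OFFSET)[0]


def set_sig(disk, value: int) -> None:
    struct.pack_into("<I", disk, SIG_OFFSET, value)


def mark_from_linux(linux_view: dict) -> None:
    """Before the reboot: write the sentinel through every Unix device."""
    for disk in linux_view.values():
        set_sig(disk, SENTINEL)


def stamp_in_early_boot(bios_view: dict) -> dict:
    """Stand-in for the real-mode code: swap each sentinel for an ID derived from
    the BIOS drive number, then report what EDD would expose as mbr_signature."""
    reported = {}
    for drive in sorted(bios_view):
        disk = bios_view[drive]
        if get_sig(disk) == SENTINEL:
            set_sig(disk, 0x4D420000 | drive)  # unique per BIOS drive number
        reported[drive] = get_sig(disk)
    return reported


def match(reported: dict, linux_view: dict) -> dict:
    """Pair each BIOS drive with the Unix device carrying the same signature."""
    by_sig = {get_sig(disk): dev for dev, disk in linux_view.items()}
    return {drive: by_sig.get(sig) for drive, sig in reported.items()}


a, b = bytearray(512), bytearray(512)  # two wiped, identical disks
linux_view = {"/dev/hda": a, "/dev/hdc": b}
bios_view = {0x80: b, 0x81: a}         # the BIOS happens to enumerate them the other way round
mark_from_linux(linux_view)
print(match(stamp_in_early_boot(bios_view), linux_view))  # {128: '/dev/hdc', 129: '/dev/hda'}
```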

> That said, it seems a big solution to a small problem.

Yep.

Heuristics present an OK answer.  Likely if you've got SCSI and IDE, the
IDE bus only has a CD drive, so obviously you're booting SCSI.  If you've
got a bunch of SCSI disks and a single IDE disk, that's probably because
CVS's grub-install won't work with /boot on software raid, so Pogo
shipped you a box with an IDE boot drive ;)

But you basically have to figure out a scenario for every case, and you
still can't do squat about two identical disks, so that sucks too.
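Here is a minimal sketch of the two rules of thumb above. The `Device` record, its string fields, and `guess_boot_bus` are hypothetical. Every new hardware scenario would need another branch, and the two-identical-disks case falls straight through to `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    bus: str   # "scsi" or "ide"
    kind: str  # "disk" or "cdrom"


def guess_boot_bus(devices: list) -> Optional[str]:
    """Apply the two rules of thumb; None means no rule fits."""
    scsi_disks = [d for d in devices if d.bus == "scsi" and d.kind == "disk"]
    ide_disks = [d for d in devices if d.bus == "ide" and d.kind == "disk"]
    if scsi_disks and not ide_disks:
        return "scsi"  # the IDE side holds only a CD drive (or nothing)
    if len(scsi_disks) > 1 and len(ide_disks) == 1:
        return "ide"   # lone IDE boot disk next to a set of SCSI disks
    return None        # e.g. two identical IDE disks: no way to decide


print(guess_boot_bus([Device("sda", "scsi", "disk"), Device("hdc", "ide", "cdrom")]))  # scsi
```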

> It would be much easier to just ask the user how he thinks the disks
> are laid out.

This is the "scare the user off so we don't have to worry about them any
more" plan.

> If he's in doubt, tell him to go download a GRUB bootdisk and run eg.:
> grub> geometry (hd0)
> grub> geometry (hd1)
> Then go back to Linux, do the same and compare.

That doesn't work for the mass market at all, and that's who we've got
to contend with.  When presented with this situation, an inconveniently
large chunk of e.g. Red Hat's customers will respond with something
along the lines of: "I have two drives?"  That's if you're lucky.  More
often, they say "Windows just does this for me!  This is too hard", even
though Windows doesn't try to solve this problem at all, AFAIK.

> Not fool-proof, especially if someone uses multiple disks of the same
> size, but it should do.

Same scenario where everything else fails.
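The comparison that the quoted message asks the user to do by hand amounts to matching disks by size. This is a minimal sketch that assumes the sector counts have already been collected and that both sides count 512-byte sectors. For example, the counts could come from GRUB's `geometry` output on one side and from `/sys/block/<dev>/size` on the other. `match_by_size` is a hypothetical helper. Two disks of the same size map to `None`, which is the failure described above.

```python
from collections import defaultdict


def match_by_size(bios_sizes: dict, linux_sizes: dict) -> dict:
    """Pair BIOS drives with Linux disks by total sector count; None when not unique."""
    by_size = defaultdict(list)
    for dev, sectors in linux_sizes.items():
        by_size[sectors].append(dev)
    return {drive: (by_size[s][0] if len(by_size[s]) == 1 else None)
            for drive, s in bios_sizes.items()}


print(match_by_size({"(hd0)": 78165360, "(hd1)": 156301488},
                    {"/dev/sda": 156301488, "/dev/hda": 78165360}))
# {'(hd0)': '/dev/hda', '(hd1)': '/dev/sda'}
```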

> I'm advocating a bit against the current approach of taking a wild
> guess since it decreases the likelihood that anyone will trust GRUB to
> modify their MBR (i.e. use GRUB ;-)).

Well, in reality, OS installation programs do the decision making, not
grub.  The OS installer writes device.map, and it tells grub (through
grub-install or the grub shell) "setup (hd0)", etc.
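Once the installer has decided the mapping, what it hands to grub is plain text. This is a minimal sketch that only serializes a decision already made. The function names and the example mapping are hypothetical. The `(hdN)` device names, the `root` and `setup` commands, and 0-based partition numbers follow GRUB's shell syntax. The hard part, choosing which Unix device becomes `(hd0)`, happens before this code runs.

```python
def render_device_map(mapping: dict) -> str:
    """Serialize BIOS-drive -> Unix-device pairs as device.map lines: "(hd0)<TAB>/dev/hda"."""
    return "".join(f"({drive})\t{dev}\n" for drive, dev in mapping.items())


def grub_shell_commands(disk: str, partition: int) -> str:
    """Commands an installer might feed to the GRUB shell; partitions count from 0."""
    return f"root ({disk},{partition})\nsetup ({disk})\nquit\n"


print(render_device_map({"hd0": "/dev/hda", "hd1": "/dev/sda"}), end="")
print(grub_shell_commands("hd0", 0), end="")
```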

> But I'd like to know how it's done before I judge it to be completely
> insane :-).

It isn't something grub typically has to do; each distro does it
differently during installation when they write device.map.

> Also, whom should I ask if I wanted something to be committed to GRUB CVS?

Okuji, preferably via this list.

-- 
        Peter