grub-devel

From: Dale Carstensen
Subject: Re: grub rescue read or write sector outside of partition
Date: Sat, 27 Jun 2015 15:17:38 -0700

TL;DR: it looks to me like grub has a problem when failed
mdadm RAID6 members are left around.

Thanks to Fajar A. Nugraha for the advice about --modules for
grub-install (seems to me to be undocumented).  I managed to
stumble through without enhancing the commands for "grub rescue",
but it's good to know I could have.

I still have a question, though.

The grub.cfg file has menuentry nesting, with an outer name of
"Gentoo GNU/Linux", and inner names by version/recovery.  But
I can't find any documentation of how to navigate to choose,
say, 3.8.11, now that I've made 4.0.5 default.  Seems to me
all the lines used to show up.  Maybe I manually took out the
nesting before??

So what key(s) drill down into sub-menus on the grub menu?
Did I miss it in the info page / manual?
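
To see the nesting without rebooting, here is a minimal sketch
(list_menu_paths is a hypothetical helper, not something grub
provides) that prints each menuentry in a grub.cfg prefixed by the
titles of its enclosing submenus, joined with ">". As far as I know,
that joined form is how the GRUB manual names a nested entry for the
default variable. It assumes the layout grub-mkconfig usually writes:
one single-quoted title per submenu/menuentry line, each block opened
by a "{" at the end of the line and closed by a "}" alone on a line.
Titles containing escaped quotes are not handled.

```shell
#!/usr/bin/env bash
# list_menu_paths FILE  (hypothetical helper, not part of grub)
# Prints every menuentry in a grub.cfg as "Submenu>Entry", or just
# "Entry" at top level. Assumes grub-mkconfig style formatting.
list_menu_paths() {
  awk -v q="'" '
    # Return the first single-quoted string on the line.
    function title(line,   s) {
      s = line
      sub("^[^" q "]*" q, "", s)   # drop text up to the opening quote
      sub(q ".*$", "", s)          # drop the closing quote and the rest
      return s
    }
    # Join the titles of all currently open submenus with ">".
    function path(   i, p) {
      p = ""
      for (i = 1; i <= depth; i++)
        if (name[i] != "")
          p = p name[i] ">"
      return p
    }
    /^[ \t]*submenu[ \t]/   { name[++depth] = title($0); next }
    /^[ \t]*menuentry[ \t]/ { print path() title($0); name[++depth] = ""; next }
    /[{][ \t]*$/            { name[++depth] = ""; next }   # other blocks, e.g. function load_video {
    /^[ \t]*[}][ \t]*$/     { if (depth > 0) depth--; next }
  ' "$1"
}
```

Usage would be something like: list_menu_paths /boot/grub/grub.cfg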

>Date:   Fri, 26 Jun 2015 11:11:14 +0300
>To:     "Dale Carstensen" <address@hidden>
>cc:     address@hidden
>From:   Andrei Borzenkov <address@hidden>
>Subject: Re: grub rescue read or write sector outside of partition

> Thu, 25 Jun 2015 17:33:25 -0700
>"Dale Carstensen" <address@hidden> :
>
>> I had a drive fail, and it is the one that had grub on it.
>> It had parts of two RAID-6 partitions, too.  So I bought a
>> new drive and added partitions on it to replace the failed
>> RAID-6 parts.  That was still booting OK from the failed
>> drive, but then I updated the kernel, and I decided to also
>> install a new grub on the new drive.
>
>How? Please show exact commands you used as well as your disk
>configuration.

The bash history is long gone.  My feeble memory is that it
was simply

 grub2-install /dev/sdf

and it reported no errors.

Eventually I booted from a DVD and used chroot to do

 grub2-install /dev/sdb

The disk configuration, as shown by /proc/mdstat, is:

md126 : active raid6 sdf8[5] sdd1[4] sdc1[3] sdb1[2] sda1[1]
      87836160 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
      
md127 : active raid6 sdf10[5] sdd3[4] sdc3[3] sdb3[2] sda3[1]
      840640512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 1/3 pages [4KB], 65536KB chunk

/ is mounted from md126p1, /home from md127p1.

The sdf8 and sdf10 partitions are on the replacement drive.
The old partitions they replaced are still present as sde8
and sde10.

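Before wiping anything on sde8 or sde10, it seems worth confirming
from /proc/mdstat that they really are no longer listed in any
array. A minimal sketch, assuming the member format shown above
(name[slot], possibly followed by a flag such as (F) for failed);
is_md_member is a hypothetical helper name, and the optional second
argument exists only so it can be tried against a saved copy of
mdstat.

```shell
#!/usr/bin/env bash
# is_md_member DEV [MDSTAT]  (hypothetical helper)
# Succeeds if DEV (e.g. sdf8) is listed as a member of any array in an
# mdstat-format file, /proc/mdstat by default. Members appear as
# "name[slot]", sometimes followed by a flag like (F) for failed.
is_md_member() {
  local dev=$1 mdstat=${2:-/proc/mdstat}
  grep -Eq "(^|[[:space:]])${dev}\[[0-9]+\]" "$mdstat"
}

# Only try the live check where /proc/mdstat exists.
if [ -r /proc/mdstat ]; then
  for dev in sde8 sde10 sdf8 sdf10; do
    if is_md_member "$dev"; then
      echo "$dev: still an array member"
    else
      echo "$dev: not listed in /proc/mdstat"
    fi
  done
fi
```

Requiring the "[" right after the name keeps sdf1 from matching
sdf10[5].
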
Grub calls sde (hd0), sdf (hd1), md126 (md/1) and md127 (md/3).
When booted from the DVD, sde shows up as sda and sdf as sdb.
All of that is kept consistent by those long UUID strings.
And grub calls md126p1 (md/1,gpt1), but for command input it
seems to accept (md/1,1) without the label-type distinction.

Or maybe I have md/1 and md/3 swapped??  I hope not.

The command that replaced the bad drive with the good in RAID6
was

 mdadm --add /dev/md126 /dev/sdf8

Below, I'll show what I think has made it stable and useful
again.

>> That seemed to go OK until I tried to reboot.  I landed in
>> grub rescue.  Fortunately I have several computers, so I can
>> look up documentation, etc. without my main desktop functioning.
>> Somewhere I found that grub rescue has only a few commands, none
>> of them "help" or a list of commands, and no TAB-expansions.
>> Well, they seem to be ls, set, unset and insmod.  Supposedly,
>> running insmod normal, then normal, will get back to the
>> fuller set of commands with help, but that's where it gets
>> the "outside of partition" error, it seems.
>>
>> I can ls the /boot/grub/i386-pc/ directory, where normal.mod
>> is, so I would think grub rescue could find and read normal.mod,
>> too, but, I guess not.
>>
>
>Please show output of "set" command at this point.

In the original grub rescue event, I think set output this:

cmdpath=(hd0)
prefix=(mduuid/73fc9531-525f-05e9-6992-6654b5b95a33,1)/boot/grub
root=mduuid/73fc9531-525f-05e9-6992-6654b5b95a33,1

And the number 73fc...5a33 is the blkid for /dev/sdf8.  I think it
was just the three variables.
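
Since the same array UUID is printed with different separators by
different tools (blkid shows it with dashes, mdadm --detail with
colons), one way to check that the UUID in prefix really is the
array's is to compare the two with separators and letter case
ignored. A minimal sketch; same_uuid is a hypothetical helper name,
and the colon form in the example is just the same digits regrouped,
not output captured from this system. Note that blkid on a member
partition also reports a separate UUID_SUB for the member itself,
which is not the one to compare.

```shell
#!/usr/bin/env bash
# same_uuid A B  (hypothetical helper)
# Succeeds if A and B are the same non-empty UUID once dashes, colons
# and letter case are ignored.
same_uuid() {
  local a b
  a=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
  b=$(printf '%s' "$2" | tr -d ':-' | tr 'A-F' 'a-f')
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Dashed form (as in the grub prefix) vs. colon form (mdadm style).
if same_uuid 73fc9531-525f-05e9-6992-6654b5b95a33 73fc9531:525f05e9:69926654:b5b95a33; then
  echo "same array"
fi
```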

Note that I booted from (hd1), but somehow cmdpath got
diverted to (hd0), though the UUIDs in prefix and root still
pointed to (hd1).  Unless I misremember.

>
>> So, set debug=all helped a little.  It expanded the message
>> from just something like "read or write bad" (I'd have to keep
>> rebooting to get it verbatim) to one giving the specific size
>> of the partition (in decimal, around 175 million 512-byte
>> blocks) and the sector it is trying to read (read.c:461) (in
>> hexadecimal, around 10 million).  And 10 million hex
>> (0x10000000, or 268,435,456 decimal) really is larger than
>> 175 million decimal.
>>
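
The hex-versus-decimal comparison is easy to get wrong in your head,
so here is a small sketch of the check. The values are illustrative
stand-ins for the "around 10 million" hex sector and the "around 175
million" sector partition size, not the exact numbers from the error;
check_sector is a hypothetical name. It relies on bash arithmetic
accepting the 0x prefix, and on sectors being numbered from 0, so the
last valid one is size - 1.

```shell
#!/usr/bin/env bash
# check_sector SECTOR SIZE  (hypothetical helper)
# SECTOR may be hex (0x...) or decimal; SIZE is the partition size in
# sectors. Sectors are numbered from 0, so the last valid one is SIZE-1.
check_sector() {
  local sector=$(( $1 ))   # bash arithmetic understands the 0x prefix
  local size=$(( $2 ))
  if (( sector >= size )); then
    echo "outside: sector $sector, last valid sector $(( size - 1 ))"
  else
    echo "inside: sector $sector of $size"
  fi
}

check_sector 0x10000000 175000000   # illustrative values only
```

With these values it prints "outside: sector 268435456, last valid
sector 174999999".
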
>> So maybe my BIOS has some limitation on how far into this
>> 2 TB drive it can read, or maybe replacing a drive that had
>> 512-byte sectors with one that has 4096-byte hardware sectors
>> confuses grub.  But the old drive with the failures has the
>> same problem.
>>
>> It's gentoo, grub2 (I could look up the version once it's
>> running again),

Part of the output of

 eix grub | cat

is

 [I] sys-boot/grub
  ...
 Installed versions:  2.02_beta2-r3(2)^t(07:25:12 03/13/15)(multislot nls sdl 
truetype -debug -device-mapper -doc -efiemu -libzfs -mount -static -test 
GRUB_PLATFORMS="-coreboot -efi-32 -efi-64 -emu -ieee1275 -loongson -multiboot 
-pc -qemu -qemu-mips -xen")

>> 64-bit (although grub seems not to really
>> notice 32- vs 64-bit, or the kernel, so I'm not sure it's
>> just smart or really dumb),

It is multilib, so 32- vs 64-bit appearance is nuanced.

>> and, like I say, the / partition
>> is RAID-6, including /boot.  I'm going to try making a
>> non-RAID /boot, maybe later I'll try making it RAID-1,
>> to see if that helps.
>>
>> Any advice?
>>
>> Thanks.

Well, it seems to work again.

The first baby step was to make a partition on (hd1)/sdb/sdf
starting at block 34 and ending at block 2047.  Partition 8
begins at block 2048, and originally I had set it to type ef02.
Then I changed it to fd00 and made the block-34 partition (11)
type ef02.  I tried to make that partition 11 ext3 and put some
of /boot in it, but obviously it's far too small for that; a
BIOS boot partition (type ef02) is only meant to hold the core
image that grub-install embeds there.  So I used dd with
if=/dev/zero to clear it, and ran grub2-install with the
--recheck option.  All of this was done booted from the DVD
under chroot, keeping in mind that the device was /dev/sdb.

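For a sense of scale, here is the arithmetic for that gap as a tiny
sketch, assuming 512-byte logical sectors (which the 2048-sector
start of partition 8 suggests); gap_bytes is a hypothetical name.
Roughly 1 MB should normally be enough for grub's embedded core
image, but it is nowhere near enough to hold kernels.

```shell
#!/usr/bin/env bash
# gap_bytes FIRST LAST [SECTOR_SIZE]  (hypothetical helper)
# Size in bytes of a partition spanning sectors FIRST..LAST inclusive,
# assuming SECTOR_SIZE bytes per sector (512 unless given).
gap_bytes() {
  local first=$1 last=$2 sector_size=${3:-512}
  echo $(( (last - first + 1) * sector_size ))
}

bytes=$(gap_bytes 34 2047)
echo "$bytes bytes ($(( bytes / 1024 )) KiB)"
```

That works out to 2014 sectors, or 1031168 bytes (1007 KiB).
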
That avoided "grub rescue", but the only kernel it found was
the old one, 3.8.11.

I stabbed in the dark through another 4 or 5 reboots, until
eventually, after set pager=1, using cat to look at
/boot/grub/grub.cfg showed that the only Linux menuentry in it
was for 3.8.11, while I knew the latest grub.cfg I had also
listed the new 4.0.5, as well as the older 3.8.9 and 3.8.7
ones.  I'm still not sure where that grub.cfg came from, but I
assumed it had to do with grub being too liberal about failed
members of RAID6 arrays.

So I ran

 mdadm --zero-superblock /dev/sde8

and did the same for /dev/sde10.

I think that fixed things.  Oh, before the zero-superblock, I
had also changed /etc/default/grub to set the default menu item
to the long, weird id for 4.0.5.

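The id that GRUB_DEFAULT takes is the string grub-mkconfig writes
after $menuentry_id_option on each menuentry line. A minimal sketch
that lists id and title pairs from a grub.cfg, assuming that usual
single-line form and titles without embedded quotes; list_menu_ids
is a hypothetical name. If the wanted entry sits inside a submenu,
my understanding of the GRUB manual is that the submenu's id (or
title) has to come first, joined to the entry's with ">".

```shell
#!/usr/bin/env bash
# list_menu_ids FILE  (hypothetical helper)
# Prints "id<TAB>title" for each menuentry written in the usual
# grub-mkconfig form:
#   menuentry 'TITLE' ... $menuentry_id_option 'ID' {
list_menu_ids() {
  # Splitting on single quotes gives: $1 = "menuentry ", $2 = TITLE,
  # $3 = the options before the id, $4 = ID.
  awk -F"'" '
    $1 ~ /^[ \t]*menuentry[ \t]*$/ && $3 ~ /menuentry_id_option[ \t]*$/ {
      print $4 "\t" $2
    }
  ' "$1"
}
```

Something like list_menu_ids /boot/grub/grub.cfg | grep 4.0.5 would
then narrow it down to the kernel in question.
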
So, it's working, or at least appears to work.  I suppose
I should check whether cmdpath in grub is (hd1) or maybe
is still the incorrect (hd0).




