help-grub

Re: Install failes after md raid replace


From: Tobias Lang
Subject: Re: Install failes after md raid replace
Date: Mon, 4 Jan 2016 20:19:59 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 04.01.16 at 18:23, Andrei Borzenkov wrote:
> 04.01.2016 20:14, Tobias Lang wrote:
>> On 04.01.16 at 17:51, Andrei Borzenkov wrote:
>>> 04.01.2016 12:40, Tobias Lang wrote:
>>>> Hi everybody,
>>>>
>>>> after the replacement of a faulty disk in a md raid setup (Raid 1), we
>>>> are not able to update grub.
>>>>
>>>> The system is Ubuntu 12.04.5 LTS, with the latest patches installed,
>>>> and running Linux version 3.2.0-95-generic (address@hidden).
>>>>
>>>> Disk replacement procedure involved the following steps:
>>>>
>>>> # sgdisk -R /dev/sda /dev/sdb
>>>> # sgdisk -G /dev/sda
>>>> # mdadm /dev/mdX -a /dev/sdbX
>>>>
>>>
>>> Sorry, I do not understand this. I assume /dev/sdb is the current good
>>> disk and /dev/sda is replacement disk. In this case you are trying to
>>> add disk that already exists in array once more. This is simply not
>>> possible.
>>
>> The disk showed a failure like this:
>>
>> ----------------------------------------------------
>> md2 : active raid1 sdb3[1] sda3[0](F)
>>       1073740664 blocks super 1.2 [2/1] [_U]
>> ----------------------------------------------------
>>
>> I removed the disk from the array before replacing it with a new one:
>>
>> # mdadm /dev/mdX -r /dev/sdaX
>>
>> After that, the array only had one disk (as this is a two disk array).
>>
> 
> Again - your array had /dev/sdb3 at this point (because this is the only
> non-failed disk). You tried to add /dev/sdb3 again. That makes no sense,
> sorry. Apparently some step is missing or incorrectly described.
> 
>>>> After resync, this fails:
>>>>
>>>
>>> Resync? I get "/dev/vdb1 busy".
>>
>> Of course, the raid has to resync the data to the replaced (and
>> currently empty) disk.
>>
> 
> You misunderstand. When I execute this command /dev/vdb1 *is* part of
> active array; I cannot "add" it again. There is nothing to sync yet.

Now I get it. I just had a typo in the first email, sorry for the
confusion. Here are the exact steps (backed by bash history) I took to
replace the faulty disk '/dev/sda':

Contents of '/proc/mdstat' on failure:

---------------
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
      33553336 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0](F)
      1073740664 blocks super 1.2 [2/1] [_U]

md1 : active raid1 sdb2[1] sda2[0]
      524276 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sdb4[1] sda4[0](F)
      1822442815 blocks super 1.2 [2/1] [_U]

unused devices: <none>
---------------
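
The degraded arrays in this output are the ones whose status field
contains an underscore ([_U]). As a minimal sketch, assuming the usual
/proc/mdstat layout shown above, they could be listed like this
(degraded_arrays is a hypothetical helper name, not an mdadm command):

```shell
# List md arrays whose member-status field (e.g. [_U]) shows a missing member.
# $1: mdstat-formatted file to read (default: /proc/mdstat).
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md2 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The "blocks" line ends with the status field; "_" marks a missing member
        /blocks/ && $NF ~ /^\[[U_]*_[U_]*\]$/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Called without arguments it reads the live /proc/mdstat; a saved copy
can be passed as the first argument instead.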

Removal of the faulty disk '/dev/sda' (sda3 and sda4 were already marked
as failed, so only sda1 and sda2 needed --fail):

# mdadm --manage /dev/md0 --fail /dev/sda1
# mdadm --manage /dev/md1 --fail /dev/sda2

# mdadm /dev/md0 -r /dev/sda1
# mdadm /dev/md1 -r /dev/sda2
# mdadm /dev/md2 -r /dev/sda3
# mdadm /dev/md3 -r /dev/sda4
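
The same sequence can be derived from /proc/mdstat instead of being
typed by hand. The following is only a sketch under the assumption that
member names look like 'sda3[0](F)' as in the output above;
print_detach_commands is a hypothetical helper, and it prints the mdadm
commands rather than running them:

```shell
# Print (do not run) the mdadm commands that detach every partition of one
# disk from its arrays, based on mdstat-formatted text.
# $1: disk name without /dev/ (e.g. sda)
# $2: mdstat-formatted file to read (default: /proc/mdstat)
print_detach_commands() {
    awk -v disk="$1" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                member = $i
                failed = (member ~ /\(F\)$/)   # already marked faulty?
                sub(/\[.*$/, "", member)       # "sda3[0](F)" becomes "sda3"
                if (member !~ ("^" disk "[0-9]+$")) continue
                # Healthy members must be failed before they can be removed
                if (!failed)
                    printf "mdadm --manage /dev/%s --fail /dev/%s\n", $1, member
                remove[++n] = sprintf("mdadm /dev/%s -r /dev/%s", $1, member)
            }
        }
        # Emit all removals after all --fail commands
        END { for (i = 1; i <= n; i++) print remove[i] }
    ' "${2:-/proc/mdstat}"
}
```

Printing instead of executing leaves room to review the list before
anything is run as root, for example with 'print_detach_commands sda'.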

The next step was to shut down the server and replace the disk. I did
not do this myself; the server is a production system hosted with the
German provider Hetzner.

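After the system came up again, I copied the partition table from
'/dev/sdb' to the new '/dev/sda', randomized its GUIDs, and added the new
partitions to the arrays: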

# sgdisk -R /dev/sda /dev/sdb
# sgdisk -G /dev/sda

# mdadm /dev/md0 -a /dev/sda1
# mdadm /dev/md1 -a /dev/sda2
# mdadm /dev/md2 -a /dev/sda3
# mdadm /dev/md3 -a /dev/sda4

I have no '/proc/mdstat' output from the sync process. However, I tried
to reinstall grub before the sync had finished (and I skipped
grub-mkdevicemap at first, because I simply forgot about it):

# grub-install /dev/sda
  /usr/sbin/grub-probe: error: unknown filesystem.
  Auto-detection of a filesystem of /dev/md1 failed.
  Try with --recheck.
# grub-install --recheck /dev/sda
  /usr/sbin/grub-probe: error: unknown filesystem.
  Auto-detection of a filesystem of /dev/md1 failed.
  Try with --recheck.
# grub-mkdevicemap -n
# grub-install /dev/sda
  /usr/sbin/grub-probe: error: unknown filesystem.
  Auto-detection of a filesystem of /dev/md1 failed.
  Try with --recheck.
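
These attempts were made while the arrays were still syncing. Whether
that played any part in the grub-probe error is not established here,
but it can be ruled out before retrying. A minimal sketch, assuming the
usual /proc/mdstat progress markers ('recovery = 12.6% ...',
'resync=DELAYED'); resync_in_progress is a hypothetical helper name:

```shell
# Exit status 0 if mdstat-formatted text reports a running or queued
# resync, recovery, or reshape; non-zero otherwise.
# $1: mdstat-formatted file to read (default: /proc/mdstat).
resync_in_progress() {
    grep -Eq '(resync|recovery|reshape) *=' "${1:-/proc/mdstat}"
}
```

For example, 'resync_in_progress && echo "still syncing"' reports the
state without changing anything. Alternatively, 'mdadm --wait /dev/mdX'
blocks until resync or recovery activity on that array has finished.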

>>> Please show exact steps to reproduce starting with degraded array; and
>>> show array state at each step (cat /proc/mdstat and mdadm --detail
>>> /dev/mdX).
>>>
>>
>> The current /proc/mdstat looks like this:
>>
> 
> And grub-probe still fails? Do you still have failed disk present in the
> system and visible?

Yes, grub-probe still fails. And no, the failed disk has been replaced
and the system has been restarted. I have no idea how the failed disk
could still be present somewhere.

>> ----------------------------------------------------
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>> md1 : active raid1 sda2[2] sdb2[1]
>>       524276 blocks super 1.2 [2/2] [UU]
>>
>> md3 : active raid1 sda4[2] sdb4[1]
>>       1822442815 blocks super 1.2 [2/2] [UU]
>>
>> md2 : active raid1 sda3[2] sdb3[1]
>>       1073740664 blocks super 1.2 [2/2] [UU]
>>
>> md0 : active raid1 sda1[2] sdb1[1]
>>       33553336 blocks super 1.2 [2/2] [UU]
>>
>> unused devices: <none>
>>
>> ----------------------------------------------------
>>
>> An example 'mdadm --detail /dev/md0' looks like this:
>>
> 
> I can produce such example myself. I am interested in steps to reproduce
> this problem.

And I would be interested in fixing this problem and making the system
bootable again (which I suppose it is not right now). Any ideas?

>> ----------------------------------------------------
>> /dev/md0:
>>         Version : 1.2
>>   Creation Time : Fri Apr 19 10:11:20 2013
>>      Raid Level : raid1
>>      Array Size : 33553336 (32.00 GiB 34.36 GB)
>>   Used Dev Size : 33553336 (32.00 GiB 34.36 GB)
>>    Raid Devices : 2
>>   Total Devices : 2
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Sat Jan  2 22:25:50 2016
>>           State : clean
>>  Active Devices : 2
>> Working Devices : 2
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>            Name : rescue:0
>>            UUID : 72974d0d:40970c9e:83dfc291:dd07bfba
>>          Events : 162
>>
>>     Number   Major   Minor   RaidDevice State
>>        2       8        1        0      active sync   /dev/sda1
>>        1       8       17        1      active sync   /dev/sdb1
>>
>> ----------------------------------------------------
>>
>> Best regards
>>
>> Tobi
>>
> 



