From: Sylvain Beucler via RT
Subject: [Savannah-hackers-public] Re: [gnu.org #622071] colonialone: disk 'sdd' failed
Date: Sun, 10 Oct 2010 03:42:07 -0400
Hi,
On Thu, Oct 07, 2010 at 02:53:22PM -0400, Peter Olson via RT wrote:
> > [beuc - Wed Oct 06 15:21:47 2010]:
> > Hi,
> >
> > On Wed, Oct 06, 2010 at 03:05:04PM -0400, Peter Olson via RT wrote:
> > > > [beuc - Wed Oct 06 14:46:46 2010]:
> > > >
> > > > Hi,
> > > >
> > > > Disk 'sdd' is not available anymore at colonialone.
> > > >
> > > > Smartmontools detected an issue, and mdadm removed it from the
> > > > RAID array.
> > > >
> > > > Can you investigate and possibly replace the failed disk?
> > > >
> > > > Btw, did you receive the failure notifications?
> > > >
> > > > Thanks,
> > >
> > > We took the failed disk out of the RAID array because it appears
> > > to be a hard failure rather than a glitch (all arrays containing
> > > partitions of the disk degraded at the same time).
> > >
> > > The arrays contained 4 members each and now contain 3 members, all
> > > in service. We expect to replace the disk when we next make a trip
> > > to the colo.
> > >
> > > colonialone:~# cat /proc/mdstat
> > > Personalities : [raid1]
> > > md3 : active raid1 sda6[0] sdb6[2] sdc6[1]
> > >       955128384 blocks [3/3] [UUU]
> > >
> > > md2 : active raid1 sda5[0] sdb5[2] sdc5[1]
> > >       19534976 blocks [3/3] [UUU]
> > >
> > > md1 : active raid1 sda2[0] sdb2[2] sdc2[1]
> > >       2000000 blocks [3/3] [UUU]
> > >
> > > md0 : active raid1 sda1[0] sdb1[2] sdc1[1]
> > >       96256 blocks [3/3] [UUU]
> > >
> > > unused devices: <none>
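
(For reference, dropping a dead member from mirrors like these is
normally done per-array along these lines -- a sketch only; the sdd
partition numbers are assumptions inferred from the mdstat above:

    mdadm /dev/md3 --fail /dev/sdd6           # mark the member faulty
    mdadm /dev/md3 --remove /dev/sdd6         # detach it from the array
    mdadm --grow /dev/md3 --raid-devices=3    # shrink the mirror from 4 to 3 members

and likewise for md0/md1/md2 with the matching sdd partitions.)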
> >
> >
> > I'm worried that 'dmesg' shows lots of ext3 errors.
> >
> > How can a failed disk in a RAID1x4 array cause *filesystem*-level
> > errors?
> >
> > Do we need a fsck or something?
>
> Here are some of the errors from dmesg:
>
> [20930306.805714] ext3_orphan_cleanup: deleting unreferenced inode 86646
> [20930306.805714] ext3_orphan_cleanup: deleting unreferenced inode 85820
> [20930306.822520] ext3_orphan_cleanup: deleting unreferenced inode 86643
> [20930306.829335] ext3_orphan_cleanup: deleting unreferenced inode 86645
> [20930306.840398] EXT3-fs: dm-5: 30 orphan inodes deleted
> [20930306.840542] EXT3-fs: recovery complete.
> [20930307.015205] EXT3-fs: mounted filesystem with ordered data mode.
>
> I found some discussion on the Net that says these messages are a
> normal byproduct of making an LVM snapshot. Are you doing this as part
> of your backup procedure?
Yes (cf. remote_backup.sh).
Good to know it's not a disk error, thanks.
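
For reference, the snapshot step is roughly the following (a sketch,
not the literal remote_backup.sh; the volume group and LV names are
made up):

    lvcreate --snapshot --size 5G --name snap /dev/vg0/root   # snapshot of the live LV
    mount -o ro /dev/vg0/snap /mnt/snap   # ext3 journal recovery + orphan cleanup run here
    # ... the backup host copies the data off ...
    umount /mnt/snap
    lvremove -f /dev/vg0/snap

Since the snapshot captures a mounted (hence "unclean") filesystem,
mounting it replays the ext3 journal and deletes orphan inodes, which
is exactly what the dm-5 messages above show.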
> I wrote a script to convert dmesg timestamps to wall clock. These
> messages are issued every morning between 07:58 and 08:15 (or
> sometimes as late as 08:27).
Yes, the backup from savannah-backup.gnu.org runs at 12:00 GMT.
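
(The conversion is presumably along these lines -- a sketch assuming
GNU date; dmesg stamps count seconds since boot:

    # boot time = now - uptime
    boot=$(( $(date +%s) - $(cut -d. -f1 /proc/uptime) ))
    date -d @$(( boot + 20930306 ))   # stamp taken from the orphan_cleanup lines above

That counter can drift from wall clock over 240+ days of uptime, which
may explain part of the spread you see.)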
Also, LVM is still looking for /dev/sdd7:
colonialone:~# lvs
/dev/sdd7: read failed after 0 of 2048 at 0: Input/output error
[...]
I suggest we plan a reboot this week.
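
If the reboot has to wait, it should also be possible to make LVM
forget the dead PV beforehand (assuming no logical volume actually
lives on sdd7; "vg0" is a placeholder for the real VG name):

    pvs                            # the dead PV shows up as missing
    vgreduce --removemissing vg0   # drop it from the VG metadata

A reboot would clear the stale device node either way.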
--
Sylvain