From: Peter Olson via RT
Subject: [Savannah-hackers-public] [gnu.org #622071] colonialone: disk 'sdd' failed
Date: Thu, 07 Oct 2010 14:53:22 -0400
> [beuc - Wed Oct 06 15:21:47 2010]:
>
> Hi,
>
> On Wed, Oct 06, 2010 at 03:05:04PM -0400, Peter Olson via RT wrote:
> > > [beuc - Wed Oct 06 14:46:46 2010]:
> > >
> > > Hi,
> > >
> > > Disk 'sdd' is not available anymore at colonialone.
> > >
> > > Smartmontools detected an issue, and mdadm removed it from the
> > > RAID array.
> > >
> > > Can you investigate and possibly replace the failed disk?
> > >
> > > Btw, did you receive the failure notifications?
> > >
> > > Thanks,
> >
> > We took the failed disk out of the RAID array because it appears
> > to be a hard failure rather than a glitch (all partitions
> > containing the disk degraded at the same time).
> >
> > The array contained 4 members and now contains 3 members, all in
> > service. We expect to replace it when we next make a trip to the
> > colo.
> >
> > colonialone:~# cat /proc/mdstat
> > Personalities : [raid1]
> > md3 : active raid1 sda6[0] sdb6[2] sdc6[1]
> >       955128384 blocks [3/3] [UUU]
> >
> > md2 : active raid1 sda5[0] sdb5[2] sdc5[1]
> >       19534976 blocks [3/3] [UUU]
> >
> > md1 : active raid1 sda2[0] sdb2[2] sdc2[1]
> >       2000000 blocks [3/3] [UUU]
> >
> > md0 : active raid1 sda1[0] sdb1[2] sdc1[1]
> >       96256 blocks [3/3] [UUU]
> >
> > unused devices: <none>
>
>
> I'm worried that 'dmesg' shows lots of ext3 errors.
>
> How can a failed disk in a RAID1x4 array cause *filesystem*-level
> errors?
>
> Do we need a fsck or something?
>
>
Here are some of the errors from dmesg:
[20930306.805714] ext3_orphan_cleanup: deleting unreferenced inode 86646
[20930306.805714] ext3_orphan_cleanup: deleting unreferenced inode 85820
[20930306.822520] ext3_orphan_cleanup: deleting unreferenced inode 86643
[20930306.829335] ext3_orphan_cleanup: deleting unreferenced inode 86645
[20930306.840398] EXT3-fs: dm-5: 30 orphan inodes deleted
[20930306.840542] EXT3-fs: recovery complete.
[20930307.015205] EXT3-fs: mounted filesystem with ordered data mode.
I found some discussion on the Net that says these messages are a
normal byproduct of making an LVM snapshot. Are you doing this as
part of your backup procedure?
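
For reference, a snapshot-based backup usually goes something like
the sketch below (just an illustration of the mechanism, not a
description of the actual procedure here; the volume group 'vg0',
volume 'data', and mount point '/mnt/snap' are made-up names). The
snapshot captures the filesystem in a crash-consistent state, as if
the machine had lost power, so mounting it is what triggers the
journal replay and orphan-inode cleanup logged above:

#! /usr/bin/env python
# Hypothetical LVM snapshot backup sketch -- placeholder names only.
import subprocess

def run(cmd):
    print ' '.join(cmd)
    subprocess.check_call(cmd)

# Create a copy-on-write snapshot of the origin volume.
run(['lvcreate', '--snapshot', '--size', '1G',
     '--name', 'data-snap', '/dev/vg0/data'])
# Mounting the snapshot replays the ext3 journal and deletes orphan
# inodes -- exactly the kind of dmesg lines quoted above.
run(['mount', '/dev/vg0/data-snap', '/mnt/snap'])
# Copy the quiesced data somewhere safe.
run(['rsync', '-a', '/mnt/snap/', '/backup/data/'])
run(['umount', '/mnt/snap'])
run(['lvremove', '-f', '/dev/vg0/data-snap'])
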
I wrote a script to convert dmesg timestamps to wall clock. These
messages are issued every morning between 07:58 and 08:15 (or
sometimes as late as 08:27).
Peter Olson
FSF Senior Systems Administrator
-------------snip-------------
#! /usr/bin/env python
# Convert dmesg '[seconds.micros]' timestamps to wall-clock time:
# read dmesg output on stdin, print each line prefixed with an ISO
# date.
import sys
import datetime

# Reference point: at this wall-clock time the machine had been up
# 'uptime' seconds (from /proc/uptime when the script was written).
dt = datetime.datetime(2010, 10, 7, 14, 29, 26)
uptime = 20952599

for line in sys.stdin:
    # Extract integer seconds from the leading '[20930306.805714]'.
    curtime = int(line.split('.')[0].split('[')[1])
    delta = datetime.timedelta(0, curtime - uptime)
    dt2 = dt + delta
    # Trailing comma: 'line' already ends with a newline.
    print dt2.isoformat(' '), line,
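
To use it, pipe dmesg through the script (the filename here is just
an example):

  dmesg | python dmesg2wallclock.py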