Sat, 01 Dec 2007

AFS fileserver issue

One of our AFS fileservers lost a disk late this afternoon, resulting in a couple of hours of downtime. A single disk failure shouldn't cause any downtime, but in this case it did. The disk was part of a mirror set hosting the machine's root filesystem and boot blocks, and for some reason the machine didn't seem to notice that the disk had failed, so it kept trying to access it. Those access attempts hung, and the machine built up a backlog of AFS fileserver requests, which eventually triggered an alert to the TIG on-call people (which included me this weekend).

The dead disk has been replaced, and things are OK again...
