[WBEL-users] Server hard disk failure

Kirby C. Bohling kbohling at birddog.com
Wed Mar 30 16:02:08 CST 2005


On Wed, Mar 30, 2005 at 10:40:55PM +0100, Francies Moore wrote:
> Hi everyone
> 
> One of my WBEL servers has crashed due to the failure of one of its hard 
> disks.  My hardware support technician (a recent Linux convert) says it 
> could be a filesystem failure.  Is such a thing possible on a machine 
> which was doing nothing over Easter?  I thought Linux was more stable 
> than "that other system" in this regard.

Yes.  A machine sitting idle can still lose a drive.  I've had
drives fail, generally because of heat problems, while the machine
was under virtually no load.

> 
> Whatever happened to it, it cannot reboot as the ext3 journal cannot get 
> its head around the situation.

You'll need a much better description for anyone to actually advise
you on the situation (the exact error message when you boot would be
a good start).  Any weird messages in /var/log/messages would also
be helpful.
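
As a rough sketch of what I mean by weird messages, something like
the following pulls likely disk and filesystem complaints out of a
syslog file.  The function name and the patterns are my own guesses
at common kernel wording, not a complete list, and if the box won't
boot you would point it at the log on the mounted copy (for example
a hypothetical /mnt/rescue/var/log/messages) instead of the default.

```shell
# Sketch: list lines that look like disk or ext3 trouble in a syslog file.
# The patterns are assumptions about typical kernel messages; adjust as needed.
disk_errors() {
    logfile=${1:-/var/log/messages}
    # -i: ignore case, -E: extended regex with alternation
    grep -iE 'I/O error|EXT3-fs|journal' "$logfile"
}

# Example (hypothetical path to the log on a mounted copy):
#   disk_errors /mnt/rescue/var/log/messages
```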

> 
> How do I recover what is left on the surviving hard disk (which contains 
> the operating system and some user files)?  Do I revert to ext2 by 
> deleting the journal and changing the fstab?  If so, where do I find the 
> journal file, and what is it called?

First, go get a new disk.  Put the new disk and the problem drive
into a machine with other boot media (either a rescue CD or another
Linux system).  Copy from the problem drive to the new drive, using
"dd" more than likely.  If you can't make the copy, then unless the
drive is worth shipping to a specialty data recovery shop, you are
probably stuck.  If I were doing this, I'd have a fourth drive too,
and I'd copy the good copy onto it as a second good copy.  I've had
dying drives that were only good enough to make a copy once.
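
A minimal sketch of the dd copy, assuming the problem drive and the
new drive show up as whole-disk devices on the rescue machine.  The
function name and the device names in the example are placeholders,
not anything from your system.  Check them twice, because dd
overwrites the destination without asking.

```shell
# Sketch: raw copy from a failing source to a destination, continuing
# past read errors instead of stopping at the first one.
clone_disk() {
    src=$1
    dst=$2
    # conv=noerror: keep going after a read error
    # conv=sync:    pad short or failed reads with zeros so offsets stay aligned
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}

# Example (placeholder device names, run as root):
#   clone_disk /dev/sdb /dev/sdc
```

A small block size means less data is zero-filled around each bad
spot, at the cost of a slower copy.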

Work off of the new good hardware and try running fsck.  See what
kinds of errors it gives.  Either fsck finishes cleanly, and you
have whatever it recovered, or it doesn't finish cleanly.  In that
case, send the drive to a data recovery place if it is worth the
money to you.
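
One way the fsck step might look, assuming the filesystem is
ext2/ext3 and so e2fsck is the checker doing the work.  The function
name and the partition name in the example are placeholders.  The
idea is to look first without changing anything, and only then let
it repair the copy.

```shell
# Sketch: read-only check of a copied ext2/ext3 partition.
check_copy() {
    dev=$1
    # -f: check even if the filesystem is marked clean
    # -n: open read-only and answer "no" to every repair question
    e2fsck -f -n "$dev"
}

# Example (placeholder partition on the NEW drive, run as root):
#   check_copy /dev/sdc1
# Exit status 0 means no errors were found; 4 means errors were left
# uncorrected.  To let it actually repair the copy:
#   e2fsck -f -y /dev/sdc1
```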

Finally, if fsck won't finish, sometimes you can just mount the
drive, run tar, and get something.  I've done this before.  
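
Roughly what the mount-and-tar approach looks like, as a sketch.  The
device, the mount point, the archive path and the function name are
all placeholders.  Mount read-only so nothing further gets written to
the damaged filesystem, and put the archive on a different disk.

```shell
# Sketch, run as root on the rescue machine:
#   mkdir -p /mnt/rescue
#   mount -o ro /dev/sdc1 /mnt/rescue
# Then archive whatever is still readable under the mount point.
salvage_tree() {
    srcdir=$1
    archive=$2
    # -C: change into the source directory first, so paths in the
    #     archive are relative rather than absolute
    tar -C "$srcdir" -cf "$archive" .
}

# Example (placeholder paths):
#   salvage_tree /mnt/rescue /root/rescue.tar
```

If tar complains about files it cannot read, check what did make it
into the archive with "tar -tf" before giving up on it.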

There's no way I'd even try booting the drive that died, other than
to get the error messages.  Even then, I'd probably just put it into
another machine and try mounting it.  It's much safer.

> Can I go back to ext3 when a new HD is fitted?

Yes.  We've run ext2/3 on all of our Linux machines for 5 years where
I work, and our production database machines on it for 3.5 years.
Ext3 is very reliable.  My guess is that your drive failed.  I've
seen only a handful of corruption issues with ext3 systems in that
time, and I'm fairly sure most of those were hardware related, not
software related.

	Thanks,
		Kirby


